
Machine learning differentiates between bulk and pseudo-bulk RNA-seq datasets

Low, B. H.; Rashid, M. M.; Selvarajoo, K.

2025-07-01 bioinformatics
10.1101/2025.06.27.661895 bioRxiv

Modern synthetic data generators and deconvolution methods rely heavily on single-cell (sc) RNA-seq data. Aggregated scRNA-seq (pseudo-bulk) is commonly assumed to closely match true bulk RNA-seq, making it a dependable benchmark for developing and evaluating new bioinformatics methods. Here, we investigated paired bulk and scRNA-seq datasets using machine learning techniques to assess the fidelity of pseudo-bulk profiles. Our results demonstrate that pseudo-bulks differ substantially from bulk RNA-seq in both analytic metrics and biological processes.
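
For readers unfamiliar with the term, the sketch below illustrates one common way a pseudo-bulk profile is formed: per-cell counts are summed within each sample and then scaled to counts per million. The abstract does not state how the authors aggregate or normalize, so the summation, the CPM scaling, the function names `pseudo_bulk` and `cpm`, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def pseudo_bulk(counts, sample_ids):
    """Sum per-cell counts into one profile per sample (assumed aggregation).

    counts:     (n_cells, n_genes) array of raw counts
    sample_ids: length n_cells sequence of sample labels
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)  # sorted unique sample labels
    profiles = np.vstack(
        [counts[sample_ids == s].sum(axis=0) for s in samples]
    )
    return samples, profiles


def cpm(profiles):
    """Scale each profile to counts per million (assumed normalization)."""
    totals = profiles.sum(axis=1, keepdims=True)
    return profiles / totals * 1e6


# Toy data (hypothetical): 3 cells x 3 genes, cells 0 and 1 from sample "A".
toy_counts = [[1, 0, 3],
              [2, 1, 0],
              [0, 4, 1]]
toy_samples = ["A", "A", "B"]

samples, profiles = pseudo_bulk(toy_counts, toy_samples)
print(samples.tolist())   # ['A', 'B']
print(profiles.tolist())  # [[3, 1, 3], [0, 4, 1]]
```

Profiles built this way could then be compared against paired bulk RNA-seq from the same samples, which is the kind of comparison the abstract describes.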

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. NAR Genomics and Bioinformatics: 214 papers in training set, Top 0.1%, 18.6%
2. PLOS Computational Biology: 1633 papers in training set, Top 3%, 10.1%
3. Genome Biology: 555 papers in training set, Top 0.5%, 9.1%
4. BMC Bioinformatics: 383 papers in training set, Top 2%, 6.3%
5. Briefings in Bioinformatics: 326 papers in training set, Top 1%, 4.8%
6. Bioinformatics: 1061 papers in training set, Top 5%, 4.3%
50% of probability mass above
7. Nucleic Acids Research: 1128 papers in training set, Top 6%, 3.6%
8. Nature Communications: 4913 papers in training set, Top 40%, 3.6%
9. Cell Systems: 167 papers in training set, Top 4%, 3.6%
10. Bioinformatics Advances: 184 papers in training set, Top 2%, 2.7%
11. PLOS ONE: 4510 papers in training set, Top 45%, 2.6%
12. Scientific Reports: 3102 papers in training set, Top 46%, 2.6%
13. GigaScience: 172 papers in training set, Top 0.8%, 2.4%
14. Frontiers in Genetics: 197 papers in training set, Top 4%, 2.1%
15. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.1%
16. Genome Research: 409 papers in training set, Top 2%, 1.7%
17. iScience: 1063 papers in training set, Top 18%, 1.5%
18. BMC Genomics: 328 papers in training set, Top 4%, 1.2%
19. Genome Medicine: 154 papers in training set, Top 6%, 1.2%
20. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 4%, 1.2%
21. Nature Machine Intelligence: 61 papers in training set, Top 3%, 0.9%
22. Communications Biology: 886 papers in training set, Top 21%, 0.8%
23. Frontiers in Bioinformatics: 45 papers in training set, Top 0.9%, 0.7%
24. Nature Biotechnology: 147 papers in training set, Top 8%, 0.6%
25. Cell Genomics: 162 papers in training set, Top 8%, 0.6%
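
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below does that; it assumes the last percentage in each entry is that journal's predicted probability, and the helper name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (percent) for ranks 1..25, copied from the ranking.
probs = [18.6, 10.1, 9.1, 6.3, 4.8, 4.3, 3.6, 3.6, 3.6, 2.7,
         2.6, 2.6, 2.4, 2.1, 2.1, 1.7, 1.5, 1.2, 1.2, 1.2,
         0.9, 0.8, 0.7, 0.6, 0.6]


def journals_to_reach(probs, threshold=50.0):
    """Return (rank, cumulative percent) at which the running total
    first reaches the threshold, or None if it never does."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


rank, total = journals_to_reach(probs)
print(rank, round(total, 1))  # 6 53.2
```

The running total is 48.9% after five journals and 53.2% after six, so six is the smallest number of journals whose combined probability reaches half.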