Machine learning differentiates between bulk and pseudo-bulk RNA-seq datasets
Low, B. H.; Rashid, M. M.; Selvarajoo, K.
2025-07-01
bioinformatics
10.1101/2025.06.27.661895
bioRxiv
Modern synthetic data generators and deconvolution methods rely heavily on single-cell (sc) RNA-seq data. Aggregated scRNA-seq (pseudo-bulk) is commonly assumed to closely match true bulk RNA-seq, making it a dependable benchmark for developing and evaluating new bioinformatics methods. Here, we investigated paired bulk and scRNA-seq datasets using machine learning techniques to assess the fidelity of pseudo-bulk profiles. Our results demonstrate that pseudo-bulks differ substantially from bulk RNA-seq in both analytic metrics and biological processes.
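To make the term "pseudo-bulk" concrete, here is a minimal sketch of one common way to aggregate single-cell counts into per-sample profiles: summing raw counts over all cells from the same sample, then scaling to counts per million (CPM). The abstract does not describe the authors' pipeline, so the function names `pseudo_bulk` and `cpm`, the summation choice, and the CPM scaling are all assumptions made for illustration.

```python
import numpy as np

def pseudo_bulk(counts, sample_ids):
    """Sum per-cell counts into one profile per sample (illustrative helper).

    counts:     (n_cells, n_genes) array of raw counts
    sample_ids: length-n_cells sequence naming the sample each cell came from
    Returns (samples, profiles), where profiles has shape (n_samples, n_genes)
    and rows follow the sorted order of the unique sample labels.
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    # Map each cell to the row index of its sample.
    samples, inverse = np.unique(sample_ids, return_inverse=True)
    profiles = np.zeros((len(samples), counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add: every cell's counts are accumulated into its sample row.
    np.add.at(profiles, inverse, counts)
    return samples, profiles

def cpm(profiles):
    """Scale each profile to counts per million (assumed normalization)."""
    profiles = np.asarray(profiles, dtype=float)
    totals = profiles.sum(axis=1, keepdims=True)
    # Avoid dividing by zero for samples with no counts at all.
    totals = np.where(totals == 0, 1.0, totals)
    return profiles / totals * 1e6

if __name__ == "__main__":
    # Toy data: three cells, two genes, two samples.
    toy_counts = [[1, 0], [2, 3], [0, 5]]
    toy_samples = ["a", "b", "a"]
    names, summed = pseudo_bulk(toy_counts, toy_samples)
    print(names, summed, cpm(summed), sep="\n")
```

A profile built this way can then be compared against the paired bulk RNA-seq profile for the same sample, which is the kind of comparison the abstract refers to.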
Matching journals
● Non-profit
◐ University press
○ Commercial
The top 6 journals account for 50% of the predicted probability mass.
| Rank | Journal | Type | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|---|
| 1 | NAR Genomics and Bioinformatics | ◐ | 214 | Top 0.1% | 18.6% |
| 2 | PLOS Computational Biology | ● | 1633 | Top 3% | 10.1% |
| 3 | Genome Biology | ○ | 555 | Top 0.5% | 9.1% |
| 4 | BMC Bioinformatics | ○ | 383 | Top 2% | 6.3% |
| 5 | Briefings in Bioinformatics | ◐ | 326 | Top 1% | 4.8% |
| 6 | Bioinformatics | ◐ | 1061 | Top 5% | 4.3% |
| | 50% of probability mass above | | | | |
| 7 | Nucleic Acids Research | ◐ | 1128 | Top 6% | 3.6% |
| 8 | Nature Communications | ○ | 4913 | Top 40% | 3.6% |
| 9 | Cell Systems | ○ | 167 | Top 4% | 3.6% |
| 10 | Bioinformatics Advances | ◐ | 184 | Top 2% | 2.7% |
| 11 | PLOS ONE | ● | 4510 | Top 45% | 2.6% |
| 12 | Scientific Reports | ○ | 3102 | Top 46% | 2.6% |
| 13 | GigaScience | ◐ | 172 | Top 0.8% | 2.4% |
| 14 | Frontiers in Genetics | ○ | 197 | Top 4% | 2.1% |
| 15 | Computational and Structural Biotechnology Journal | ● | 216 | Top 3% | 2.1% |
| 16 | Genome Research | ● | 409 | Top 2% | 1.7% |
| 17 | iScience | ○ | 1063 | Top 18% | 1.5% |
| 18 | BMC Genomics | ○ | 328 | Top 4% | 1.2% |
| 19 | Genome Medicine | ○ | 154 | Top 6% | 1.2% |
| 20 | Genomics, Proteomics & Bioinformatics | ◐ | 171 | Top 4% | 1.2% |
| 21 | Nature Machine Intelligence | ○ | 61 | Top 3% | 0.9% |
| 22 | Communications Biology | ○ | 886 | Top 21% | 0.8% |
| 23 | Frontiers in Bioinformatics | ○ | 45 | Top 0.9% | 0.7% |
| 24 | Nature Biotechnology | ○ | 147 | Top 8% | 0.6% |
| 25 | Cell Genomics | ○ | 162 | Top 8% | 0.6% |
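The statement that the top 6 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below finds the smallest number of top-ranked journals whose cumulative probability reaches a threshold. The function name `journals_to_reach` is hypothetical, the inputs are the rounded percentages shown on this page, and how the site itself computes the cutoff is an assumption.

```python
from itertools import accumulate

def journals_to_reach(probabilities, threshold):
    """Return the smallest k such that the top-k probabilities sum to at least
    `threshold`, or None if the full list never reaches it.

    probabilities: values in percent, already sorted from highest to lowest
    threshold:     target cumulative mass, in percent
    """
    for k, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return k
    return None

# Rounded predicted probabilities (percent) for ranks 1 to 25, as listed.
listed = [18.6, 10.1, 9.1, 6.3, 4.8, 4.3, 3.6, 3.6, 3.6, 2.7, 2.6, 2.6, 2.4,
          2.1, 2.1, 1.7, 1.5, 1.2, 1.2, 1.2, 0.9, 0.8, 0.7, 0.6, 0.6]

if __name__ == "__main__":
    print(journals_to_reach(listed, 50.0))  # prints 6
```

With these values the first five journals sum to about 48.9% and the first six to about 53.2%, which matches the cutoff marked between ranks 6 and 7.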