
Cell DiffErential Expression by Pooling (CellDEEP) highlights issues in differential gene expression in scRNA-seq

Cheng, Y.; Kettlewell, T.; Laidlaw, R. F.; Hardy, O. M.; McCluskey, A.; Otto, T. D.; Somma, D.

2026-03-11 bioinformatics
10.64898/2026.03.09.710522 bioRxiv

Accurate identification of differentially expressed genes (DEGs) in single-cell RNA sequencing (scRNA-seq) data remains challenging. Single-cell-specific statistical models often report large numbers of candidate genes but can exhibit inflated false positive rates, whereas pseudobulk approaches improve false discovery control at the cost of reduced sensitivity. To reduce the noise and bias seen in existing tools and to give users more control over the DEG analysis, we present CellDEEP, which uses a cell aggregation (metacell) approach. The tool provides a framework for flexible selection of pooling strategies and parameterisation for differential expression (DE) analysis. Benchmarking on simulated and real datasets, including COVID-19 and rheumatoid arthritis, shows that CellDEEP often outperforms other methods, consistently reduces false positives compared to single-cell methods, and recovers more true positives than pseudobulk methods. Our work shifts the focus from selecting a single "best" method to an approach that reduces cell-level noise while preserving biological signal, together with a transparent validation framework, advancing more reliable differential expression analysis in single-cell transcriptomics.

[Graphical abstract: Figure 1]
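The cell aggregation idea in the abstract can be illustrated with a minimal sketch. It assumes one simple pooling strategy (random pools of a fixed size within each group, with raw counts summed); the abstract does not specify CellDEEP's actual strategies or interface, so `pool_cells`, `pool_size`, and the grouping labels are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict


def pool_cells(counts, groups, pool_size, seed=0):
    """Sum raw counts over random pools of cells within each group.

    counts: list of per-cell count vectors (one list of gene counts per cell).
    groups: one label per cell (e.g. sample plus cell type); an assumption,
            not something the abstract prescribes.
    Returns (pooled_counts, pooled_labels).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for cell_index, label in enumerate(groups):
        by_group[label].append(cell_index)

    n_genes = len(counts[0])
    pooled, labels = [], []
    for label in sorted(by_group):
        cells = by_group[label]
        rng.shuffle(cells)  # random assignment of cells to pools
        # Split the shuffled cells into chunks of at most pool_size cells.
        for start in range(0, len(cells), pool_size):
            chunk = cells[start:start + pool_size]
            pooled.append([sum(counts[i][g] for i in chunk) for g in range(n_genes)])
            labels.append(label)
    return pooled, labels


# Toy input: 10 cells, 3 genes, every count equal to 1.
toy_counts = [[1, 1, 1] for _ in range(10)]
toy_groups = ["a"] * 6 + ["b"] * 4
pooled_counts, pooled_labels = pool_cells(toy_counts, toy_groups, pool_size=3)
```

Each pooled profile would then be passed to a DE test in place of individual cells. Pooling sits between the two extremes the abstract contrasts: more observations per condition than pseudobulk, less cell-level noise than single-cell testing.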

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics | 1061 papers in training set | Top 1% | 21.6%
2. NAR Genomics and Bioinformatics | 214 papers in training set | Top 0.1% | 8.1%
3. Genome Biology | 555 papers in training set | Top 0.7% | 8.1%
4. Nucleic Acids Research | 1128 papers in training set | Top 2% | 8.1%
5. Bioinformatics Advances | 184 papers in training set | Top 0.4% | 6.5%

50% of probability mass above

6. Genome Research | 409 papers in training set | Top 0.5% | 6.1%
7. BMC Bioinformatics | 383 papers in training set | Top 2% | 6.1%
8. Briefings in Bioinformatics | 326 papers in training set | Top 1% | 4.7%
9. BMC Genomics | 328 papers in training set | Top 1% | 3.4%
10. PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.3%
11. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 3% | 2.0%
12. Cell Reports Methods | 141 papers in training set | Top 2% | 1.7%
13. Nature Communications | 4913 papers in training set | Top 53% | 1.6%
14. PLOS ONE | 4510 papers in training set | Top 56% | 1.6%
15. iScience | 1063 papers in training set | Top 19% | 1.4%
16. Nature Methods | 336 papers in training set | Top 5% | 1.4%
17. Nature Biotechnology | 147 papers in training set | Top 5% | 1.4%
18. Frontiers in Genetics | 197 papers in training set | Top 7% | 1.2%
19. Genome Medicine | 154 papers in training set | Top 7% | 0.9%
20. GigaScience | 172 papers in training set | Top 3% | 0.8%
21. Cell Systems | 167 papers in training set | Top 13% | 0.7%
22. npj Systems Biology and Applications | 99 papers in training set | Top 3% | 0.7%
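The 50% cutoff in the list can be checked by accumulating the predicted probabilities in rank order. The sketch below copies the percentages from the list; `smallest_prefix` is a hypothetical helper name, not part of any tool mentioned on this page.

```python
# Predicted probabilities (percent) for ranks 1 to 22, copied from the list.
PROBS = [21.6, 8.1, 8.1, 8.1, 6.5, 6.1, 6.1, 4.7, 3.4, 2.3, 2.0,
         1.7, 1.6, 1.6, 1.4, 1.4, 1.4, 1.2, 0.9, 0.8, 0.7, 0.7]


def smallest_prefix(probs, threshold):
    """Return (rank, cumulative) for the first rank whose running total
    reaches the threshold, or None if the threshold is never reached."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


rank, cumulative = smallest_prefix(PROBS, 50.0)
print(rank, round(cumulative, 1))  # rank 5, cumulative about 52.4
```

The first four journals sum to about 45.9%, so the fifth is the one that carries the running total past 50%.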