
Building computational benchmarks: an Omnibenchmark reimplementation of a single-cell preprocessing pipeline evaluation

Choudhury, A.; Kitak, T.; Carrillo, B.; Busch, P.; Emons, M.; Gunz, S.; Koderman, M.; Luo, S.; Mallona, I.; Meara, A.; Wissel, D.; Robinson, M. D.

2026-05-05 bioinformatics
10.64898/2026.05.01.722166 bioRxiv

The past few years have seen a surge in single-cell techniques (e.g., RNA sequencing) and datasets, enabling increasingly detailed characterization of cellular heterogeneity across tissues and conditions. This surge has been accompanied by a large number of analysis frameworks and pipelines, each with a large parameter space that leaves researchers many degrees of freedom in how to use them. Many neutral benchmarks have been presented for various computational tasks, but most make design decisions that render them incompatible with each other, e.g., different datasets, metrics, or parameter sets. In this work, we showcase a recently developed framework, Omnibenchmark, to build reproducible, extensible and standardized method comparisons. This not only facilitates the broad investigation of pipelines used in single-cell data analysis, but also highlights how the process of building benchmarks can be streamlined and unified. We do this as an initial proof-of-principle for an arm's-length benchmark that evaluates five single-cell RNA sequencing pipelines (filtering, normalization, dimensionality reduction, then clustering) on three datasets. This standardization enables benchmarks to be easily extended in several directions, including broader parameter sweeps, comparisons across software versions and architectures, isolation of pipeline steps, and integration of additional pipelines, datasets, and metrics.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Genome Biology: 555 papers in training set, Top 0.1%, 14.3%
2. BMC Bioinformatics: 383 papers in training set, Top 1.0%, 9.8%
3. NAR Genomics and Bioinformatics: 214 papers in training set, Top 0.1%, 8.2%
4. Nature Methods: 336 papers in training set, Top 1%, 8.2%
5. Nature Communications: 4913 papers in training set, Top 31%, 6.2%
6. GigaScience: 172 papers in training set, Top 0.2%, 6.2%
50% of probability mass above
7. Bioinformatics: 1061 papers in training set, Top 5%, 4.7%
8. Nucleic Acids Research: 1128 papers in training set, Top 4%, 4.7%
9. Genome Research: 409 papers in training set, Top 0.8%, 4.1%
10. Nature Biotechnology: 147 papers in training set, Top 2%, 3.9%
11. PLOS Computational Biology: 1633 papers in training set, Top 9%, 3.9%
12. Briefings in Bioinformatics: 326 papers in training set, Top 2%, 3.5%
13. BMC Genomics: 328 papers in training set, Top 2%, 1.8%
14. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.7%
15. Cell Systems: 167 papers in training set, Top 8%, 1.6%
16. PLOS ONE: 4510 papers in training set, Top 59%, 1.3%
17. Frontiers in Bioinformatics: 45 papers in training set, Top 0.4%, 1.3%
18. Genome Medicine: 154 papers in training set, Top 6%, 1.1%
19. Cell Reports Methods: 141 papers in training set, Top 4%, 0.9%
20. Nature Protocols: 30 papers in training set, Top 0.2%, 0.8%
21. Scientific Reports: 3102 papers in training set, Top 77%, 0.7%
22. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 10%, 0.7%
23. Communications Biology: 886 papers in training set, Top 30%, 0.6%
24. Life Science Alliance: 263 papers in training set, Top 3%, 0.6%
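
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the percentages listed above. The following is a minimal sketch, not the site's own method: the function name `journals_to_reach` is hypothetical, and the values are simply the rounded percentages copied from the list.

```python
from itertools import accumulate

# Predicted probabilities in percent, in rank order, copied from the list above.
probabilities = [
    14.3, 9.8, 8.2, 8.2, 6.2, 6.2, 4.7, 4.7, 4.1, 3.9, 3.9, 3.5,
    1.8, 1.7, 1.6, 1.3, 1.3, 1.1, 0.9, 0.8, 0.7, 0.7, 0.6, 0.6,
]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running sum
    to reach `threshold`, or None if the threshold is never reached.

    Hypothetical helper for illustration only.
    """
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count
    return None


top_n = journals_to_reach(probabilities, 50.0)
print(f"Journals needed to reach 50% of the probability mass: {top_n}")
```

With the listed values, the running sum is about 46.7% after five journals and about 52.9% after six, which is consistent with the marker placed below the sixth entry.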