A like-for-like comparison of lightweight-mapping pipelines for single-cell RNA-seq data pre-processing

Zakeri, M.; Srivastava, A.; Sarkar, H.; Patro, R.

2021-02-11 bioinformatics
10.1101/2021.02.10.430656 bioRxiv
Recently, Booeshaghi and Pachter (1) published a benchmark comparing the kallisto-bustools pipeline (2) for single-cell data pre-processing to the alevin-fry pipeline (3). Their benchmark used drastically dissimilar configurations for the two tools and overlooked the time- and space-frugal configurations of alevin-fry previously benchmarked by Sarkar et al. (3). In this manuscript, we provide a small set of modifications to the benchmarking scripts of Booeshaghi and Pachter that are necessary to perform a like-for-like comparison between kallisto-bustools and alevin-fry. We also address some misuses of the alevin-fry commands and include important data on the exact reference transcriptomes used for processing. Using the same benchmarking scripts as Booeshaghi and Pachter (1), we demonstrate that, when configured to match the computational complexity of kallisto-bustools as closely as possible, alevin-fry processes data faster (~2.08 times as fast on average) and uses less peak memory (~0.34 times as much on average) than kallisto-bustools, while producing results that are similar when assessed in the manner used by Booeshaghi and Pachter (1). This is a notable inversion of the performance characteristics presented in the previous benchmark.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. NAR Genomics and Bioinformatics (214 papers in training set; Top 0.1%; 26.5%)
2. Bioinformatics (1061 papers in training set; Top 3%; 10.3%)
3. Nucleic Acids Research (1128 papers in training set; Top 2%; 7.4%)
4. Bioinformatics Advances (184 papers in training set; Top 0.3%; 7.0%)

50% of probability mass above

5. BMC Bioinformatics (383 papers in training set; Top 1%; 6.5%)
6. GigaScience (172 papers in training set; Top 0.2%; 5.0%)
7. Database (51 papers in training set; Top 0.2%; 3.7%)
8. Computational and Structural Biotechnology Journal (216 papers in training set; Top 2%; 2.7%)
9. PeerJ (261 papers in training set; Top 4%; 2.4%)
10. BMC Genomics (328 papers in training set; Top 2%; 2.1%)
11. Journal of Bioinformatics and Systems Biology (14 papers in training set; Top 0.1%; 1.7%)
12. G3 Genes|Genomes|Genetics (351 papers in training set; Top 1%; 1.7%)
13. PLOS ONE (4510 papers in training set; Top 60%; 1.3%)
14. Genome Research (409 papers in training set; Top 3%; 1.3%)
15. Briefings in Bioinformatics (326 papers in training set; Top 5%; 1.0%)
16. F1000Research (79 papers in training set; Top 3%; 0.9%)
17. BMC Research Notes (29 papers in training set; Top 0.3%; 0.9%)
18. G3: Genes, Genomes, Genetics (222 papers in training set; Top 0.7%; 0.9%)
19. iScience (1063 papers in training set; Top 26%; 0.9%)
20. RNA (169 papers in training set; Top 0.4%; 0.8%)
21. Scientific Reports (3102 papers in training set; Top 72%; 0.8%)
22. PLOS Computational Biology (1633 papers in training set; Top 25%; 0.7%)
23. International Journal of Molecular Sciences (453 papers in training set; Top 17%; 0.7%)
24. Nature Methods (336 papers in training set; Top 6%; 0.7%)
25. Genome Biology (555 papers in training set; Top 9%; 0.5%)
26. Communications Biology (886 papers in training set; Top 31%; 0.5%)
27. Biophysical Journal (545 papers in training set; Top 6%; 0.5%)