Accelerating long-read analysis on modern CPUs

Kalikar, S.; Jain, C.; Md, V.; Misra, S.

2021-07-23 genomics
10.1101/2021.07.21.453294 bioRxiv

Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here, we present techniques to accelerate minimap2, a widely used mapping tool. We present multiple optimizations using SIMD parallelization, efficient cache utilization, and a learned index data structure to accelerate its three main computational modules: seeding, chaining, and pairwise sequence alignment. Together, these optimizations reduce the end-to-end mapping time of minimap2 by up to 1.8x while maintaining identical output.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Genome Research | 409 | Top 0.1% | 22.2%
2 | Bioinformatics | 1061 | Top 1% | 22.2%
3 | Genome Biology | 555 | Top 0.9% | 6.7%
(50% of probability mass above)
4 | BMC Bioinformatics | 383 | Top 2% | 6.3%
5 | Nature Communications | 4913 | Top 33% | 4.8%
6 | GigaScience | 172 | Top 0.6% | 3.5%
7 | NAR Genomics and Bioinformatics | 214 | Top 0.9% | 3.0%
8 | Nature Methods | 336 | Top 3% | 2.6%
9 | Bioinformatics Advances | 184 | Top 2% | 2.3%
10 | iScience | 1063 | Top 9% | 2.3%
11 | Nature Biotechnology | 147 | Top 4% | 2.1%
12 | PLOS ONE | 4510 | Top 52% | 1.8%
13 | Scientific Reports | 3102 | Top 59% | 1.7%
14 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.3% | 1.5%
15 | Communications Biology | 886 | Top 11% | 1.5%
16 | G3 Genes|Genomes|Genetics | 351 | Top 2% | 1.3%
17 | PLOS Computational Biology | 1633 | Top 20% | 1.2%
18 | Nucleic Acids Research | 1128 | Top 14% | 1.2%
19 | BMC Genomics | 328 | Top 4% | 1.2%
20 | Journal of Open Source Software | 22 | Top 0.2% | 0.9%
21 | Nature Computational Science | 50 | Top 1% | 0.9%
22 | Molecular Biology and Evolution | 488 | Top 4% | 0.7%
23 | Briefings in Bioinformatics | 326 | Top 7% | 0.7%
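
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked by summing the listed probabilities in rank order. The following is a minimal sketch of that arithmetic, using only the first five probabilities from the list above; the function name `journals_to_reach` is hypothetical and is not part of the page or any library.

```python
# Predicted probabilities (in percent) for the five top-ranked journals,
# copied from the list above.
probabilities = [
    ("Genome Research", 22.2),
    ("Bioinformatics", 22.2),
    ("Genome Biology", 6.7),
    ("BMC Bioinformatics", 6.3),
    ("Nature Communications", 4.8),
]


def journals_to_reach(ranked, threshold):
    """Return (count, cumulative) where count is the number of top-ranked
    entries needed for the running total to reach the threshold, or
    (None, total) if the threshold is never reached.

    Hypothetical helper, for illustration only.
    """
    total = 0.0
    for count, (_, pct) in enumerate(ranked, start=1):
        total += pct
        if total >= threshold:
            return count, total
    return None, total


count, mass = journals_to_reach(probabilities, 50.0)
print(count, round(mass, 1))
```

The first two journals sum to 44.4%, which is below the threshold, so the third journal (6.7%) is the one that carries the cumulative total past 50%.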