Accelerating long-read analysis on modern CPUs
Kalikar, S.; Jain, C.; Md, V.; Misra, S.
Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here, we present techniques to accelerate minimap2, a widely used mapping tool. We describe multiple optimizations, using SIMD parallelization, efficient cache utilization, and a learned index data structure, that accelerate its three main computational modules: seeding, chaining, and pairwise sequence alignment. Together, these reduce the end-to-end mapping time of minimap2 by up to 1.8× while producing identical output.
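To illustrate the general idea behind a learned index, the sketch below fits a single least-squares line that predicts where a key sits in a sorted array. It records the worst prediction error at build time and then binary-searches only inside that error window. This is a generic, minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how their learned index is structured or which seeding keys it indexes, and the class name `LinearLearnedIndex` and its methods are hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical single-segment learned index over sorted integer keys.

    A least-squares line maps key -> approximate array position. The largest
    position error seen on the training keys bounds the search window, so a
    lookup is exact even when the model is a poor fit.
    """

    def __init__(self, keys):
        if list(keys) != sorted(keys):
            raise ValueError("keys must be sorted")
        self.keys = list(keys)
        n = len(self.keys)
        mean_x = sum(self.keys) / n if n else 0.0
        mean_y = (n - 1) / 2 if n else 0.0
        var_x = sum((k - mean_x) ** 2 for k in self.keys)
        cov_xy = sum((k - mean_x) * (i - mean_y) for i, k in enumerate(self.keys))
        # Ordinary least-squares fit of position against key value.
        self.slope = cov_xy / var_x if var_x else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Worst-case |predicted - actual| position over every stored key.
        self.max_err = max(
            (abs(self._predict(k) - i) for i, k in enumerate(self.keys)), default=0
        )

    def _predict(self, key):
        """Model's guess of the array position for key, clamped to bounds."""
        if not self.keys:
            return 0
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the index of the first occurrence of key, or None."""
        if not self.keys:
            return None
        pred = self._predict(key)
        # Search only the window guaranteed to contain any stored key.
        lo = max(pred - self.max_err, 0)
        hi = min(pred + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    keys = [3, 7, 7, 12, 20, 21, 35, 40, 41, 58]
    idx = LinearLearnedIndex(keys)
    print(idx.lookup(12))  # 3
    print(idx.lookup(7))   # 1 (first of the duplicates)
    print(idx.lookup(13))  # None
```

The window is correct by construction. Every stored key's true position lies within `max_err` of its prediction, so the bounded binary search cannot miss it. A better-fitting model shrinks `max_err` and therefore the number of probes and cache lines touched. That locality benefit is what makes learned indexes attractive on modern CPUs. Production designs typically use several stages of models rather than a single line.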