Movi 2: Fast and Space-Efficient Queries on Pangenomes
Zakeri, M.; Brown, N. K.; Gagie, T.; Langmead, B.
Space-efficient compressed indexing methods are critical for pangenomics and for avoiding reference bias. In the Movi study, we implemented the move-structure index, highlighting its locality of reference and speed. However, Movi had a high memory footprint compared to other compressed indexes. Here we introduce Movi 2 and describe new methods that greatly reduce the size and memory footprint of move-structure-based indexes. The most compressed version of Movi 2 reduces the Movi index's space footprint more than fivefold. We also introduce sampling approaches that enable trade-offs between query and space efficiency. To demonstrate, we show that Movi 2 achieves advantageous time and space trade-offs when applied to large pangenome collections, including both the first and second releases of the Human Pangenome Reference Consortium (HPRC) collection, the latter of which spans over 460 human haplotypes. We show that Movi 2 dominates prior methods on both speed and memory footprint, including both r-index-based methods and our previous move-structure-based method. The methods we developed for Movi 2 are publicly available at https://github.com/mohsenzakeri/Movi.