Back

Theseus: Fast and Optimal Affine-Gap Sequence-to-Graph Alignment

Jimenez-Blanco, A.; Lopez-Villellas, L.; Moure, J. C.; Moreto, M.; Marco-Sola, S.

2026-02-14 bioinformatics
10.64898/2026.02.12.705572 bioRxiv
Show abstract

MotivationSequence-to-graph alignment is a central problem in bioinformatics, with applications in multiple sequence alignment (MSA) and pangenome analysis, among others. However, current algorithms for optimal affine-gap alignment impose high memory and computational requirements, limiting their scalability to aligning long sequences to complex graphs. Practical solutions partially address this problem using heuristic strategies that ultimately trade off optimality for speed. ResultsThis work presents Theseus, a novel, fast, and optimal affine-gap sequence-to-graph alignment algorithm. Theseus leverages similarities between genomic sequences to accelerate the alignment computation and reduces the overall memory requirements without compromising optimality. To that end, Theseus exploits the diagonal transition property to process only a subset of the dynamic programming cells, combined with a sparse-data strategy that enables efficient sequence-to-graph alignment. Moreover, our algorithm supports optimal affine-gap alignment on arbitrary directed graphs, including those with cycles. We evaluate Theseus on two key problems: multiple sequence alignment (MSA) and pangenome read mapping. For MSA, we compare it against the state-of-the-art methods SPOA, abPOA, and POASTA. Theseus is 2.0x to 232.2x faster than the other two optimal aligners, SPOA and POASTA. Compared with abPOA, a heuristic aligner, Theseus is 3.3x faster on average, while ensuring optimality. For pangenome read mapping, we benchmark Theseus against the alignment stage of the popular mapping tool vg map, along with the alignment kernels of SPOA, abPOA, and POASTA. Theseus outperforms the other methods, showing a 1.9x to 16.9x speed improvement on short reads. AvailabilityTheseus code and documentation are publicly available at https://github.com/albertjimenezbl/theseus-lib. Contactalbert.jimenez.blanco@upc.es

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.