AniAnn's: alignment-free annotation of tandem repeat arrays using fast average nucleotide identity estimates

Sweeten, A. P.; Schatz, M.; Phillippy, A. M.

2026-01-28 bioinformatics
10.64898/2026.01.27.702063 bioRxiv

Motivation: Satellite DNA has long posed challenges for genome assembly and analysis due to its low sequence complexity and poor mappability. These large heterochromatic arrays of tandem repeats are ubiquitous across eukaryotic genomes, yet remain understudied. Current methods for annotating satellite regions, and other classes of tandem repeat arrays, are limited in their ability to annotate divergent or novel sequences.

Results: In this work, we introduce AniAnns, an algorithm for annotating large blocks of tandemly repeating DNAs. AniAnns exploits the high Average Nucleotide Identity (ANI) shared between repeat units of the same array to quickly and accurately infer the boundaries of such arrays. We show that AniAnns improves the annotation of satellites and other tandem repeats within a variety of plant and animal genomes, while requiring only a fraction of the runtime compared to previous approaches. We conclude by exploring several use cases of AniAnns as a lightweight method for masking repeats prior to whole-genome alignment as well as the de novo annotation and classification of satellite repeats.

Availability: AniAnns is open source software and available at github.com/marbl/anianns.
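The abstract does not spell out how identity between repeat units is estimated or how boundaries are called, so the following is only an illustrative sketch of the general idea, not the AniAnns implementation. It assumes a Mash-style conversion from k-mer Jaccard similarity to an identity estimate (a swapped-in technique) and flags runs of adjacent fixed-size windows whose estimated identity stays high. The function names and the `window`, `k`, `min_identity`, and `min_windows` parameters are all hypothetical.

```python
import math


def kmer_set(seq, k):
    """Collect the distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def estimate_identity(a, b, k=8):
    """Alignment-free identity estimate from k-mer Jaccard similarity.

    Assumption: uses the Mash distance d = -ln(2j / (1 + j)) / k and
    reports 1 - d, clamped at 0. This is a stand-in, not the tool's estimator.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    j = len(ka & kb) / len(ka | kb)
    if j == 0.0:
        return 0.0
    d = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 1.0 - d)


def candidate_arrays(seq, window=100, k=8, min_identity=0.9, min_windows=3):
    """Return (start, end) intervals where adjacent windows stay highly similar.

    Hypothetical heuristic: a run of at least `min_windows` consecutive
    windows, each similar to its neighbor, is reported as one candidate array.
    """
    n_windows = len(seq) // window
    windows = [seq[i * window:(i + 1) * window] for i in range(n_windows)]
    arrays = []
    run_start = None
    for i in range(n_windows - 1):
        similar = estimate_identity(windows[i], windows[i + 1], k) >= min_identity
        if similar and run_start is None:
            run_start = i  # a new run begins at window i
        elif not similar and run_start is not None:
            # the run covers windows run_start..i inclusive
            if i - run_start + 1 >= min_windows:
                arrays.append((run_start * window, (i + 1) * window))
            run_start = None
    # close a run that extends to the last window
    if run_start is not None and n_windows - run_start >= min_windows:
        arrays.append((run_start * window, n_windows * window))
    return arrays
```

With fixed windows, the reported boundaries are only as precise as the window size. A real annotator would need to refine them and handle repeat units that do not divide the window evenly.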

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.