Recursive Repeat Extender (RRE): A recursive approach to automatically extend repeat element models
Falcon, F.; Tanaka, E. M.; Rodriguez-Terrones, D.
Repetitive elements, including transposable elements (TEs), are integral structural components of eukaryotic genomes; consequently, their identification and classification are crucial to their study. Several approaches have been developed to perform de novo genome-wide repeat identification through pairwise sequence comparisons; however, they often generate truncated repeat models due to their sampling strategies and the substantial fragmentation of many of the older repeat copies in the genome. To improve repeat models generated de novo, several algorithms have been developed that increase model length via the BEEA (BLAST-Extend-Extract-Align) approach, in which genomic instances of each repeat are identified with BLAST, their coordinates are extended, and a refined model is generated by aligning the extended sequences. Nevertheless, these extension algorithms exhibit two key limitations that hinder the reconstruction of highly degenerate and fragmented repeats: the use of BLAST as a search algorithm, which limits their sensitivity in detecting highly diverged sequences, and the use of a single search step, which precludes the reconstruction of extensively fragmented repeat models. In this work, we present a novel approach to extend repeat models, called RRE (Recursive Repeat Extender), which uses profile hidden Markov models (HMMs) to search for repeat elements with high sensitivity and employs a recursive extension strategy that iteratively searches and extends the repeat model, using the extended model from each round as input for the next and continuing until no additional sequence can be incorporated. We apply RRE to repeat libraries generated de novo from five model organisms, and our results show that RRE-generated repeat libraries contain fewer but longer repeat models and identify a larger proportion of the genomes as repetitive than RepeatModeler2-generated repeat libraries. Notably, RRE can reconstruct highly degenerate repeats such as CR1_Mam, producing a model that achieves similar coverage to the reference Dfam model while extending it by 131 bp that were not captured in the reference model. Overall, RRE enables the automatic improvement of de novo repeat libraries and the reconstruction of highly degenerate and fragmented repeats.
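The recursive search-and-extend loop described in the abstract can be illustrated with a toy sketch. This is not the RRE implementation. Exact string matching stands in for the profile HMM search, a unanimous per-column vote stands in for the alignment step, and all names (`find_hits`, `consensus_flank`, `extend_once`, `recursive_extend`, the `flank` and `min_agreement` parameters, and the example sequences) are hypothetical. It shows only the control flow: each round's extended model is the input to the next round, and the loop stops when no further sequence is added.

```python
from collections import Counter


def find_hits(model, genome):
    """Toy stand-in for the profile-HMM search: exact-match start positions."""
    hits = []
    start = genome.find(model)
    while start != -1:
        hits.append(start)
        start = genome.find(model, start + 1)
    return hits


def consensus_flank(genome, starts, step, flank, min_agreement):
    """Walk away from each hit (step=+1 right, -1 left) and collect the
    majority base per column until agreement drops below the threshold."""
    out = []
    for offset in range(flank):
        positions = [s + step * offset for s in starts]
        bases = [genome[p] for p in positions if 0 <= p < len(genome)]
        if len(bases) < len(starts):
            break  # at least one copy ran off the end of the sequence
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) < min_agreement:
            break  # copies diverge here: treat it as the repeat boundary
        out.append(base)
    return "".join(out)


def extend_once(model, genome, flank=5, min_agreement=1.0):
    """One search-extend-align round; returns the (possibly longer) model."""
    hits = find_hits(model, genome)
    if len(hits) < 2:
        return model  # nothing to build a consensus from
    right = consensus_flank(
        genome, [h + len(model) for h in hits], +1, flank, min_agreement
    )
    left = consensus_flank(genome, [h - 1 for h in hits], -1, flank, min_agreement)
    return left[::-1] + model + right  # left flank was collected outward


def recursive_extend(seed, genome, flank=5, max_rounds=50):
    """Feed each round's model into the next until it stops growing."""
    model = seed
    for _ in range(max_rounds):
        extended = extend_once(model, genome, flank)
        if len(extended) == len(model):
            break  # no additional sequence could be incorporated
        model = extended
    return model


# Three copies of a made-up 24 bp repeat separated by unrelated flanks.
repeat = "ACGTTGCAAGGCTTAACCGGTATC"
genome = "TTTT" + repeat + "GAGA" + repeat + "CTCT" + repeat + "AAAA"

single_round = extend_once("AAGGCTTA", genome)
full_model = recursive_extend("AAGGCTTA", genome)
```

In this toy setting a single round can add at most `flank` bases per side, so the truncated seed is only partially recovered, whereas repeating the round until the length stops changing recovers the whole repeat. A real pipeline would need a sensitive search and a proper multiple alignment in place of the two stand-ins.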