ATHILAfinder: a tool to detect ATHILA LTR retrotransposons in plant genomes

Bousios, A.; Primetis, E.

2026-03-22 bioinformatics
10.64898/2026.03.20.713144 bioRxiv
Motivation: The ATHILA lineage of LTR retrotransposons has colonised all branches of the plant tree of life. In Arabidopsis thaliana and A. lyrata, ATHILA elements have invaded centromeres, influencing the genetic and epigenetic organisation, and driving satellite evolution. To assess the broader significance of ATHILA across plants, a computational pipeline is needed to identify ATHILA elements with high efficiency. Existing tools lack this ability because they are optimised for broad transposon classification at the expense of precise annotation of lower taxonomic levels.

Results: We present ATHILAfinder, a pipeline for accurate and large-scale discovery of ATHILA elements. ATHILAfinder uses lineage-specific sequence motifs as seeds and additional filters to build de novo intact elements. Homology-based steps rescue intact ATHILA and identify soloLTRs. A detailed identity card includes coordinates, LTR identity, coding capacity, length and other sequence features for every ATHILA. We validate ATHILAfinder in the A. thaliana Col-CEN assembly and five additional Brassicaceae species, covering four supertribes and ~30 million years of evolution. ATHILAfinder has very low false positive rates and outperforms widely-used tools like EDTA and the deep-learning-based Inpactor2 software for both recovery and precision of ATHILA. To demonstrate its usefulness, we generate insights into ATHILA dynamics across Brassicaceae.

Outlook: Few computational pipelines target specific transposon lineages, yet such tools can empower their identification and downstream analyses. Our tailored approach can be adapted to other LTR retrotransposon lineages, offering new ways for high-resolution analysis of transposons.
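The seed-and-filter idea mentioned in the Results can be illustrated with a toy sketch. This is not ATHILAfinder's implementation: the two motifs, the length window, and all function names below are hypothetical placeholders chosen only for illustration, and the identity calculation assumes the two LTRs are already aligned to equal length.

```python
import re

# Hypothetical placeholder motifs and length window; NOT real ATHILA signatures.
START_SEED = "TGATAG"
END_SEED = "CTATCA"
MIN_LEN, MAX_LEN = 20, 60


def find_seed_positions(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def candidate_elements(seq):
    """Pair start and end seeds whose span falls inside the length window."""
    candidates = []
    for start in find_seed_positions(seq, START_SEED):
        for end in find_seed_positions(seq, END_SEED):
            stop = end + len(END_SEED)  # half-open end coordinate
            if MIN_LEN <= stop - start <= MAX_LEN:
                candidates.append((start, stop))
    return candidates


def ltr_identity(ltr5, ltr3):
    """Percent identity of two pre-aligned, equal-length LTRs ('-' marks a gap)."""
    if not ltr5 or len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be non-empty and aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(ltr5, ltr3))
    return 100.0 * matches / len(ltr5)


toy_seq = "AAAA" + START_SEED + "C" * 20 + END_SEED + "GGGG"
hits = candidate_elements(toy_seq)               # [(4, 36)]
identity = ltr_identity("ACGTACGT", "ACGTACGA")  # 87.5
```

In a real pipeline the seeds would come from curated lineage-specific signatures, and the candidates would pass through further filters, such as the coding capacity and LTR identity named in the abstract, before being reported.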

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Mobile DNA | 27 papers in training set | Top 0.1% | 22.8%
2. Bioinformatics | 1061 papers in training set | Top 1% | 22.8%
3. Bioinformatics Advances | 184 papers in training set | Top 0.2% | 9.2%
(50% of probability mass above)
4. BMC Bioinformatics | 383 papers in training set | Top 3% | 3.6%
5. Methods in Ecology and Evolution | 160 papers in training set | Top 1.0% | 2.9%
6. Nucleic Acids Research | 1128 papers in training set | Top 7% | 2.6%
7. PLOS ONE | 4510 papers in training set | Top 48% | 2.1%
8. GigaScience | 172 papers in training set | Top 0.9% | 2.1%
9. NAR Genomics and Bioinformatics | 214 papers in training set | Top 1% | 1.9%
10. Frontiers in Plant Science | 240 papers in training set | Top 3% | 1.7%
11. BMC Genomics | 328 papers in training set | Top 2% | 1.7%
12. Scientific Reports | 3102 papers in training set | Top 62% | 1.5%
13. Genome Biology | 555 papers in training set | Top 5% | 1.3%
14. PLOS Computational Biology | 1633 papers in training set | Top 18% | 1.3%
15. Microbial Genomics | 204 papers in training set | Top 1% | 1.3%
16. Frontiers in Genetics | 197 papers in training set | Top 7% | 1.2%
17. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 7% | 0.9%
18. Nature Communications | 4913 papers in training set | Top 59% | 0.9%
19. Genome Biology and Evolution | 280 papers in training set | Top 2% | 0.8%
20. Peer Community Journal | 254 papers in training set | Top 3% | 0.8%
21. Plant Communications | 35 papers in training set | Top 1% | 0.8%
22. New Phytologist | 309 papers in training set | Top 5% | 0.8%
23. Plant Physiology | 217 papers in training set | Top 3% | 0.7%
24. PLOS Genetics | 756 papers in training set | Top 17% | 0.7%
25. Molecular Plant | 36 papers in training set | Top 2% | 0.5%
26. Horticulture Research | 43 papers in training set | Top 2% | 0.5%
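
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below uses the first five percentages from the listing; the function name is a hypothetical helper, not part of the site.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative mass) once the running sum first reaches threshold."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    raise ValueError("threshold is never reached by the given probabilities")


# Percent probabilities of the five top-ranked journals, as listed.
top_probs = [22.8, 22.8, 9.2, 3.6, 2.9]
count, mass = journals_to_reach(top_probs, 50.0)
print(count, round(mass, 1))  # 3 54.8
```

The first two journals sum to 45.6%, so the third is needed to cross the 50% mark, which is consistent with the cutoff shown in the listing.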