
Manual versus automatic annotation of transposable elements: case studies in Drosophila melanogaster and Aedes albopictus, balancing accuracy and biological relevance

Carrasco-Valenzuela, T.; Marino, A.; Storer, J. M.; Bonnici, I.; Mazzoni, C. J.; Fontaine, M. C.; Haudry, A.; Boulesteix, M.; Fiston-Lavier, A.-S.

2025-01-12 genomics
10.1101/2025.01.10.632341 bioRxiv

Transposable elements (TEs) play a pivotal role in genome evolution, yet their detection and annotation remain challenging due to the limitations of current methods. Manual curation is considered the gold standard for generating TE libraries, particularly for TE-focused studies, although it requires extensive training and time. With the rapid increase in genome assembly publications and the growing need for large-scale comparative analyses, automated software for TE annotation has become indispensable. This study compares manual and automated approaches to TE detection and annotation, focusing on two species: Drosophila melanogaster and Aedes albopictus. In D. melanogaster, a species with a well-annotated TE repertoire and a smaller genome, the differences between manual curation (MCTE) and automated annotation (ATTE) are relatively minor. However, significant differences arise when analysing Ae. albopictus, a species with a larger genome and higher TE diversity. While automated methods identified a greater number of TEs, including many smaller and fragmented elements, manual curation provided more detailed classifications and, on average, larger consensus sequences. Automated pipelines offer a viable alternative for genome-wide analyses such as TE content estimation, particularly when time and resources are limited. However, caution is advised when interpreting results, as finer details of TE dynamics may be overlooked. This study highlights that the choice of annotation method depends on the intended analysis. Manual curation is more suitable for TE population genomics and studies focusing on recent TE activity, while automated methods are appropriate for larger comparative analyses or genome assembly projects. Ultimately, both methods have their strengths and limitations, and understanding the specific features of the genome and repeatome under study is essential for selecting the appropriate approach.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Mobile DNA; 27 papers in training set; Top 0.1%; 22.3%
2. BMC Genomics; 328 papers in training set; Top 0.2%; 7.1%
3. Genome Biology and Evolution; 280 papers in training set; Top 0.2%; 6.3%
4. Frontiers in Plant Science; 240 papers in training set; Top 2%; 4.8%
5. Molecular Ecology Resources; 161 papers in training set; Top 0.3%; 3.9%
6. Peer Community Journal; 254 papers in training set; Top 0.7%; 3.8%
7. Methods in Ecology and Evolution; 160 papers in training set; Top 0.9%; 3.6%

50% of probability mass above

8. Frontiers in Genetics; 197 papers in training set; Top 2%; 3.6%
9. The Plant Genome; 53 papers in training set; Top 0.2%; 3.6%
10. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 0.7%; 3.6%
11. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 2%; 3.0%
12. Scientific Reports; 3102 papers in training set; Top 51%; 2.1%
13. PLOS ONE; 4510 papers in training set; Top 51%; 1.9%
14. Gigabyte; 60 papers in training set; Top 0.6%; 1.9%
15. PLOS Computational Biology; 1633 papers in training set; Top 18%; 1.5%
16. GigaScience; 172 papers in training set; Top 2%; 1.5%
17. Journal of Molecular Evolution; 21 papers in training set; Top 0.2%; 1.5%
18. Bioinformatics; 1061 papers in training set; Top 8%; 1.5%
19. BMC Biology; 248 papers in training set; Top 2%; 1.3%
20. Philosophical Transactions of the Royal Society B; 51 papers in training set; Top 4%; 1.1%
21. Genes; 126 papers in training set; Top 2%; 0.9%
22. Microbial Genomics; 204 papers in training set; Top 2%; 0.9%
23. PeerJ; 261 papers in training set; Top 14%; 0.8%
24. Genomics; 60 papers in training set; Top 2%; 0.8%
25. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 6%; 0.8%
26. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 1%; 0.7%
27. NAR Genomics and Bioinformatics; 214 papers in training set; Top 4%; 0.7%
28. Genome Biology; 555 papers in training set; Top 7%; 0.7%
29. Journal of Heredity; 35 papers in training set; Top 0.2%; 0.7%
30. DNA Research; 23 papers in training set; Top 0.6%; 0.6%
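
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked from the listed percentages with a running sum. The following is a minimal sketch, not the page's own method: the function name `journals_needed`, the variable `probabilities`, and the "first running total at or above the threshold" rule are assumptions made for illustration. The values are the last percentage shown for each of the ten highest-ranked journals.

```python
from itertools import accumulate

# Predicted probabilities (in percent) of the ten highest-ranked journals,
# copied from the list above. The variable name is illustrative.
probabilities = [22.3, 7.1, 6.3, 4.8, 3.9, 3.8, 3.6, 3.6, 3.6, 3.6]


def journals_needed(probs, threshold=50.0):
    """Return the smallest number of top-ranked entries whose summed
    probability reaches `threshold`, or None if it is never reached.

    Assumption: the cutoff is the first rank at which the running
    total is greater than or equal to the threshold.
    """
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


print(journals_needed(probabilities))
```

Under these assumptions the running total stays below 50% after six journals (about 48.2%) and passes it with the seventh (about 51.8%), which agrees with the cutoff shown in the list.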