
Benchmarking computational tools for locus-specific analysis of transposable elements in single-cell RNA-seq datasets

Finazzi, V.; Vallejos, C. A.; Scialdone, A.

2026-02-28 bioinformatics
10.64898/2026.02.26.708244 bioRxiv

Background: Transposable elements (TEs) are increasingly recognized as regulators of gene expression and cellular identity in development and disease. Single-cell RNA-sequencing (scRNA-seq) enables the analysis of their transcription at cellular resolution, but the repetitive nature of TEs and their frequent overlap with genes create substantial mapping ambiguity. Although several tools quantify TE expression, few support locus-specific analysis, and their performance in single-cell data has not been systematically evaluated.

Results: We present a comprehensive benchmarking framework for locus-level TE quantification in short-read scRNA-seq, combining real datasets with simulations that provide read-level ground truth. TE-derived reads constitute a considerable fraction of the transcriptome and capture meaningful biological structure. Our simulations reveal that older, sequence-diverged insertions can be quantified with relatively high accuracy, whereas young TEs remain intrinsically difficult to resolve due to unreliable assignment of multi-mapping reads. We observe pronounced family-specific biases and identify gene-TE disambiguation as a major unresolved challenge. Among evaluated methods, SoloTE (unique-mapper mode) and Stellarscope (with an expectation-maximization-based reallocation of multi-mappers) showed comparable performance, while including multi-mappers generally increased false positives without substantially improving locus-level accuracy.

Conclusions: Our benchmark delineates the fundamental limits imposed by short-read scRNA-seq on locus-specific TE quantification, providing practical guidance for prospective users. Suggested best practices include focusing locus-level analyses on older insertions, applying unique-mapper strategies to improve precision, aggregating counts at the subfamily level for young TEs, and explicitly checking for gene-TE overlaps. Our workflow is fully reproducible and extensible, providing a foundation for evaluating emerging methods aimed at resolving TE transcription at single-locus resolution.
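Two of the suggested practices, aggregating locus-level counts at the subfamily level and checking for gene-TE overlaps, can be illustrated with a minimal Python sketch. This is not the authors' workflow and does not use the SoloTE or Stellarscope interfaces. The locus names, gene name, coordinates, counts, and both helper functions are hypothetical and chosen only for illustration. Intervals are assumed to be half-open.

```python
from collections import defaultdict

# Hypothetical locus annotation: locus id -> (subfamily, chrom, start, end).
# Names and coordinates are illustrative, not taken from the paper.
LOCI = {
    "L1HS_dup1": ("L1HS", "chr1", 100, 700),
    "L1HS_dup2": ("L1HS", "chr2", 50, 650),
    "MER4A_dup1": ("MER4A", "chr1", 5000, 5400),
}

# Hypothetical gene intervals: (gene, chrom, start, end), half-open.
GENES = [("GENE_A", "chr1", 5200, 9000)]

# Hypothetical per-cell locus counts: cell -> {locus id: count}.
COUNTS = {
    "cell1": {"L1HS_dup1": 3, "L1HS_dup2": 1, "MER4A_dup1": 4},
    "cell2": {"L1HS_dup2": 2},
}


def aggregate_by_subfamily(counts, loci):
    """Sum locus-level counts within each cell by TE subfamily."""
    out = {}
    for cell, locus_counts in counts.items():
        totals = defaultdict(int)
        for locus, n in locus_counts.items():
            subfamily = loci[locus][0]
            totals[subfamily] += n
        out[cell] = dict(totals)
    return out


def loci_overlapping_genes(loci, genes):
    """Return {locus: [gene, ...]} for loci whose interval overlaps a gene."""
    hits = {}
    for locus, (_, chrom, start, end) in loci.items():
        overlapping = [
            gene
            for gene, gchrom, gstart, gend in genes
            if gchrom == chrom and start < gend and gstart < end
        ]
        if overlapping:
            hits[locus] = overlapping
    return hits


subfamily_counts = aggregate_by_subfamily(COUNTS, LOCI)
flagged = loci_overlapping_genes(LOCI, GENES)
```

In this toy input the two L1HS copies collapse into a single per-cell L1HS count, and the MER4A locus is flagged because it overlaps the hypothetical gene. A real analysis would read TE and gene annotations from files and would likely use an interval index rather than the pairwise comparison shown here.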

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics | 1061 papers in training set | Top 2% | 18.1%
2. Mobile DNA | 27 papers in training set | Top 0.1% | 14.3%
3. Genome Biology | 555 papers in training set | Top 0.7% | 8.2%
4. Nucleic Acids Research | 1128 papers in training set | Top 3% | 6.1%
5. Bioinformatics Advances | 184 papers in training set | Top 0.8% | 4.7%
(50% of probability mass above)
6. NAR Genomics and Bioinformatics | 214 papers in training set | Top 0.5% | 4.2%
7. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 1% | 4.2%
8. GigaScience | 172 papers in training set | Top 0.4% | 3.9%
9. BMC Bioinformatics | 383 papers in training set | Top 3% | 3.5%
10. PLOS Computational Biology | 1633 papers in training set | Top 11% | 3.0%
11. PLOS ONE | 4510 papers in training set | Top 51% | 1.8%
12. Nature Communications | 4913 papers in training set | Top 49% | 1.8%
13. Scientific Reports | 3102 papers in training set | Top 55% | 1.8%
14. BMC Genomics | 328 papers in training set | Top 2% | 1.7%
15. Microbial Genomics | 204 papers in training set | Top 1% | 1.6%
16. Genome Medicine | 154 papers in training set | Top 6% | 1.3%
17. Molecular Ecology Resources | 161 papers in training set | Top 0.8% | 1.2%
18. BMC Biology | 248 papers in training set | Top 2% | 1.2%
19. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.9%
20. Methods in Ecology and Evolution | 160 papers in training set | Top 2% | 0.7%
21. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 6% | 0.7%
22. Frontiers in Genetics | 197 papers in training set | Top 11% | 0.7%
23. Genome Research | 409 papers in training set | Top 5% | 0.7%
24. Gigabyte | 60 papers in training set | Top 2% | 0.7%
25. Plant Communications | 35 papers in training set | Top 2% | 0.7%
26. Cell Genomics | 162 papers in training set | Top 7% | 0.7%
27. Plant Physiology | 217 papers in training set | Top 3% | 0.6%