Benchmarking computational tools for locus-specific analysis of transposable elements in single-cell RNA-seq datasets
Finazzi, V.; Vallejos, C. A.; Scialdone, A.
Background: Transposable elements (TEs) are increasingly recognized as regulators of gene expression and cellular identity in development and disease. Single-cell RNA-sequencing (scRNA-seq) enables the analysis of their transcription at cellular resolution, but the repetitive nature of TEs and their frequent overlap with genes create substantial mapping ambiguity. Although several tools quantify TE expression, few support locus-specific analysis, and their performance in single-cell data has not been systematically evaluated.
Results: We present a comprehensive benchmarking framework for locus-level TE quantification in short-read scRNA-seq, combining real datasets with simulations that provide read-level ground truth. TE-derived reads constitute a considerable fraction of the transcriptome and capture meaningful biological structure. Our simulations reveal that older, sequence-diverged insertions can be quantified with relatively high accuracy, whereas young TEs remain intrinsically difficult to resolve due to unreliable assignment of multi-mapping reads. We observe pronounced family-specific biases and identify gene-TE disambiguation as a major unresolved challenge. Among evaluated methods, SoloTE (unique-mapper mode) and Stellarscope (with an expectation-maximization-based reallocation of multi-mappers) showed comparable performance, while including multi-mappers generally increased false positives without substantially improving locus-level accuracy.
Conclusions: Our benchmark delineates the fundamental limits imposed by short-read scRNA-seq on locus-specific TE quantification, providing practical guidance for prospective users. Suggested best practices include focusing locus-level analyses on older insertions, applying unique-mapper strategies to improve precision, aggregating counts at the subfamily level for young TEs, and explicitly checking for gene-TE overlaps. Our workflow is fully reproducible and extensible, providing a foundation for evaluating emerging methods aimed at resolving TE transcription at single-locus resolution.
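One of the suggested practices, aggregating counts at the subfamily level for young TEs while keeping older insertions at locus resolution, can be illustrated with a minimal Python sketch. Nothing here is taken from the benchmark itself: the function name `aggregate_young_tes`, the input layout, the locus identifiers, and the use of percent divergence from the subfamily consensus as a proxy for insertion age (with an arbitrary 5% cutoff) are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_young_tes(counts, annotation, max_young_divergence=5.0):
    """Collapse young TE loci to their subfamily; keep older loci as-is.

    counts:     {cell_id: {locus_id: read_count}}  (hypothetical layout)
    annotation: {locus_id: (subfamily, percent_divergence)}  (hypothetical)
    max_young_divergence: assumed cutoff; loci below it are treated as young.

    Returns {cell_id: {(level, name): read_count}}, where level is
    "subfamily" for aggregated young TEs and "locus" for older insertions.
    """
    result = {}
    for cell, locus_counts in counts.items():
        aggregated = defaultdict(int)
        for locus, n in locus_counts.items():
            subfamily, divergence = annotation[locus]
            if divergence < max_young_divergence:
                # Young insertion: locus assignment is unreliable, so pool it.
                key = ("subfamily", subfamily)
            else:
                # Older, diverged insertion: keep locus-level resolution.
                key = ("locus", locus)
            aggregated[key] += n
        result[cell] = dict(aggregated)
    return result


# Toy input with made-up locus names and divergence values.
annotation = {
    "L1HS_chr1_100": ("L1HS", 0.8),
    "L1HS_chr5_900": ("L1HS", 1.2),
    "L2a_chr2_50": ("L2a", 24.0),
}
counts = {
    "cell1": {"L1HS_chr1_100": 3, "L1HS_chr5_900": 2, "L2a_chr2_50": 4},
    "cell2": {"L1HS_chr5_900": 1, "L2a_chr2_50": 7},
}
aggregated = aggregate_young_tes(counts, annotation)
```

In this toy input the two low-divergence L1HS loci are pooled into a single subfamily entry per cell, while the diverged L2a insertion keeps its own locus-level count. The tagged keys prevent a subfamily name from colliding with a locus identifier; any real cutoff would need to be chosen for the annotation and organism at hand.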