MAJEC: unified gene, isoform, and locus-level transposable element quantification from RNA-seq

Lim, T.-Y.; Firestone, A. J.

2026-04-14 bioinformatics
10.64898/2026.04.10.717472 bioRxiv
Background: The study of transposable elements (TEs) has become increasingly central to fields such as cancer biology, immunology, and aging. Accurately quantifying disease- or laboratory-mediated perturbations in these elements is critical to support this expanding research, yet current RNA-seq pipelines struggle with the pervasive overlap between TEs and protein-coding genes. Existing tools either aggregate to the subfamily level with no locus resolution (TEtranscripts), or provide locus-level quantification without modeling gene overlap (Telescope), with the latter attributing over 40% of TE signal to the 1.1% of loci that overlap gene exons.

Results: We present MAJEC (Momentum Accelerated Junction Enhanced Counting), a unified Expectation-Maximization (EM) framework that jointly quantifies genes, transcript isoforms, and individual TE loci from BAM alignments in a single pass. Splice junction evidence informs transcript-level priors, enabling MAJEC to probabilistically distinguish genic from TE-derived reads. This approach was independently validated against Salmon and RSEM on isoform quantification benchmarks. The joint feature space reduces exon-overlap contamination of locus-level TE estimates from 43% of total signal (Telescope) to 5% (MAJEC), while preserving subfamily-level accuracy (differential expression r = 0.987 vs TEtranscripts). Using paired biological vignettes, we demonstrate that MAJEC correctly resolves both the false TE reactivation artifacts endemic to TE-only models, and the false gene upregulation artifacts that occur when heuristic rules misassign genuine intragenic TE transcription.

Conclusion: MAJEC simultaneously produces the isoform and locus-level resolution that TEtranscripts lacks, with greater accuracy than Telescope, and runs faster than either.
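To make the core idea concrete, here is a minimal sketch of a generic EM read-assignment loop over a joint gene and TE feature space. It is not MAJEC's implementation: the function name `em_quantify`, the feature labels, and the fixed per-feature `prior` weights (a crude stand-in for junction-informed priors) are all assumptions made for illustration, and momentum acceleration is omitted entirely.

```python
def em_quantify(read_compat, prior=None, n_iter=200, tol=1e-9):
    """Toy EM: fractionally assign ambiguous reads to features.

    read_compat: list of lists; each inner list names the features
                 (genes, isoforms, or TE loci) a read is compatible with.
    prior:       optional dict of positive per-feature weights
                 (hypothetical stand-in for junction-derived priors).
    Returns a dict of expected read counts per feature.
    """
    features = sorted({f for feats in read_compat for f in feats})
    prior = prior or {}
    weight = {f: float(prior.get(f, 1.0)) for f in features}

    # Initialise abundances proportional to the prior weights.
    total_w = sum(weight.values())
    theta = {f: weight[f] / total_w for f in features}

    counts = dict.fromkeys(features, 0.0)
    for _ in range(n_iter):
        counts = dict.fromkeys(features, 0.0)
        # E-step: split each read among its compatible features in
        # proportion to current abundance times prior weight.
        for feats in read_compat:
            denom = sum(theta[f] * weight[f] for f in feats)
            if denom == 0.0:
                continue
            for f in feats:
                counts[f] += theta[f] * weight[f] / denom
        # M-step: abundances become normalised expected counts.
        n = sum(counts.values())
        new_theta = {f: counts[f] / n for f in features}
        delta = max(abs(new_theta[f] - theta[f]) for f in features)
        theta = new_theta
        if delta < tol:
            break
    return counts


# Hypothetical toy data: 3 reads unique to a gene, 1 unique to a TE
# locus, and 4 reads that overlap both.
reads = [["GENE_A"]] * 3 + [["TE_1"]] * 1 + [["GENE_A", "TE_1"]] * 4
uniform = em_quantify(reads)
gene_favoured = em_quantify(reads, prior={"GENE_A": 3.0, "TE_1": 1.0})
```

In this toy case the uniquely mapped reads pull most of the ambiguous reads toward the gene, which is the behaviour a TE-only model cannot reproduce because the gene is absent from its feature space. Raising the gene's prior weight shifts the split further in the same direction.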

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 1%; 21.9%
2. Bioinformatics Advances; 184 papers in training set; Top 0.2%; 8.9%
3. Nature Biotechnology; 147 papers in training set; Top 0.8%; 8.9%
4. Genome Biology; 555 papers in training set; Top 0.8%; 7.0%
5. Nature Communications; 4913 papers in training set; Top 30%; 6.2%

50% of probability mass above

6. Nucleic Acids Research; 1128 papers in training set; Top 3%; 6.1%
7. BMC Bioinformatics; 383 papers in training set; Top 3%; 3.5%
8. Nature Methods; 336 papers in training set; Top 3%; 3.0%
9. Cell Genomics; 162 papers in training set; Top 2%; 3.0%
10. Mobile DNA; 27 papers in training set; Top 0.1%; 2.7%
11. Genome Research; 409 papers in training set; Top 1%; 2.5%
12. NAR Genomics and Bioinformatics; 214 papers in training set; Top 2%; 1.7%
13. Cell Systems; 167 papers in training set; Top 7%; 1.6%
14. PLOS Computational Biology; 1633 papers in training set; Top 17%; 1.6%
15. Nature; 575 papers in training set; Top 11%; 1.6%
16. PLOS ONE; 4510 papers in training set; Top 59%; 1.3%
17. Science; 429 papers in training set; Top 16%; 1.3%
18. Cell Reports Methods; 141 papers in training set; Top 3%; 1.2%
19. The American Journal of Human Genetics; 206 papers in training set; Top 3%; 1.1%
20. GigaScience; 172 papers in training set; Top 3%; 0.9%
21. BMC Genomics; 328 papers in training set; Top 5%; 0.8%
22. Genome Medicine; 154 papers in training set; Top 8%; 0.7%
23. Nature Genetics; 240 papers in training set; Top 8%; 0.7%
24. Methods in Ecology and Evolution; 160 papers in training set; Top 2%; 0.7%
25. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 10%; 0.7%
26. Scientific Reports; 3102 papers in training set; Top 79%; 0.6%