
LongAllele: a joint inference framework for allele-specific analysis on long-read bulk and single-cell RNA sequencing

Xu, Z.; Wang, K.

2026-05-08 bioinformatics
10.64898/2026.05.05.722992 bioRxiv

Allele-specific analysis from RNA-seq is a powerful approach to characterize cis-regulatory effects. However, existing methods remain limited in both haplotype inference and allelic testing. Their haplotype-inference workflows separate variant calling, haplotype phasing, and read-haplotype assignment into sequential steps, failing to fully exploit within-read SNV linkage information and propagating errors into downstream allelic analysis. At the testing stage, they ignore non-phasable reads lacking heterozygous SNVs, biasing calls and inflating false positives, and remain incomplete across gene-, isoform-, and local-event-level variant effects. Here, we present LongAllele, a statistical framework that employs an expectation-maximization algorithm to jointly infer heterozygous variants, haplotype structure, and read-haplotype assignments from long-read bulk and single-cell RNA sequencing. LongAllele further introduces phasability-aware testing that explicitly accounts for non-phasable reads, avoiding inflated false-positive calls when haplotype information is incomplete. It also enables comprehensive allelic testing across gene-level ASE, isoform-level allele-specific transcript usage (ASTU), and local-event-level haplotype-associated exon and junction usage (HAEU and HAJU), providing a multi-scale view of cis-regulation. We applied LongAllele to long-read RNA-seq datasets spanning GTEx (multi-tissue bulk), peripheral blood mononuclear cells (single-cell), and human hippocampus (single-nucleus). LongAllele consistently revealed greater tissue and cell-type variability in expression-level than isoform-level allelic regulation, pinpointed high-impact regulatory variants including rare splice-site mutations missed by standalone variant callers, and showed that purifying selection constrains allelic imbalance at both gene and isoform levels. LongAllele offers a unified framework for haplotype-resolved cis-regulatory analysis across diverse cellular contexts.
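The abstract describes LongAllele's core step only at a high level: an expectation-maximization (EM) algorithm that jointly infers heterozygous variants, haplotype structure, and read-haplotype assignments. Below is a minimal sketch of the read-to-haplotype part only. It assumes the heterozygous sites are already known (the method as described also infers the variants, which this toy does not), uses a two-haplotype Bernoulli mixture in which haplotype 2 always carries the other allele, and uses hypothetical names (`em_phase`, the dict-per-read encoding) that are not LongAllele's API. It illustrates how linkage among several SNVs on the same long read drives the assignments, and how reads that cover no heterozygous site are reported as non-phasable instead of being forced onto a haplotype.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model for illustration; not LongAllele's implementation.
EPS = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite


def em_phase(
    reads: List[Dict[int, int]],
    n_sites: int,
    max_iter: int = 100,
    tol: float = 1e-8,
) -> Tuple[List[float], float, List[Optional[float]]]:
    """Toy EM for assigning reads to two haplotypes at known heterozygous sites.

    reads[i] maps site index -> observed allele (1 = alt, 0 = ref); sites the
    read does not cover are absent. Returns (theta, pi, posteriors):
    theta[j] = P(haplotype 1 carries alt at site j), pi = P(read from haplotype 1),
    posteriors[i] = P(read i from haplotype 1), or None if read i covers no
    heterozygous site (non-phasable).
    """
    phasable = [i for i, r in enumerate(reads) if r]
    if not phasable:
        return [0.5] * n_sites, 0.5, [None] * len(reads)

    # Seed haplotype 1 from the first phasable read to break label symmetry.
    theta = [0.5] * n_sites
    for j, x in reads[phasable[0]].items():
        theta[j] = 0.7 if x == 1 else 0.3
    pi = 0.5
    resp: Dict[int, float] = {}
    prev_ll = -math.inf

    for _ in range(max_iter):
        # E-step: posterior that each phasable read came from haplotype 1,
        # computed in log space so long reads with many SNVs do not underflow.
        ll = 0.0
        for i in phasable:
            log1, log2 = math.log(pi), math.log(1.0 - pi)
            for j, x in reads[i].items():
                p = theta[j]  # haplotype 2 carries the other allele
                log1 += math.log(p if x == 1 else 1.0 - p)
                log2 += math.log(1.0 - p if x == 1 else p)
            m = max(log1, log2)
            log_total = m + math.log(math.exp(log1 - m) + math.exp(log2 - m))
            resp[i] = math.exp(log1 - log_total)
            ll += log_total

        # M-step: closed-form updates for the mixing weight and per-site alleles.
        pi = min(max(sum(resp[i] for i in phasable) / len(phasable), EPS), 1.0 - EPS)
        num = [0.0] * n_sites
        cnt = [0] * n_sites
        for i in phasable:
            r = resp[i]
            for j, x in reads[i].items():
                # Alt on a hap-1 read, or ref on a hap-2 read, supports "hap 1 is alt".
                num[j] += r * x + (1.0 - r) * (1 - x)
                cnt[j] += 1
        theta = [
            min(max(num[j] / cnt[j], EPS), 1.0 - EPS) if cnt[j] else 0.5
            for j in range(n_sites)
        ]

        # Stop once the log-likelihood no longer improves meaningfully.
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    posteriors = [resp.get(i) for i in range(len(reads))]  # None = non-phasable
    return theta, pi, posteriors


if __name__ == "__main__":
    # Reads 0-1 come from haplotype 1010, reads 2-3 from 0101; read 4 has no SNVs.
    reads = [{0: 1, 1: 0, 2: 1}, {1: 0, 2: 1, 3: 0}, {0: 0, 1: 1, 2: 0}, {1: 1, 2: 0, 3: 1}, {}]
    theta, pi, post = em_phase(reads, n_sites=4)
    print([round(t, 3) for t in theta], round(pi, 3), post)
```

On this small example, reads 0 and 1 receive posteriors near 1, reads 2 and 3 near 0, and the SNV-free read 4 is returned as `None`. Note that reads 0 and 1 share only two sites, yet they are pulled onto the same haplotype because every covered site agrees; this is the kind of within-read linkage that a sequential call-then-phase pipeline may not exploit fully.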

Matching journals

The top 6 journals together account for just over half (55.4%) of the predicted probability mass.

1. Genome Biology: 17.1% (Top 0.1%; 555 papers in training set)
2. Nature Communications: 9.9% (Top 20%; 4913 papers in training set)
3. Bioinformatics: 8.0% (Top 3%; 1061 papers in training set)
4. Nature Methods: 7.0% (Top 2%; 336 papers in training set)
5. Genome Research: 6.7% (Top 0.3%; 409 papers in training set)
6. Cell Systems: 6.7% (Top 2%; 167 papers in training set)
50% of probability mass above
7. Nucleic Acids Research: 4.7% (Top 4%; 1128 papers in training set)
8. The American Journal of Human Genetics: 4.7% (Top 1.0%; 206 papers in training set)
9. Nature Biotechnology: 4.5% (Top 2%; 147 papers in training set)
10. Genome Medicine: 3.9% (Top 2%; 154 papers in training set)
11. NAR Genomics and Bioinformatics: 3.5% (Top 0.8%; 214 papers in training set)
12. Briefings in Bioinformatics: 2.0% (Top 3%; 326 papers in training set)
13. PLOS Computational Biology: 1.8% (Top 15%; 1633 papers in training set)
14. Bioinformatics Advances: 1.7% (Top 3%; 184 papers in training set)
15. BMC Bioinformatics: 1.3% (Top 5%; 383 papers in training set)
16. Cell Genomics: 1.3% (Top 4%; 162 papers in training set)
17. PLOS ONE: 1.2% (Top 60%; 4510 papers in training set)
18. Nature: 1.1% (Top 14%; 575 papers in training set)
19. Nature Genetics: 0.9% (Top 6%; 240 papers in training set)
20. Nature Machine Intelligence: 0.9% (Top 3%; 61 papers in training set)
21. BMC Genomics: 0.9% (Top 5%; 328 papers in training set)
22. Proceedings of the National Academy of Sciences: 0.8% (Top 44%; 2130 papers in training set)
23. Science: 0.7% (Top 20%; 429 papers in training set)
24. Cell Reports Methods: 0.7% (Top 5%; 141 papers in training set)
25. Scientific Reports: 0.7% (Top 77%; 3102 papers in training set)
26. Advanced Science: 0.6% (Top 22%; 249 papers in training set)
27. PLOS Genetics: 0.6% (Top 17%; 756 papers in training set)
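The statement that the top 6 journals carry half of the predicted probability mass can be checked against the listed percentages. The short sketch below assumes the displayed, rounded values are used as-is (the page's underlying probabilities may differ slightly), and `journals_to_reach` is a hypothetical helper, not part of any tool behind this page.

```python
from typing import List, Optional, Tuple

# Predicted probabilities (%) exactly as displayed, in rank order.
LISTED: List[Tuple[str, float]] = [
    ("Genome Biology", 17.1), ("Nature Communications", 9.9), ("Bioinformatics", 8.0),
    ("Nature Methods", 7.0), ("Genome Research", 6.7), ("Cell Systems", 6.7),
    ("Nucleic Acids Research", 4.7), ("The American Journal of Human Genetics", 4.7),
    ("Nature Biotechnology", 4.5), ("Genome Medicine", 3.9),
    ("NAR Genomics and Bioinformatics", 3.5), ("Briefings in Bioinformatics", 2.0),
    ("PLOS Computational Biology", 1.8), ("Bioinformatics Advances", 1.7),
    ("BMC Bioinformatics", 1.3), ("Cell Genomics", 1.3), ("PLOS ONE", 1.2),
    ("Nature", 1.1), ("Nature Genetics", 0.9), ("Nature Machine Intelligence", 0.9),
    ("BMC Genomics", 0.9), ("Proceedings of the National Academy of Sciences", 0.8),
    ("Science", 0.7), ("Cell Reports Methods", 0.7), ("Scientific Reports", 0.7),
    ("Advanced Science", 0.6), ("PLOS Genetics", 0.6),
]


def journals_to_reach(
    entries: List[Tuple[str, float]], threshold_pct: float
) -> Optional[Tuple[int, float]]:
    """Return (k, cumulative %) for the smallest top-k whose mass reaches the threshold."""
    total = 0.0
    for k, (_, pct) in enumerate(entries, start=1):
        total += pct
        if total >= threshold_pct:
            return k, total
    return None  # the listed entries never reach the threshold


if __name__ == "__main__":
    print(journals_to_reach(LISTED, 50.0))
    print(sum(pct for _, pct in LISTED))
```

With the displayed values, the cumulative mass first reaches 50% at the sixth journal (about 55.4%), and the 27 listed journals sum to about 93.9%; the remaining few percent are presumably spread over journals not shown.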