
Genome-wide detection and clinical prioritization of tandem repeat outliers using long-read sequencing

Gibson, S. B.; Damaraju, N.; Gustafson, J. G.; Balton, E. V.; Chanprasert, S.; Glass, I. A.; Horike-Pyne, M.; Kumar, R. D.; Leppig, K. A.; Lundberg, C.; Ranchalis, J.; Rosenthal, E. A.; Solomon, A. K.; Stergachis, A. B.; Wener, M.; UDN; Jarvik, G. P.; Blue, E. E.; Dipple, K. M.; Dashnow, H.; Starita, L. M.; Miller, D. E.

2026-05-01 genetic and genomic medicine
10.64898/2026.04.30.26352103 medRxiv

Background: Tandem repeat expansions (TREs) cause over 60 known neurological, neuromuscular, and developmental disorders. Detecting these expansions genome-wide is challenging due to their size, sequence complexity (including interruptions), and population variation. While long-read sequencing is an emerging technology that can fully resolve many TREs, no methods have been described for genome-wide identification and prioritization of candidate pathogenic TREs with this technology.

Methods: Using a newly developed pipeline called TRoLR (Tandem Repeat outliers identified with Long Reads), we analyzed haplotype-resolved long-read genome assemblies from 471 ancestrally diverse individuals to define population distributions for over three million tandem repeat loci, capturing clinically relevant interruptions. Outlier expansions were identified relative to these distributions and prioritized by genomic location and comparison to known pathogenic loci. The framework was applied to 47 cases from the Undiagnosed Diseases Network.

Results: Population stratification of repeat metrics was observed at 7% of loci, with highest variability among individuals of African ancestry. Outlier analysis confirmed known pathogenic CNBP and ATXN8OS expansions, detected carrier-range alleles at RFC1, CSTB, and FXN, and revealed a novel CGG expansion in the 5′ UTR of PCMTD2 exhibiting hypermethylation and intergenerational instability. Genome-wide screening also identified, in controls, intronic pentanucleotide expansions at IQCB1 and MAP3K15 composed of motifs that have been associated with pathogenicity at other disease loci.

Conclusions: Quantifying the longest uninterrupted repeat segment in long-read assemblies enables detection of clinically relevant repeat expansions and loss of stabilizing interruptions. This approach enhances both diagnostic confirmation and discovery of candidate pathogenic expansions, with implications for clinical interpretation and research into complex repeat-mediated disorders.
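As an illustration of the metric named in the conclusions, the following minimal sketch counts the longest uninterrupted run of a repeat motif in an allele sequence and flags a value that is extreme relative to a population sample. It is not the TRoLR implementation: the function names, the median/MAD robust z-score, and the 3.5 cutoff are all assumptions chosen for illustration, since the abstract does not state which outlier statistic the pipeline uses.

```python
from statistics import median


def longest_uninterrupted_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    # Try every start position so that runs in any phase are counted.
    for start in range(len(sequence) - k + 1):
        run = 0
        pos = start
        while sequence.startswith(motif, pos):
            run += 1
            pos += k
        best = max(best, run)
    return best


def is_outlier(value: float, population: list[float], threshold: float = 3.5) -> bool:
    """Hypothetical outlier rule: upper-tail robust z-score from median and MAD.

    The statistic and threshold are illustrative assumptions, not the
    method described in the preprint.
    """
    med = median(population)
    mad = median([abs(x - med) for x in population])
    if mad == 0:
        # Degenerate distribution: anything above the median counts as extreme.
        return value > med
    return 0.6745 * (value - med) / mad > threshold


# An interrupted allele (CAA breaks the CAG tract) versus a pure one.
interrupted = "CAGCAGCAACAGCAGCAGCAG"
pure = "CAG" * 7
print(longest_uninterrupted_run(interrupted, "CAG"))  # 4
print(longest_uninterrupted_run(pure, "CAG"))         # 7
```

Both example alleles contain seven three-base units, but losing the CAA interruption raises the longest uninterrupted run from 4 to 7, which is the kind of change a total-length metric alone would not capture.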

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank. Journal (papers in training set; percentile): predicted probability

1. Genetics in Medicine (69 papers in training set; Top 0.1%): 27.8%
2. Genome Medicine (154 papers in training set; Top 0.3%): 12.4%
3. The American Journal of Human Genetics (206 papers in training set; Top 0.4%): 10.1%
(50% of probability mass above)
4. Annals of Clinical and Translational Neurology (29 papers in training set; Top 0.1%): 6.3%
5. Human Mutation (29 papers in training set; Top 0.1%): 4.3%
6. npj Genomic Medicine (33 papers in training set; Top 0.1%): 3.6%
7. Nature Communications (4913 papers in training set; Top 46%): 2.4%
8. Genetics in Medicine Open (10 papers in training set; Top 0.1%): 2.4%
9. Scientific Reports (3102 papers in training set; Top 53%): 1.9%
10. Neurology Genetics (14 papers in training set; Top 0.1%): 1.8%
11. The Journal of Molecular Diagnostics (36 papers in training set; Top 0.2%): 1.7%
12. Human Molecular Genetics (130 papers in training set; Top 2%): 1.7%
13. Annals of Neurology (57 papers in training set; Top 1%): 1.5%
14. Alzheimer's & Dementia (143 papers in training set; Top 2%): 1.1%
15. Brain (154 papers in training set; Top 4%): 1.0%
16. European Journal of Human Genetics (49 papers in training set; Top 0.9%): 1.0%
17. Movement Disorders (62 papers in training set; Top 1.0%): 0.7%
18. Human Genetics and Genomics Advances (70 papers in training set; Top 0.8%): 0.7%
19. Cell Genomics (162 papers in training set; Top 7%): 0.7%
20. Med (38 papers in training set; Top 1%): 0.6%
21. PLOS ONE (4510 papers in training set; Top 71%): 0.6%
22. Orphanet Journal of Rare Diseases (18 papers in training set; Top 0.9%): 0.5%
23. BMC Genomics (328 papers in training set; Top 8%): 0.5%
24. New England Journal of Medicine (50 papers in training set; Top 1.0%): 0.5%
25. BMC Medical Genomics (36 papers in training set; Top 2%): 0.5%
26. Neurobiology of Disease (134 papers in training set; Top 5%): 0.5%