Genome-wide detection and clinical prioritization of tandem repeat outliers using long-read sequencing
Gibson, S. B.; Damaraju, N.; Gustafson, J. G.; Balton, E. V.; Chanprasert, S.; Glass, I. A.; Horike-Pyne, M.; Kumar, R. D.; Leppig, K. A.; Lundberg, C.; Ranchalis, J.; Rosenthal, E. A.; Solomon, A. K.; Stergachis, A. B.; Wener, M.; UDN; Jarvik, G. P.; Blue, E. E.; Dipple, K. M.; Dashnow, H.; Starita, L. M.; Miller, D. E.
Background: Tandem repeat expansions (TREs) cause more than 60 known neurological, neuromuscular, and developmental disorders. Detecting these expansions genome-wide is challenging because of their size, sequence complexity (including interruptions), and population variation. Although long-read sequencing is an emerging technology that can fully resolve many TREs, no methods have been described for genome-wide identification and prioritization of candidate pathogenic TREs with this technology.

Methods: Using a newly developed pipeline called TRoLR (Tandem Repeat outliers identified with Long Reads), we analyzed haplotype-resolved long-read genome assemblies from 471 ancestrally diverse individuals to define population distributions for more than three million tandem repeat loci, capturing clinically relevant interruptions. Outlier expansions were identified relative to these distributions and prioritized by genomic location and by comparison with known pathogenic loci. The framework was applied to 47 cases from the Undiagnosed Diseases Network.

Results: Population stratification of repeat metrics was observed at 7% of loci, with the highest variability among individuals of African ancestry. Outlier analysis confirmed known pathogenic CNBP and ATXN8OS expansions, detected carrier-range alleles at RFC1, CSTB, and FXN, and revealed a novel CGG expansion in the 5′ UTR of PCMTD2 that exhibited hypermethylation and intergenerational instability. Genome-wide screening also identified intronic pentanucleotide expansions at IQCB1 and MAP3K15 in controls; these expansions are composed of motifs that have been associated with pathogenicity at other disease loci.

Conclusions: Quantifying the longest uninterrupted repeat segment in long-read assemblies enables detection of clinically relevant repeat expansions and of the loss of stabilizing interruptions.
This approach enhances both diagnostic confirmation and discovery of candidate pathogenic expansions, with implications for clinical interpretation and research into complex repeat-mediated disorders.
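The abstract does not say how TRoLR measures the longest uninterrupted repeat segment or which statistic it uses to call outliers. The Python sketch below illustrates the general idea under stated assumptions: exact motif matching (so a sequencing error breaks a run just like a true interruption), a population given as a plain list of per-allele values at one locus, and a median/MAD robust z-score with an arbitrary cutoff of 5. The functions `longest_pure_run`, `robust_z`, and `is_outlier` and the population values are hypothetical illustrations, not part of TRoLR.

```python
from statistics import median


def longest_pure_run(seq: str, motif: str) -> int:
    """Largest number of consecutive exact copies of `motif` in `seq`.

    Hypothetical helper: exact matching only, so any unit that differs from
    the motif (a true interruption or a sequencing error) breaks the run.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    # Scan each of the k reading frames so a run is found wherever it starts.
    for start in range(k):
        run = 0
        for i in range(start, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption resets the uninterrupted count
    return best


def robust_z(value: float, population: list[float]) -> float:
    """Median/MAD robust z-score of `value` against population values.

    The choice of statistic is an illustrative assumption, not TRoLR's method.
    """
    med = median(population)
    mad = median([abs(x - med) for x in population])
    if mad == 0:
        # Degenerate population (most values identical): report direction only.
        if value == med:
            return 0.0
        return float("inf") if value > med else float("-inf")
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return (value - med) / (1.4826 * mad)


def is_outlier(value: float, population: list[float], cutoff: float = 5.0) -> bool:
    """Flag expansions only (upper tail); the default cutoff is arbitrary."""
    return robust_z(value, population) >= cutoff


if __name__ == "__main__":
    interrupted = "CGG" * 30 + "AGG" + "CGG" * 10  # 41 units with one AGG
    uninterrupted = "CGG" * 41                       # 41 units, no AGG
    # Hypothetical longest-pure-run values observed at this locus in controls.
    population = [26, 27, 27, 28, 28, 28, 29, 29, 30, 30]
    for name, allele in [("interrupted", interrupted), ("uninterrupted", uninterrupted)]:
        run = longest_pure_run(allele, "CGG")
        print(name, run, is_outlier(run, population))
```

Running it prints `interrupted 30 False` and `uninterrupted 41 True`. Both example alleles span 41 repeat units, so total length alone cannot tell them apart; only the allele that has lost its AGG interruption has a long enough pure run to be flagged, which mirrors the abstract's point that the longest uninterrupted segment captures loss of stabilizing interruptions.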