
Dissecting the relationship between haplotypes around ATXN2 CAG repeats and the number of CAA interruptions by long-read sequencing

Lee, B. H.; Chan, J.; McMillan, C.; NYGC ALS Consortium; Song, Y.; Amado, D. A.; Wang, K.

2026-03-12 genetic and genomic medicine
10.64898/2026.03.11.26348169 medRxiv

CAG repeat expansions in ATXN2 are implicated as risk factors for neurological diseases, including amyotrophic lateral sclerosis (ALS), when 27-33 (intermediate) CAG repeats are present. However, how the haplotypes around the repeats and the CAA interruptions within them are associated with disease remains poorly understood. Here, we used long-read sequencing on the Oxford Nanopore Technologies (ONT) platform to simultaneously infer haplotypes around ATXN2, the number of CAG repeats, and the number of CAA interruptions. We found that haplotypes around ATXN2 and the number of interruptions show ethnicity-specific and ALS-specific distributions. Three CAA interruptions are present at low prevalence (~1%) in control populations across multiple ancestry groups, but at high prevalence (~55%) in individuals with ALS and intermediate repeats. Furthermore, we examined 159 individuals with ALS (~90% European ancestry) with intermediate ATXN2 repeats and found a unique haplotype in those with three CAA interruptions, which can be tagged by an SNV, rs148019457. We further used ONT to sequence 41 individuals (EUR = 39) with neurological diseases and intermediate repeats, and validated that the rs148019457-G allele is present only in haplotypes with three CAA interruptions. Our study shows that three CAA interruptions are rare in healthy controls but common in individuals with intermediate ATXN2 CAG repeats and neurological disorders, and that rs148019457 tags a specific haplotype with three CAA interruptions in individuals of European ancestry. These results have implications for the development of precision genomic medicine for neurological disorders, and the tag SNV may help identify carriers of three CAA interruptions from existing microarray genotyping data.
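To make concrete what "number of CAG repeats" and "number of CAA interruptions" mean for a single repeat tract, here is a minimal sketch. It is not the study's pipeline, which the abstract does not describe. The function `summarize_repeat_tract` is hypothetical and assumes an error-free, in-frame sequence that starts at the first codon of the tract; real ONT reads would first need alignment and error handling. Because conventions differ on whether CAA codons count toward the reported repeat length, the sketch reports CAG codons, CAA codons, and their total separately.

```python
from dataclasses import dataclass


@dataclass
class RepeatSummary:
    cag: int          # number of CAG codons in the tract
    caa: int          # number of CAA codons (counted as interruptions)
    total_units: int  # CAG + CAA codons; one possible way to report tract length
    structure: str    # run-length notation, e.g. "(CAG)8 CAA (CAG)4"


def summarize_repeat_tract(seq: str) -> RepeatSummary:
    """Parse an in-frame CAG/CAA tract (hypothetical helper).

    Assumes `seq` begins at the first codon of the repeat; reading stops at
    the first codon that is neither CAG nor CAA.
    """
    seq = seq.upper()
    codons = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon not in ("CAG", "CAA"):
            break  # end of the repeat tract
        codons.append(codon)

    # Run-length encode consecutive identical codons.
    runs = []
    for codon in codons:
        if runs and runs[-1][0] == codon:
            runs[-1][1] += 1
        else:
            runs.append([codon, 1])
    structure = " ".join(f"({c}){n}" if n > 1 else c for c, n in runs)

    cag = codons.count("CAG")
    caa = codons.count("CAA")
    return RepeatSummary(cag, caa, cag + caa, structure)
```

For example, a toy tract built as (CAG)8 CAA (CAG)4 CAA (CAG)8 yields 20 CAG codons, 2 CAA codons, and 22 total units. This toy sequence is illustrative only and is not data from the study.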

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Brain (154 papers in training set; Top 0.5%; 10.1%)
2. Nature Communications (4913 papers in training set; Top 21%; 9.1%)
3. Genome Medicine (154 papers in training set; Top 0.6%; 8.4%)
4. Cell Genomics (162 papers in training set; Top 0.4%; 6.8%)
5. The American Journal of Human Genetics (206 papers in training set; Top 0.7%; 6.8%)
6. Human Molecular Genetics (130 papers in training set; Top 0.8%; 3.6%)
7. Human Genetics and Genomics Advances (70 papers in training set; Top 0.1%; 3.6%)
8. Neuron (282 papers in training set; Top 4%; 3.6%)
(50% of probability mass above this line)
9. Cell (370 papers in training set; Top 6%; 3.6%)
10. Frontiers in Genetics (197 papers in training set; Top 3%; 3.1%)
11. Genome Biology (555 papers in training set; Top 3%; 2.4%)
12. Scientific Reports (3102 papers in training set; Top 48%; 2.4%)
13. Nature Genetics (240 papers in training set; Top 3%; 2.1%)
14. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 28%; 2.1%)
15. NAR Molecular Medicine (18 papers in training set; Top 0.1%; 1.9%)
16. Communications Biology (886 papers in training set; Top 9%; 1.7%)
17. eLife (5422 papers in training set; Top 43%; 1.7%)
18. Briefings in Bioinformatics (326 papers in training set; Top 4%; 1.5%)
19. BMC Genomics (328 papers in training set; Top 3%; 1.5%)
20. npj Genomic Medicine (33 papers in training set; Top 0.5%; 1.3%)
21. Neurobiology of Disease (134 papers in training set; Top 3%; 1.2%)
22. Alzheimer's & Dementia (143 papers in training set; Top 2%; 1.2%)
23. Med (38 papers in training set; Top 0.5%; 0.9%)
24. iScience (1063 papers in training set; Top 32%; 0.7%)
25. EMBO Molecular Medicine (85 papers in training set; Top 5%; 0.7%)
26. Biosensors and Bioelectronics (52 papers in training set; Top 2%; 0.7%)
27. Science Advances (1098 papers in training set; Top 31%; 0.7%)
28. Human Genetics (25 papers in training set; Top 0.5%; 0.6%)
29. Annals of Neurology (57 papers in training set; Top 2%; 0.6%)
30. Aging Cell (144 papers in training set; Top 4%; 0.6%)
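The claim that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below uses a hypothetical helper, `journals_to_reach`, that accumulates the displayed values in descending order until a threshold is met. Because the page shows rounded percentages, the result is approximate: with the 30 listed values, the cumulative share is 48.4% after 7 journals and 52.0% after 8.

```python
# Predicted probabilities (%) for the 30 journals listed, as displayed (rounded).
PROBS = [10.1, 9.1, 8.4, 6.8, 6.8, 3.6, 3.6, 3.6, 3.6, 3.1,
         2.4, 2.4, 2.1, 2.1, 1.9, 1.7, 1.7, 1.5, 1.5, 1.3,
         1.2, 1.2, 0.9, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6]


def journals_to_reach(probs: list[float], threshold: float = 50.0) -> int:
    """Return the smallest k whose k largest probabilities sum to >= threshold."""
    total = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= threshold:
            return k
    # The listed journals cover only part of the mass, so a high threshold may fail.
    raise ValueError("threshold not reached by the listed probabilities")


if __name__ == "__main__":
    print(journals_to_reach(PROBS))
```

The 30 listed values sum to about 85.2%, so the remaining probability mass is spread over journals not shown on the page.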