Pangenome-based identification of cryptic pathogenic variants in undiagnosed rare disease patients

Jang, S. S.; Kim, S.; Lee, S.; Kim, S. Y.; Moon, J.; Kim, J.; Chae, J.-H.

2025-07-11 genetic and genomic medicine
10.1101/2025.07.08.25330875 medRxiv
Background: Despite widespread implementation of exome and genome sequencing, a substantial proportion of rare disease patients remain undiagnosed because of inherent limitations in detecting structural, repetitive, and regulatory variants.

Methods: We applied long-read sequencing (LRS) to 40 individuals from 33 previously undiagnosed Korean families. De novo assemblies were integrated into a graph-based pangenome workflow, enabling sensitive detection of single-nucleotide, structural, and tandem-repeat variants and direct profiling of CpG methylation.

Results: Pathogenic or likely pathogenic variants were identified in 9 (27.3%) families that had remained unsolved despite prior short-read sequencing. The discoveries comprised deep intronic splice-altering SNVs, non-coding regulatory deletions, complex rearrangements, large deletions, tandem repeat expansions, and aberrant methylation profiles. We also implicated CXXC1 as a novel disease-associated gene, potentially contributing to global DNA methylation defects, and identified novel pathogenic variants in established disease genes such as HEXB and NGLY1, providing insights into underrecognized genetic contributors to rare diseases.

Conclusions: LRS coupled with pangenome-based, graph-driven analysis closed a sizable diagnostic gap, broadened the mutational spectra of several Mendelian genes, and brought epigenomic evidence into rare disease investigation. These findings support the adoption of long-read, graph-based workflows as a front-line strategy for comprehensive genomic and epigenomic diagnosis.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Genome Medicine | 154 papers in training set | Top 0.1% | 39.8%
2. Genetics in Medicine | 69 papers in training set | Top 0.2% | 14.5%
(50% of probability mass above)
3. Nature Communications | 4913 papers in training set | Top 32% | 4.9%
4. Human Mutation | 29 papers in training set | Top 0.1% | 4.6%
5. European Journal of Human Genetics | 49 papers in training set | Top 0.3% | 3.6%
6. Nucleic Acids Research | 1128 papers in training set | Top 8% | 2.4%
7. The American Journal of Human Genetics | 206 papers in training set | Top 2% | 2.1%
8. Human Genetics | 25 papers in training set | Top 0.1% | 1.9%
9. Med | 38 papers in training set | Top 0.2% | 1.8%
10. Scientific Reports | 3102 papers in training set | Top 57% | 1.7%
11. The Journal of Molecular Diagnostics | 36 papers in training set | Top 0.2% | 1.5%
12. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 4% | 1.5%
13. BMC Medical Genomics | 36 papers in training set | Top 0.7% | 1.2%
14. npj Genomic Medicine | 33 papers in training set | Top 0.5% | 1.2%
15. Human Genomics | 21 papers in training set | Top 0.2% | 1.1%
16. International Journal of Epidemiology | 74 papers in training set | Top 2% | 0.9%
17. Alzheimer's & Dementia | 143 papers in training set | Top 2% | 0.9%
18. Human Molecular Genetics | 130 papers in training set | Top 3% | 0.8%
19. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.8%
20. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.7%
21. Genetics in Medicine Open | 10 papers in training set | Top 0.1% | 0.7%
22. BMC Genomics | 328 papers in training set | Top 7% | 0.6%
23. Journal of Translational Medicine | 46 papers in training set | Top 4% | 0.5%
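
The "50% of probability mass" marker can be checked with a short cumulative sum. The sketch below assumes that the marker is found by adding the listed percentages in rank order until the threshold is reached; the page does not describe how it is actually computed, and the function name `journals_to_reach` is hypothetical. Only the first five rows of the list are copied in.

```python
from itertools import accumulate

# (journal, predicted probability in percent), copied from the first five rows of the list.
predictions = [
    ("Genome Medicine", 39.8),
    ("Genetics in Medicine", 14.5),
    ("Nature Communications", 4.9),
    ("Human Mutation", 4.6),
    ("European Journal of Human Genetics", 3.6),
]


def journals_to_reach(predictions, threshold):
    """Return (count, cumulative percent) for the fewest top-ranked journals
    whose summed probability reaches `threshold` (in percent).

    Assumption: the marker is a plain running sum over journals sorted by
    probability, highest first.
    """
    ranked = sorted((prob for _, prob in predictions), reverse=True)
    for count, total in enumerate(accumulate(ranked), start=1):
        if total >= threshold:
            return count, total
    raise ValueError("threshold not reached by the listed journals")


count, total = journals_to_reach(predictions, 50.0)
print(f"{count} journals reach {total:.1f}% of the predicted probability mass")
```

Under that assumption the first two journals together carry 54.3%, which agrees with the statement that the top 2 journals account for 50% of the predicted probability mass.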