Mapping structural variants to rare disease genes using long-read whole genome sequencing and trait-relevant polygenic scores

LeMaster, C.; Schwendinger-Schreck, C.; Ge, B.; Cheung, W.; Johnston, J.; Pastinen, T.; Smail, C.

2024-03-18 genetic and genomic medicine
10.1101/2024.03.15.24304216 medRxiv

Recent studies have revealed the pervasive landscape of rare structural variants (rSVs) present in human genomes. rSVs can have extreme effects on the expression of proximal genes and, in a rare disease context, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. To date, methods for integrating rSVs have focused on targeted analysis of known Mendelian rare disease genes. This strategy is intractable for rare diseases with many causal loci or for patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would substantially reduce the number of candidate disease genes in which rSV effects need to be assessed. We further implemented a method for ranking PGS genes to define a set of core/key genes where an rSV has the potential to exert relatively larger effects on disease risk. Among a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in Autism (N=54 cases), we identified 22,019 deletions, 2,041 duplications, 87,826 insertions, and 214 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were more frequently located in higher-constraint regions than controls (P = 1 × 10⁻³). This difference was not observed in the lowest-ranked gene set (P = 0.15). Overall, our study provides a framework for annotating long-read rSVs from lrWGS data and for prioritizing disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.
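The abstract does not specify how PGS genes are ranked or how rSVs are intersected with them, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes PGS variants have already been assigned to genes, ranks genes by a hypothetical score (the summed absolute PGS weight per gene), treats the top-ranked genes as a "core/key" tier, and counts SVs by type that overlap that tier using BED-style half-open coordinates. All gene names, coordinates, and weights are made up for illustration, and insertions are represented as 1-bp intervals at the insertion site.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based start (inclusive)
    end: int    # 0-based end (exclusive), BED-style half-open

    def overlaps(self, other: "Region") -> bool:
        # Half-open intervals overlap when each one starts before the other ends.
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


def rank_genes(pgs_weights: dict[str, list[float]]) -> list[str]:
    """HYPOTHETICAL ranking criterion: order genes by the summed absolute
    PGS weight of the variants assigned to them (largest first, ties by name)."""
    totals = {gene: sum(abs(w) for w in ws) for gene, ws in pgs_weights.items()}
    return sorted(totals, key=lambda g: (-totals[g], g))


def count_overlapping_svs(svs, genes, gene_set):
    """Count SVs by type that overlap at least one gene in gene_set.

    svs: list of (sv_type, Region); genes: dict mapping gene name -> Region.
    """
    targets = [genes[g] for g in gene_set if g in genes]
    counts = Counter()
    for sv_type, region in svs:
        if any(region.overlaps(t) for t in targets):
            counts[sv_type] += 1
    return counts


# Illustrative (made-up) gene models, PGS weights, and SV calls.
genes = {
    "GENE_A": Region("chr1", 1_000, 5_000),
    "GENE_B": Region("chr1", 10_000, 12_000),
    "GENE_C": Region("chr2", 0, 3_000),
}
weights = {"GENE_A": [0.2, -0.3], "GENE_B": [0.05], "GENE_C": [0.1, 0.1]}
ranked = rank_genes(weights)
core = ranked[:2]  # hypothetical "core/key" tier: the two top-ranked genes

svs = [
    ("DEL", Region("chr1", 4_500, 6_000)),    # overlaps GENE_A
    ("INS", Region("chr1", 11_000, 11_001)),  # overlaps GENE_B, which is not in the core tier
    ("DUP", Region("chr2", 2_999, 8_000)),    # overlaps GENE_C
    ("INV", Region("chr3", 0, 100)),          # no gene on chr3
]
counts = count_overlapping_svs(svs, genes, core)
print(ranked, counts)
```

A linear scan is used here for clarity; at genome scale (tens of thousands of SVs per cohort), an interval tree or a sorted sweep, or a dedicated tool such as bedtools, would be the more practical choice.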

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. npj Genomic Medicine: 33 papers in training set; Top 0.1%; probability 14.5%
2. Cell Genomics: 162 papers in training set; Top 0.1%; probability 14.2%
3. The American Journal of Human Genetics: 206 papers in training set; Top 0.8%; probability 6.3%
4. Translational Psychiatry: 219 papers in training set; Top 1%; probability 4.2%
5. Genetics in Medicine: 69 papers in training set; Top 0.4%; probability 4.2%
6. Genetic Epidemiology: 46 papers in training set; Top 0.2%; probability 4.1%
7. Genome Medicine: 154 papers in training set; Top 2%; probability 3.5%
(50% of probability mass above)
8. Human Mutation: 29 papers in training set; Top 0.2%; probability 2.8%
9. Scientific Reports: 3102 papers in training set; Top 46%; probability 2.6%
10. Human Genetics and Genomics Advances: 70 papers in training set; Top 0.2%; probability 2.6%
11. Human Genetics: 25 papers in training set; Top 0.1%; probability 2.0%
12. Molecular Autism: 29 papers in training set; Top 0.2%; probability 2.0%
13. Journal of Medical Genetics: 28 papers in training set; Top 0.3%; probability 1.9%
14. Autism Research: 32 papers in training set; Top 0.3%; probability 1.9%
15. Briefings in Bioinformatics: 326 papers in training set; Top 4%; probability 1.8%
16. PLOS Genetics: 756 papers in training set; Top 9%; probability 1.7%
17. Biological Psychiatry: 119 papers in training set; Top 2%; probability 1.7%
18. Frontiers in Genetics: 197 papers in training set; Top 5%; probability 1.7%
19. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics: 22 papers in training set; Top 0.2%; probability 1.7%
20. Genome Research: 409 papers in training set; Top 3%; probability 1.5%
21. European Journal of Human Genetics: 49 papers in training set; Top 0.8%; probability 1.3%
22. American Journal of Medical Genetics Part A: 17 papers in training set; Top 0.2%; probability 1.3%
23. eLife: 5422 papers in training set; Top 49%; probability 1.2%
24. Nature Communications: 4913 papers in training set; Top 57%; probability 1.1%
25. BMC Medical Genomics: 36 papers in training set; Top 1%; probability 0.9%
26. American Journal of Psychiatry: 20 papers in training set; Top 0.5%; probability 0.7%
27. JAMA Pediatrics: 10 papers in training set; Top 0.2%; probability 0.7%
28. PLOS Computational Biology: 1633 papers in training set; Top 25%; probability 0.7%
29. Journal of the American Medical Informatics Association: 61 papers in training set; Top 2%; probability 0.7%
30. Nature Neuroscience: 216 papers in training set; Top 6%; probability 0.7%
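The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order and finding the first rank at which the running total reaches 50%. This is a small illustrative sketch; the helper name `journals_needed` is hypothetical, and it assumes the input is already sorted from highest to lowest probability, as in the ranked list.

```python
from itertools import accumulate

# Predicted probabilities (%) for the top 10 journals, copied from the ranked list.
probs = [14.5, 14.2, 6.3, 4.2, 4.2, 4.1, 3.5, 2.8, 2.6, 2.6]


def journals_needed(probabilities, threshold=50.0):
    """Return the smallest k such that the first k probabilities sum to
    at least `threshold`, or None if the listed values never reach it.
    Assumes `probabilities` is sorted in descending (rank) order."""
    for k, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return k
    return None


k = journals_needed(probs)
print(k, round(sum(probs[:k]), 1))
```

The running total is 47.5% after six journals and 51.0% after seven, which is why the 50% marker falls between ranks 7 and 8.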