Systematic identification of disease-associated 3D neighborhoods in protein structures

Finucane, H. K.; Nason, E.; Gerges, S.; Satterstrom, F. K.; Gorissen, B.; Liao, R.; Panagiotaropoulou, G.; Guez, J.; The Autism Sequencing Consortium; Karczewski, K.; Daly, M. J.

2026-06-02 genetic and genomic medicine
10.64898/2026.05.29.26354366 medRxiv
Rare variant association studies (RVAS) have identified hundreds of genes contributing to human disease, yet gene-level signals provide limited insight into the molecular mechanisms underlying pathogenicity. Missense variants, which can be mapped onto three-dimensional protein structures, offer an opportunity to gain novel mechanistic insights. Here, we develop a scalable framework for systematically mapping case and control variants onto protein structures and identifying spatially localized regions enriched for case variants. Our framework builds on the 3D Neighborhood Test (3DNT), which we recently introduced in a single-gene analysis of ATP2B2, and enables the genome-wide analysis of rare coding variation beyond standard gene-level approaches. We applied 3DNT across multiple large-scale datasets, including Mendelian disease variants from ClinVar, de novo mutations from 37,486 autism spectrum disorder (ASD) probands, and case-control exome sequencing cohorts for epilepsy and schizophrenia. We identified significant clusters in 872 genes for Mendelian disease, in 70 genes for autism, in one gene for epilepsy, and in three genes for schizophrenia. These clusters are strongly enriched for known functional sites and provide insight into both known and previously unrecognized disease genes. Our results demonstrate that scalably integrating RVAS data with protein structure predictions localizes disease-associated variation to specific functional regions and reveals a layer of disease biology that is largely invisible to standard analyses.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank. Journal (papers in training set; percentile rank): predicted probability

1. Nature Genetics (240 papers in training set; Top 0.2%): 19.3%
2. Nature (575 papers in training set; Top 3%): 10.0%
3. Nature Communications (4913 papers in training set; Top 25%): 7.1%
4. Cell Genomics (162 papers in training set; Top 0.4%): 6.8%
5. Genome Medicine (154 papers in training set; Top 1%): 6.3%
6. Cell Systems (167 papers in training set; Top 2%): 6.3%

50% of probability mass above

7. Genome Biology (555 papers in training set; Top 2%): 4.8%
8. Nature Neuroscience (216 papers in training set; Top 2%): 4.3%
9. The American Journal of Human Genetics (206 papers in training set; Top 1%): 4.3%
10. Science (429 papers in training set; Top 8%): 4.1%
11. Cell (370 papers in training set; Top 8%): 2.7%
12. Nature Medicine (117 papers in training set; Top 2%): 2.1%
13. Nucleic Acids Research (1128 papers in training set; Top 10%): 1.9%
14. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 30%): 1.9%
15. Nature Biotechnology (147 papers in training set; Top 5%): 1.7%
16. Scientific Reports (3102 papers in training set; Top 67%): 1.1%
17. Nature Human Behaviour (85 papers in training set; Top 4%): 0.9%
18. Bioinformatics (1061 papers in training set; Top 9%): 0.8%
19. Neuron (282 papers in training set; Top 9%): 0.7%
20. Molecular Systems Biology (142 papers in training set; Top 2%): 0.7%
21. Briefings in Bioinformatics (326 papers in training set; Top 7%): 0.7%
22. PLOS Computational Biology (1633 papers in training set; Top 26%): 0.7%
23. Nature Methods (336 papers in training set; Top 7%): 0.6%