Human Proteome-wide Mechanistic Interpretation of Missense Variants through Protein Feature Enrichment Score

Kwon, S.; Safer, J.; DiStefano, M.; Lebo, M.; Rehm, H. L.; Iqbal, S.

2026-05-28 bioinformatics
10.64898/2026.05.26.726248 bioRxiv

Missense variant interpretation remains a central challenge in clinical and medical genetics, with most observed variants classified as variants of uncertain significance (VUS). Computational variant effect predictors can achieve high pathogenicity classification performance, but they do not reveal the underlying mechanism or offer a translatable interpretation. Here we present the Protein Feature Enrichment Score (PFES), which quantifies the molecular context of missense variants through statistical enrichment of 103 protein structural, functional, and physicochemical features across 85,321 pathogenic and 130,719 control variants spanning 20 protein functional classes. We show that the protein feature (PF) enrichment patterns of variants are conserved within functional classes and vary substantially across classes, in both magnitude and direction, depending on functional context. PFES not only partitions variants into PF-Enriched (pathogenic-like), PF-Neutral, and PF-Depleted (benign-like) categories but also provides a mechanistic interpretation by decomposing the score into subscores from biologically interpretable protein feature attributes. We demonstrate that PFES shows high concordance with VUS reclassification and prioritization: across 596 genes, pathogenicity-leaning VUS-high variants were seven-fold enriched in PF-Enriched variants. PFES decomposition further revealed that loss-of-function and gain-of-function variants are distinguished by disproportionate enrichment of protein-protein interaction features in the latter. We computed PFES across 223 million possible missense variants (17.7% PF-Enriched) and built a publicly available resource that addresses not just whether a variant is pathogenic, but which protein characteristics are disrupted. Proteome-wide application across 20,153 genes prioritizes established rare disease genes and nominates therapeutically amenable targets whose pathogenic variation is driven by interpretable structural and functional protein feature disruption.

One Sentence Summary: PFES is a proteome-wide resource to quantify the protein context of missense variants, enabling mechanistically transparent variant interpretation.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Cell Systems | 167 papers in training set | Top 0.4% | 18.3%
2. Cell Genomics | 162 papers in training set | Top 0.2% | 9.9%
3. Nature Communications | 4913 papers in training set | Top 23% | 8.3%
4. Genome Medicine | 154 papers in training set | Top 1% | 6.3%
5. Nature Methods | 336 papers in training set | Top 2% | 6.2%
6. The American Journal of Human Genetics | 206 papers in training set | Top 0.8% | 6.2%
50% of probability mass above
7. Genome Biology | 555 papers in training set | Top 2% | 4.8%
8. Science | 429 papers in training set | Top 8% | 3.9%
9. Nature Biotechnology | 147 papers in training set | Top 3% | 3.5%
10. Nature Machine Intelligence | 61 papers in training set | Top 1% | 3.2%
11. Advanced Science | 249 papers in training set | Top 7% | 2.7%
12. Nucleic Acids Research | 1128 papers in training set | Top 8% | 2.6%
13. Nature Genetics | 240 papers in training set | Top 3% | 2.6%
14. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 29% | 2.0%
15. Nature | 575 papers in training set | Top 10% | 1.9%
16. PLOS Computational Biology | 1633 papers in training set | Top 18% | 1.5%
17. Cell | 370 papers in training set | Top 15% | 0.9%
18. Science Advances | 1098 papers in training set | Top 29% | 0.8%
19. Bioinformatics | 1061 papers in training set | Top 9% | 0.8%
20. Genome Research | 409 papers in training set | Top 4% | 0.7%
21. Molecular Systems Biology | 142 papers in training set | Top 2% | 0.7%
22. Cell Reports Medicine | 140 papers in training set | Top 8% | 0.7%
23. Nature Medicine | 117 papers in training set | Top 6% | 0.7%
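
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The snippet below is a minimal sketch of that arithmetic only; the function name `journals_to_reach` is hypothetical, and nothing here reflects how the page's prediction model actually works.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the list above.
probs = [18.3, 9.9, 8.3, 6.3, 6.2, 6.2, 4.8, 3.9, 3.5, 3.2]


def journals_to_reach(probabilities, threshold=50.0):
    """Return (k, cumulative) for the smallest k whose top-k sum reaches threshold.

    Hypothetical helper for illustration; assumes the input is already
    sorted by rank (highest probability first).
    """
    for k, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return k, total
    raise ValueError("threshold not reached by the listed probabilities")


k, total = journals_to_reach(probs)
print(k, round(total, 1))
```

The first five entries sum to 49.0%, just under the cutoff, so the sixth journal is the one that crosses 50% (cumulative 55.2%).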