UshEffect-3D: Structure-informed Classification of USH2A Missense Variants for Inherited Retinal Disease

Choudhary, D.; Portelli, S.; Ascher, D. B.

2026-04-27 bioinformatics
10.64898/2026.04.23.720479 bioRxiv
Purpose: Variants of uncertain significance (VUS) in USH2A represent a critical interpretive challenge in inherited retinal disease, with over 70% of ClinVar submissions for this gene currently unresolved. We aimed to develop a gene-specific, structure-informed machine learning framework to improve the clinical classification of USH2A missense variants and provide a tractable tool to aid the diagnosis of Usher Syndrome II.
Methods: A dataset of 545 curated USH2A missense variants with established clinical classifications was assembled from ClinVar and LOVD. AlphaFold2-predicted domain structures were used to generate local structural descriptors and biochemical features, which were combined with sequence-based evolutionary conservation scores, yielding 153 candidate features that were reduced to nine via sequential feature selection. Eleven machine learning classifiers were trained using a 10-fold cross-validation strategy, then independently assessed on a blind test set and validated against 78 ACMG-classified pathogenic variants. Model predictions were benchmarked against five general-purpose variant effect predictors and applied to 2639 USH2A VUS from ClinVar. Feature contributions were analysed using SHAP analysis and ablation studies.
Results: The Random Forest classifier achieved the highest performance on the blind test set, with an MCC of 0.87 and AUC of 0.97. On independent ACMG validation, sensitivity reached 0.73 with perfect precision. UshEffect-3D substantially outperformed all general-purpose predictors, including PolyPhen-2 (MCC = 0.61), AlphaMissense (MCC = 0.42), and ESM-1b (MCC = 0.32). SHAP analysis identified evolutionary conservation as a dominant predictor, with structural stability providing an independent but complementary signal. Applied to 2639 ClinVar VUS, the model prioritised 888 variants (33.6%) as likely pathogenic, particularly enriched within the Laminin N-terminal and Laminin G-like domains.
Conclusions: UshEffect-3D demonstrates that gene-specific, structure-informed machine learning substantially outperforms general-purpose variant effect predictors for USH2A missense variant interpretation. This framework provides a high-confidence prioritisation resource for the large unresolved VUS burden in this gene to facilitate earlier molecular resolution of USH2A-associated disease. As gene-directed therapies for USH2A-associated retinal disease advance toward clinical application, accurate and interpretable variant classification will be essential for equitable patient selection. UshEffect-3D is freely accessible via an interactive web server.
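
The abstract reports sensitivity, precision, and the Matthews correlation coefficient (MCC) without defining them. As a reading aid, the sketch below computes the three metrics from confusion-matrix counts using their standard definitions. It is not the authors' code. The function names are ours, and the counts (57 true positives, 21 false negatives, 0 false positives) are hypothetical values chosen only because they reproduce the reported 0.73 sensitivity and perfect precision on 78 pathogenic variants; the preprint's actual confusion matrix is not given in the abstract.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly pathogenic variants that were called pathogenic."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of pathogenic calls that were truly pathogenic."""
    return tp / (tp + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to 1.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (an assumption, not taken from the preprint).
tp, fn, fp = 57, 21, 0
print(round(sensitivity(tp, fn), 2))  # 0.73
print(precision(tp, fp))              # 1.0

# Share of ClinVar VUS prioritised as likely pathogenic (figures from the abstract).
print(round(100 * 888 / 2639, 1))     # 33.6
```

Because the ACMG validation set contains only pathogenic variants, it has no true negatives, so MCC cannot be meaningfully computed on it; this may be why the abstract reports sensitivity and precision for that set and reserves MCC for the blind test set.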

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. The American Journal of Human Genetics: 206 papers in training set, Top 0.3%, 14.3%
2. Genome Medicine: 154 papers in training set, Top 0.5%, 10.0%
3. Human Mutation: 29 papers in training set, Top 0.1%, 8.4%
4. Genetics in Medicine: 69 papers in training set, Top 0.2%, 8.2%
5. Nature Communications: 4913 papers in training set, Top 33%, 4.8%
6. European Journal of Human Genetics: 49 papers in training set, Top 0.2%, 4.3%
7. Cell Reports Medicine: 140 papers in training set, Top 0.9%, 4.3%
50% of probability mass above
8. Human Genetics: 25 papers in training set, Top 0.1%, 3.6%
9. npj Genomic Medicine: 33 papers in training set, Top 0.1%, 3.6%
10. Scientific Reports: 3102 papers in training set, Top 43%, 2.7%
11. Bioinformatics: 1061 papers in training set, Top 7%, 2.1%
12. BMC Medical Genomics: 36 papers in training set, Top 0.3%, 2.1%
13. PLOS ONE: 4510 papers in training set, Top 50%, 1.9%
14. PLOS Computational Biology: 1633 papers in training set, Top 17%, 1.7%
15. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.3%
16. Journal of Medical Genetics: 28 papers in training set, Top 0.3%, 1.3%
17. Human Genomics: 21 papers in training set, Top 0.2%, 1.2%
18. Communications Biology: 886 papers in training set, Top 16%, 1.1%
19. eBioMedicine: 130 papers in training set, Top 3%, 0.9%
20. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 8%, 0.9%
21. JCI Insight: 241 papers in training set, Top 7%, 0.8%
22. Clinical and Translational Science: 21 papers in training set, Top 1%, 0.7%
23. The Journal of Clinical Endocrinology & Metabolism: 35 papers in training set, Top 1%, 0.7%
24. Ophthalmology Science: 20 papers in training set, Top 0.3%, 0.7%
25. Trials: 25 papers in training set, Top 2%, 0.7%
26. BioData Mining: 15 papers in training set, Top 1%, 0.6%
27. PLOS Genetics: 756 papers in training set, Top 17%, 0.6%