Structural distance at the tRNA synthetase active site interface predicts pathogenicity but is captured by AlphaMissense and EVE except among score-ambiguous variants

Liebeskind, K.; Francklyn, C.; Barrantes Reynolds, R.

2026-05-26 bioinformatics
10.64898/2026.05.22.727252 bioRxiv
Variants of uncertain significance have accumulated as genomic sequencing has become more widespread, complicating rare disease diagnosis and requiring substantial resources for re-evaluation. Aminoacyl-tRNA synthetases (ARSs) are a protein family with extensive variant data and well-characterized disease associations, making them an ideal system for investigating the relationship between variant location and pathogenicity. Using structural distance measurements to the ARS-tRNA binding interface together with two existing pathogenicity predictors, AlphaMissense and EVE, we investigated whether explicit structural binding information could improve missense variant pathogenicity prediction. Pathogenic variants clustered significantly closer to the tRNA-binding interface than benign variants (p = 0.0003). Incorporating explicit distance information into a Bayesian mixture model did not substantially improve predictive performance over AlphaMissense and EVE alone, suggesting that these models already implicitly capture relevant structural binding context. However, a clinically important subset of interface variants that both existing models classify as ambiguous marks a specific gap where explicit structural distance information may provide added discriminative value. The limited number of clinically validated variants currently available constrains the ability to fully evaluate this potential. Incorporating additional biologically relevant features not captured by existing models, such as protein stability or conformational dynamics, as well as refining structural distance calculations, may further improve classification of this subset. These findings highlight both the power and the limitations of existing pathogenicity predictors and suggest that structurally informed approaches targeting the binding interface are a promising direction for improving classification of these clinically significant ambiguous variants.
Author Summary
Advances in clinical genetic sequencing have led to the identification of a growing number of genetic variants whose impact on human health is unknown. These "variants of uncertain significance" present a major challenge because their role in causing disease cannot yet be confirmed or ruled out. This study focuses on a specific family of essential enzymes called aminoacyl-tRNA synthetases, which play a critical role in protein translation. Mutations in these enzymes have been linked to a range of diseases. This project aims to provide a novel method for determining the pathogenicity of variants specifically in aminoacyl-tRNA synthetases. We propose that the physical proximity of a variant to the functional binding site of the enzyme influences its pathogenicity. We find that this spatial relationship is a meaningful indicator of a variant's potential to disrupt normal function.
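The abstract does not say how the structural distance to the tRNA-binding interface is defined. As a minimal sketch, assuming it is the smallest atom-to-atom distance between a variant residue and the bound tRNA, the measurement could look like the following. The function name `min_distance_to_interface` and the coordinates are hypothetical and are not taken from the paper.

```python
import math

def min_distance_to_interface(residue_atoms, interface_atoms):
    """Return the smallest Euclidean distance (same units as the input,
    e.g. angstroms) between any atom of the variant residue and any atom
    of the binding partner.

    Assumption: a minimum atom-to-atom distance; the paper may define
    the distance differently (e.g. between C-alpha atoms or centroids).
    """
    return min(
        math.dist(a, b)
        for a in residue_atoms
        for b in interface_atoms
    )

# Hypothetical coordinates (x, y, z), for illustration only.
residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trna = [(4.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
distance = min_distance_to_interface(residue, trna)
```

Under this definition a residue counts as close to the interface when any one of its atoms is close, which suits side chains that reach toward the tRNA even if the backbone sits farther away.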

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology (1633 papers in training set; Top 2%): 14.9%
2. Computational and Structural Biotechnology Journal (216 papers in training set; Top 0.2%): 8.5%
3. PLOS Genetics (756 papers in training set; Top 3%): 4.9%
4. PLOS ONE (4510 papers in training set; Top 33%): 4.4%
5. Bioinformatics (1061 papers in training set; Top 5%): 4.4%
6. Frontiers in Genetics (197 papers in training set; Top 2%): 3.7%
7. International Journal of Molecular Sciences (453 papers in training set; Top 2%): 3.6%
8. Scientific Reports (3102 papers in training set; Top 34%): 3.6%
9. Human Genetics (25 papers in training set; Top 0.1%): 3.1%

50% of probability mass above

10. BMC Bioinformatics (383 papers in training set; Top 3%): 2.8%
11. BioData Mining (15 papers in training set; Top 0.2%): 2.4%
12. Protein Science (221 papers in training set; Top 0.6%): 2.1%
13. Proteins: Structure, Function, and Bioinformatics (82 papers in training set; Top 0.3%): 2.1%
14. Journal of Bioinformatics and Systems Biology (14 papers in training set; Top 0.1%): 2.1%
15. NAR Genomics and Bioinformatics (214 papers in training set; Top 2%): 1.7%
16. PeerJ (261 papers in training set; Top 7%): 1.7%
17. Frontiers in Molecular Biosciences (100 papers in training set; Top 2%): 1.7%
18. The American Journal of Human Genetics (206 papers in training set; Top 3%): 1.4%
19. BMC Genomics (328 papers in training set; Top 3%): 1.2%
20. Genome Biology and Evolution (280 papers in training set; Top 1%): 1.2%
21. G3 Genes|Genomes|Genetics (351 papers in training set; Top 2%): 1.1%
22. Journal of Molecular Evolution (21 papers in training set; Top 0.2%): 1.1%
23. Journal of Proteome Research (215 papers in training set; Top 2%): 1.0%
24. ACS Omega (90 papers in training set; Top 3%): 1.0%
25. Journal of Molecular Biology (217 papers in training set; Top 3%): 0.9%
26. Nucleic Acids Research (1128 papers in training set; Top 15%): 0.9%
27. Genes (126 papers in training set; Top 2%): 0.8%
28. Genomics (60 papers in training set; Top 2%): 0.8%
29. Biophysical Journal (545 papers in training set; Top 5%): 0.8%
30. npj Genomic Medicine (33 papers in training set; Top 0.9%): 0.8%
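
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a running sum. The sketch below is illustrative only: the helper name `journals_to_reach` is hypothetical, and the values are the first ten percentages from the list above.

```python
def journals_to_reach(probabilities_pct, threshold_pct=50.0):
    """Return the 1-based rank at which the cumulative predicted
    probability first reaches threshold_pct, or None if it never does."""
    total = 0.0
    for rank, pct in enumerate(probabilities_pct, start=1):
        total += pct
        if total >= threshold_pct:
            return rank
    return None

# Predicted probabilities (in percent) for ranks 1 to 10, as listed above.
top_ten_pct = [14.9, 8.5, 4.9, 4.4, 4.4, 3.7, 3.6, 3.6, 3.1, 2.8]
cutoff_rank = journals_to_reach(top_ten_pct)
```

With these values the running sum is about 48.0% after rank 8 and about 51.1% after rank 9, so the cutoff falls at rank 9, matching the divider in the list.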