
Disagreement among variant effect predictors guides experimental prioritization of target proteins

Jonsson, N. F.; Marsh, J. A.; Lindorff-Larsen, K.

2026-03-20 bioinformatics
10.64898/2026.03.18.712765 bioRxiv

Interpreting the functional consequences of genetic variation, especially rare missense variants, remains a significant challenge in human genetics. Computational variant effect predictors (VEPs) and multiplexed assays of variant effects (MAVEs) provide complementary approaches: VEPs offer scalable predictions, while MAVEs deliver detailed empirical measurements. However, MAVEs are resource intensive and cannot yet be applied broadly across the proteome, making it important to identify proteins where experimental mapping will be most informative. We hypothesised that MAVEs should be particularly valuable for proteins where computational predictors disagree, as such disagreement may highlight mechanistic blind spots. To test this, we analysed predictions from ten distinct VEPs across more than 13,000 human proteins and quantified inter-predictor concordance. We observed substantial variability across proteins in the degree of agreement among predictors and investigated the structural, functional and gene-level features associated with this variation. We find that inter-VEP concordance shows no relationship with agreement to experimental MAVE data. If predictor agreement reflected how intrinsically predictable a protein is, these two quantities would be expected to correlate. Their decoupling instead suggests that MAVEs may provide information orthogonal to that captured by VEPs. We therefore propose using inter-VEP disagreement as a practical strategy to prioritise proteins for experimental characterisation. Focusing on proteins with low predictor concordance should maximise the informational value of new MAVEs and improve variant interpretation in both research and clinical contexts.
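The abstract does not say how concordance or MAVE agreement are measured. As an illustration only, the sketch below assumes each protein has scores from several predictors over the same variants, oriented so that higher means more damaging, and uses mean pairwise Spearman correlation as a hypothetical concordance measure. It then ranks proteins from lowest to highest concordance and shows how concordance could be correlated with VEP-MAVE agreement across proteins. All function names and the toy data are hypothetical; this is not the authors' pipeline.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def inter_vep_concordance(scores):
    """Mean pairwise Spearman correlation among predictors for one protein.

    `scores` maps predictor name -> scores for the same variants, in the same
    order, oriented so that higher means more damaging (an assumption).
    """
    return mean(spearman(scores[a], scores[b])
                for a, b in combinations(sorted(scores), 2))


def mave_agreement(scores, mave):
    """Mean Spearman correlation between each predictor and the MAVE scores."""
    return mean(spearman(s, mave) for s in scores.values())


def concordance_vs_mave(proteins, maves):
    """Spearman correlation, across proteins that have MAVE data, between
    inter-VEP concordance and VEP-MAVE agreement."""
    ids = sorted(pid for pid in proteins if pid in maves)
    conc = [inter_vep_concordance(proteins[p]) for p in ids]
    agree = [mave_agreement(proteins[p], maves[p]) for p in ids]
    return spearman(conc, agree)


def prioritise(proteins):
    """Protein IDs ordered from lowest to highest inter-VEP concordance."""
    conc = {pid: inter_vep_concordance(s) for pid, s in proteins.items()}
    return sorted(conc, key=conc.get)


if __name__ == "__main__":
    # Toy data: three hypothetical predictors scoring four variants each.
    toy = {
        "PROT_A": {"vep1": [0.1, 0.5, 0.9, 0.3],
                   "vep2": [0.2, 0.6, 0.8, 0.4],
                   "vep3": [0.1, 0.7, 0.95, 0.2]},
        "PROT_B": {"vep1": [0.1, 0.5, 0.9, 0.3],
                   "vep2": [0.9, 0.5, 0.1, 0.3],
                   "vep3": [0.3, 0.9, 0.5, 0.1]},
    }
    print(prioritise(toy))  # → ['PROT_B', 'PROT_A']
```

In the toy data the three predictors rank PROT_A's variants identically (concordance 1.0) but disagree on PROT_B, so PROT_B comes first in the prioritisation. Rank correlation is used here because different predictors report scores on different scales; other concordance measures would be equally plausible.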

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Cell Genomics: 10.4% probability (162 papers in training set; Top 0.2%)
2. PLOS Computational Biology: 10.0% probability (1633 papers in training set; Top 3%)
3. Molecular Systems Biology: 9.1% probability (142 papers in training set; Top 0.1%)
4. The American Journal of Human Genetics: 8.4% probability (206 papers in training set; Top 0.6%)
5. Genome Biology: 8.4% probability (555 papers in training set; Top 0.6%)
6. Cell Systems: 4.8% probability (167 papers in training set; Top 3%)
50% of probability mass above
7. eLife: 3.9% probability (5422 papers in training set; Top 22%)
8. Genome Medicine: 3.6% probability (154 papers in training set; Top 2%)
9. Nature Communications: 3.6% probability (4913 papers in training set; Top 40%)
10. Computational and Structural Biotechnology Journal: 3.6% probability (216 papers in training set; Top 2%)
11. NAR Genomics and Bioinformatics: 3.6% probability (214 papers in training set; Top 0.7%)
12. Bioinformatics: 2.1% probability (1061 papers in training set; Top 6%)
13. Scientific Reports: 2.1% probability (3102 papers in training set; Top 50%)
14. Journal of Molecular Biology: 1.7% probability (217 papers in training set; Top 2%)
15. Protein Science: 1.7% probability (221 papers in training set; Top 0.9%)
16. Human Genetics and Genomics Advances: 1.5% probability (70 papers in training set; Top 0.4%)
17. Bioinformatics Advances: 1.5% probability (184 papers in training set; Top 3%)
18. Nucleic Acids Research: 1.3% probability (1128 papers in training set; Top 13%)
19. Proceedings of the National Academy of Sciences: 1.2% probability (2130 papers in training set; Top 38%)
20. Structure: 0.9% probability (175 papers in training set; Top 3%)
21. PLOS ONE: 0.9% probability (4510 papers in training set; Top 63%)
22. Journal of Proteome Research: 0.7% probability (215 papers in training set; Top 2%)
23. iScience: 0.7% probability (1063 papers in training set; Top 33%)
24. Human Genetics: 0.6% probability (25 papers in training set; Top 0.5%)
25. Nature Methods: 0.6% probability (336 papers in training set; Top 7%)
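The statement that the top 6 journals account for 50% of the predicted probability mass can be checked roughly against the rounded probabilities listed above. The sketch below assumes the cutoff is the smallest number of top-ranked journals whose cumulative probability reaches 50%; the page does not state its exact rule, and the displayed values are rounded, so this is an approximation. `LISTED_PROBS` and `smallest_k_for_mass` are hypothetical names.

```python
from itertools import accumulate

# Predicted probabilities (%) of the 25 listed journals, in rank order,
# copied from the rounded values shown on the page.
LISTED_PROBS = [10.4, 10.0, 9.1, 8.4, 8.4, 4.8, 3.9, 3.6, 3.6, 3.6, 3.6,
                2.1, 2.1, 1.7, 1.7, 1.5, 1.5, 1.3, 1.2, 0.9, 0.9, 0.7,
                0.7, 0.6, 0.6]


def smallest_k_for_mass(probs, threshold=50.0):
    """Smallest k such that the top-k probabilities sum to >= threshold.

    Assumes `probs` is already sorted in descending order; returns None if
    the listed values never reach the threshold.
    """
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k
    return None


print(smallest_k_for_mass(LISTED_PROBS))  # → 6
```

With these rounded values the top five journals sum to 46.3% and the top six to 51.1%, consistent with the stated 50% cutoff.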