
Ancestry-specific performance of variant effect predictors in clinical variant classification

Hoffing, R.; Zeiberg, D.; Stenton, S. L.; Mort, M.; Cooper, D. N.; Hahn, M. W.; O'Donnell-Luria, A.; Ward, L. D.; Radivojac, P.

2026-02-17 bioinformatics
10.64898/2026.02.14.705914 bioRxiv

Predicting the effects of genetic variants and assessing prediction performance are key computational tasks in genomic medicine. It has been shown that well-calibrated variant effect predictors can be reliably used as evidence towards establishing pathogenicity (or benignity) of missense variants, thereby rendering these variants suitable for use in (or exclusion from) the genetic diagnosis of rare Mendelian conditions. However, most predictors have been trained or calibrated on data that may not be sufficiently representative to lead to similar performance across all genetic ancestries. This raises questions about the responsible deployment of these tools to improve human health. To better understand the utility of computational predictors, we set out to assess their ancestry-specific performance in terms of accuracy and evidence strength according to the ACMG/AMP guidelines. First, we determined that the expected count of rare variants in an individual's genome and the allele frequency distribution of these variants are the key confounders when evaluating a predictor's performance across different genetic ancestries. Second, we found that a predictor's accuracy itself inversely correlates with the allele frequency of the rare variant. After stratifying according to allele frequency, we show that established methods for predicting the pathogenicity of missense variants have comparable performance levels across major ancestry groups. Our results therefore support the wide deployment of such models in the context of genetic diagnosis and related applications.
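The abstract's key reasoning step is that accuracy depends on allele frequency and that ancestry groups differ in their allele-frequency distributions, so pooled per-ancestry accuracy can differ even when a predictor behaves identically within each frequency stratum. The minimal Python sketch below illustrates that argument with invented toy counts. The group labels, allele-frequency bin edges, counts, and the helper names `make_records` and `accuracy_by` are all hypothetical; this is not the authors' evaluation pipeline.

```python
from collections import defaultdict

# Toy illustration of allele-frequency (AF) confounding in per-ancestry
# evaluation. Group labels ("A", "B"), AF bin labels and all counts are
# hypothetical; they do not come from the paper.

def make_records(group, counts):
    """Expand {af_bin: (n_correct, n_total)} into (group, af_bin, correct) records."""
    records = []
    for af_bin, (n_correct, n_total) in counts.items():
        records += [(group, af_bin, True)] * n_correct
        records += [(group, af_bin, False)] * (n_total - n_correct)
    return records

RARER, COMMONER = "AF<1e-4", "1e-4<=AF<1e-2"

# Within each AF bin the predictor is equally accurate for both groups
# (90% in the rarer bin, 60% in the commoner bin), but group B has a larger
# share of its variants in the commoner bin.
records = (
    make_records("A", {RARER: (90, 100), COMMONER: (60, 100)})
    + make_records("B", {RARER: (45, 50), COMMONER: (120, 200)})
)

def accuracy(rows):
    """Fraction of records whose prediction was correct."""
    return sum(correct for _, _, correct in rows) / len(rows)

def accuracy_by(records, key):
    """Group records by key(record) and compute accuracy per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return {k: accuracy(rows) for k, rows in groups.items()}

# Pooled per-ancestry accuracy differs between the groups...
pooled = accuracy_by(records, key=lambda r: r[0])
# ...while AF-stratified accuracy is identical across groups.
stratified = accuracy_by(records, key=lambda r: (r[0], r[1]))

print(pooled)  # {'A': 0.75, 'B': 0.66}
print(stratified)
```

Comparing the `stratified` values rather than the `pooled` ones mirrors the kind of allele-frequency-stratified comparison the abstract describes; a real analysis would also involve calibration and ACMG/AMP evidence-strength assignment, which this sketch omits.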

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Genome Medicine | 154 | Top 0.1% | 22.1%
2 | PLOS Computational Biology | 1633 | Top 4% | 9.0%
3 | European Journal of Human Genetics | 49 | Top 0.1% | 6.2%
4 | Nature Communications | 4913 | Top 30% | 6.2%
5 | The American Journal of Human Genetics | 206 | Top 0.9% | 4.8%
6 | Human Genetics | 25 | Top 0.1% | 4.2%
7 | Scientific Reports | 3102 | Top 39% | 3.5%
8 | Genome Biology | 555 | Top 2% | 3.5%
9 | Bioinformatics | 1061 | Top 6% | 2.7%
10 | Frontiers in Genetics | 197 | Top 3% | 2.6%
11 | PLOS Genetics | 756 | Top 7% | 2.3%
12 | Cell Genomics | 162 | Top 3% | 2.0%
13 | Computational and Structural Biotechnology Journal | 216 | Top 5% | 1.7%
14 | BMC Bioinformatics | 383 | Top 5% | 1.6%
15 | Nucleic Acids Research | 1128 | Top 11% | 1.6%
16 | BioData Mining | 15 | Top 0.4% | 1.5%
17 | Briefings in Bioinformatics | 326 | Top 4% | 1.5%
18 | NAR Genomics and Bioinformatics | 214 | Top 2% | 1.3%
19 | Human Genomics | 21 | Top 0.2% | 1.3%
20 | Genetic Epidemiology | 46 | Top 0.6% | 1.1%
21 | BMC Medical Genomics | 36 | Top 0.9% | 0.9%
22 | Advanced Science | 249 | Top 16% | 0.9%
23 | Human Genetics and Genomics Advances | 70 | Top 0.5% | 0.9%
24 | International Journal of Molecular Sciences | 453 | Top 12% | 0.9%
25 | PLOS ONE | 4510 | Top 65% | 0.9%
26 | Molecular Systems Biology | 142 | Top 1% | 0.8%
27 | npj Genomic Medicine | 33 | Top 1.0% | 0.7%
28 | Communications Biology | 886 | Top 27% | 0.7%
29 | Journal of Medical Genetics | 28 | Top 0.6% | 0.7%
30 | Science Advances | 1098 | Top 33% | 0.6%
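The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the ranked probabilities. Using the rounded percentages listed in the table, the top five journals sum to 48.3% and adding the sixth brings the total to 52.5%, so six is the smallest number of journals that crosses the 50% threshold. The minimal Python sketch below performs that check; the function name `journals_needed` is illustrative only, and because the listed values are rounded, the exact cumulative figure may differ slightly from the page's underlying probabilities.

```python
# Predicted probabilities (%) of the eight highest-ranked journals, as rounded
# in the table above.
probabilities = [22.1, 9.0, 6.2, 6.2, 4.8, 4.2, 3.5, 3.5]

def journals_needed(probs, target=50.0):
    """Return (k, cumulative) for the smallest k whose top-k probabilities reach target."""
    cumulative = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= target:
            return k, cumulative
    return None, cumulative

k, cumulative = journals_needed(probabilities)
print(k, round(cumulative, 1))  # 6 52.5
```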