
Mutation Pathogenicity Prediction by a Biology Based Explainable AI Multi-Modal Algorithm

Kellerman, R.; Nayshool, O.; Barel, O.; Paz, S.; Amariglio, N.; Klang, E.; Rechavi, G.

2024-06-05 genetic and genomic medicine
10.1101/2024.06.05.24308476

Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Deciphering the protein structure therefore provides great insight into the molecular mechanisms underlying biological functions in human disease. While there have recently been major advances in the artificial intelligence-based prediction of protein structure, the determination of the biological and clinical relevance of specific mutations is not yet up to clinical standards. This challenge is of utmost medical importance when decisions as critical as suggesting termination of pregnancy or recommending cancer-directed rational drugs depend on the accuracy with which the effect of a specific mutation is predicted. Currently available tools aim to characterize the effect of a mutation on the functionality of the protein according to biochemical criteria, independent of the biological context. A specific change in protein structure can result in either loss of function (LOF) or gain of function (GOF), and the ability to identify the directionality of the effect needs to be taken into consideration when interpreting the biological outcome of the mutation. Here we describe Triple-modalities Variant Interpretation and Analysis (TriVIAI), a tool incorporating three complementary modalities for improved prediction of missense mutation pathogenicity: a protein language model (pLM), a graph neural network (GNN), and a tabular model incorporating physical properties from the protein structure. The TriVIAI ensemble's predictions compare favorably with existing tools across various metrics, achieving an AUC-ROC of 0.887, a precision-recall curve (PRC) score of 0.68, and a Brier score of 0.16. The TriVIAI ensemble also has two major advantages over other available tools.
The first is the incorporation of biological insights that allow differentiation between GOF mutations, which tend to cluster in specific hotspots and affect structure in a specific functional way, and LOF mutations, which are usually dispersed and can cripple the protein in a variety of ways. Importantly, the advantage over other available tools is more noticeable for GOF mutations, as their effect on the protein structure is less disruptive and can be misinterpreted by current variant prioritization strategies. The second significant advantage of TriVIAI is the explainability of the ensemble, in contrast to other available AI-based pathogenicity prediction algorithms, which until now have been a black box for their users. This explainability is of major importance given the clinical responsibility of the medical decision-makers who use AI-based pathogenicity predictors.
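
For readers unfamiliar with the reported metrics, the following is a minimal pure-Python sketch of how an AUC-ROC and a Brier score can be computed for an ensemble of three per-variant probability outputs. It is an illustration only: the abstract does not say how the three modalities are combined, so simple averaging is an assumption, and the function names, labels, and scores below are hypothetical toy values, not data or code from the paper.

```python
from statistics import mean


def ensemble_average(*model_probs):
    """Average per-variant probabilities across models.

    Plain averaging is an assumed combination rule for illustration only.
    """
    return [mean(ps) for ps in zip(*model_probs)]


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))


def auc_roc(y_true, y_prob):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pathogenic, 0 = benign.
labels = [1, 0, 1, 0, 1, 0]
plm_scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]      # stand-in for a pLM output
gnn_scores = [0.8, 0.3, 0.4, 0.5, 0.9, 0.2]      # stand-in for a GNN output
tabular_scores = [0.7, 0.1, 0.5, 0.6, 0.8, 0.3]  # stand-in for a tabular model

ensemble = ensemble_average(plm_scores, gnn_scores, tabular_scores)
print("AUC-ROC:", auc_roc(labels, ensemble))
print("Brier score:", brier_score(labels, ensemble))
```

A higher AUC-ROC indicates better ranking of pathogenic over benign variants, while a lower Brier score indicates better-calibrated probabilities, which is why the two are reported together.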

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Scientific Reports (based on 701 papers, Top 2%): 18.0%
2. Genome Medicine (based on 56 papers, Top 0.2%): 13.4%
3. PLOS ONE (based on 1737 papers, Top 61%): 6.8%
4. Frontiers in Genetics (based on 32 papers, Top 0.4%): 5.0%
5. Genetics in Medicine (based on 57 papers, Top 2%): 4.8%
6. Human Genomics (based on 13 papers, Top 0.1%): 3.0%

50% of probability mass above

7. Human Mutation (based on 14 papers, Top 0.4%): 3.0%
8. European Journal of Human Genetics (based on 25 papers, Top 0.7%): 2.6%
9. Computers in Biology and Medicine (based on 39 papers, Top 2%): 2.6%
10. Briefings in Bioinformatics (based on 11 papers, Top 0.1%): 2.5%
11. BMC Genomics (based on 15 papers, Top 0.1%): 2.4%
12. Journal of Biomedical Informatics (based on 37 papers, Top 3%): 1.9%
13. Bioinformatics (based on 24 papers, Top 0.7%): 1.9%
14. Journal of Clinical Medicine (based on 77 papers, Top 9%): 1.7%
15. Biology (based on 11 papers, Top 0.4%): 1.4%
16. PLOS Genetics (based on 39 papers, Top 3%): 1.4%
17. Computational and Structural Biotechnology Journal (based on 14 papers, Top 1.0%): 1.4%
18. International Journal of Molecular Sciences (based on 39 papers, Top 2%): 1.4%
19. Nature Communications (based on 483 papers, Top 36%): 1.3%
20. The American Journal of Human Genetics (based on 77 papers, Top 6%): 1.3%
21. Journal of Medical Genetics (based on 22 papers, Top 2%): 0.8%
22. eLife (based on 262 papers, Top 27%): 0.8%
23. npj Genomic Medicine (based on 18 papers, Top 2%): 0.8%
24. Human Genetics (based on 14 papers, Top 2%): 0.7%
25. iScience (based on 74 papers, Top 8%): 0.7%