Three Steps Novel Hard Margin Ensemble Machine Learning Method Classifies Uncertain MEFV Gene Variants

Alay, M. T.; Demir, I.; Kirisci, M.

2023-04-18 genetic and genomic medicine
10.1101/2023.04.08.23288306 medRxiv

Introduction: The International Study Group for Systemic Autoinflammatory Diseases (INSAID) consensus criteria revealed that the clinical outcomes of more than half of the MEFV gene variants are uncertain. We aimed to classify MEFV variants more accurately while reducing the uncertainty surrounding them.

Material-Methods: We extracted variants of the MEFV gene from the Infevers database. We then determined the optimal number of in silico tools for our model. On the training dataset, we applied seven machine learning algorithms to MEFV gene variants with known clinical effects. We evaluated the effectiveness of our model in three steps. First, we ran the machine learning algorithms on the training dataset and retained those with a prediction accuracy greater than 90 percent. Second, we compared our gene-level and protein-level prediction results. Finally, we compared our prediction results to clinical outcomes.

Results: Our analysis included 266 of 381 MEFV gene variants and four computational tools (REVEL, SIFT, MetaLR, and FATHMM). In our training dataset, the accuracy of three machine learning algorithms (RF: 100%, CRAT: 100%, and KNN: 91%) exceeded the threshold value. The dataset contained 134 likely pathogenic (LP) variants and 132 likely benign (LB) variants. We found that B30.2 domain variants were about 2.5 times more likely to be LP than LB (χ² = 12.693, p < 0.001, OR: 2.595 [1.532-4.132]).

Discussion: Considering that the clinical effects of 60% of MEFV gene variants have not yet been determined, a combined evaluation of our methods and patients' clinical manifestations significantly simplifies the interpretation of unknown variants.
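The first evaluation step (keep only algorithms whose training accuracy exceeds 90 percent, then combine them) might look roughly like the sketch below. This is not the authors' implementation: the toy rule-based "models", the synthetic score tuples, the names `select_models` and `majority_vote`, and the use of a simple majority vote to combine the retained models are all assumptions made for illustration. The abstract does not say how the retained models were combined.

```python
from collections import Counter

ACCURACY_THRESHOLD = 0.90  # the abstract's cutoff: accuracy greater than 90 percent


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    correct = sum(model(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


def select_models(models, rows, labels, threshold=ACCURACY_THRESHOLD):
    """Keep only the models whose training accuracy is strictly above the threshold."""
    return {
        name: model
        for name, model in models.items()
        if accuracy(model, rows, labels) > threshold
    }


def majority_vote(models, row):
    """Return the most common prediction; ties go to the first label seen."""
    votes = Counter(model(row) for model in models.values())
    return votes.most_common(1)[0][0]


# Synthetic training data (hypothetical): (pathogenicity-like score, SIFT-like score).
train_rows = [(0.9, 0.01), (0.8, 0.02), (0.7, 0.03), (0.2, 0.40), (0.1, 0.60), (0.3, 0.50)]
train_labels = ["LP", "LP", "LP", "LB", "LB", "LB"]

# Hypothetical stand-ins for trained classifiers.
candidate_models = {
    "score_rule": lambda row: "LP" if row[0] >= 0.5 else "LB",
    "sift_rule": lambda row: "LP" if row[1] <= 0.05 else "LB",
    "always_lp": lambda row: "LP",  # weak baseline, 50% accurate on this toy data
}

selected = select_models(candidate_models, train_rows, train_labels)
prediction = majority_vote(selected, (0.85, 0.01))
print(sorted(selected), prediction)
```

On this toy data the constant baseline is dropped because it falls below the cutoff, and the two remaining rules agree on the new variant.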

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | BMC Medical Genomics | 36 | Top 0.1% | 23.0%
2 | Frontiers in Genetics | 197 | Top 0.8% | 6.5%
3 | PLOS ONE | 4510 | Top 33% | 4.4%
4 | Biomedicines | 66 | Top 0.1% | 4.1%
5 | Scientific Reports | 3102 | Top 30% | 4.1%
6 | Briefings in Bioinformatics | 326 | Top 2% | 3.7%
7 | Frontiers in Immunology | 586 | Top 2% | 3.7%
8 | Informatics in Medicine Unlocked | 21 | Top 0.1% | 3.7%
50% of probability mass above
9 | Computers in Biology and Medicine | 120 | Top 1% | 3.1%
10 | International Journal of Molecular Sciences | 453 | Top 3% | 3.1%
11 | Frontiers in Pharmacology | 100 | Top 1% | 2.1%
12 | Human Genomics | 21 | Top 0.1% | 2.1%
13 | Archives of Clinical and Biomedical Research | 28 | Top 0.5% | 1.9%
14 | Human Mutation | 29 | Top 0.3% | 1.8%
15 | Journal of Clinical Medicine | 91 | Top 4% | 1.4%
16 | Genes | 126 | Top 2% | 1.3%
17 | Genomics | 60 | Top 1% | 1.3%
18 | Frontiers in Molecular Biosciences | 100 | Top 3% | 1.3%
19 | Clinical Immunology | 21 | Top 0.4% | 1.1%
20 | Journal of Translational Medicine | 46 | Top 2% | 1.0%
21 | Journal of Medical Virology | 137 | Top 3% | 0.9%
22 | European Respiratory Journal | 54 | Top 1% | 0.9%
23 | Human Genetics | 25 | Top 0.4% | 0.8%
24 | Bioinformatics | 1061 | Top 9% | 0.8%
25 | Journal of Clinical Immunology | 11 | Top 0.1% | 0.8%
26 | Journal of Personalized Medicine | 28 | Top 1% | 0.7%
27 | BMC Genomics | 328 | Top 6% | 0.7%
28 | Annals of Translational Medicine | 17 | Top 1% | 0.7%
29 | Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease | 25 | Top 1% | 0.7%
30 | BMC Infectious Diseases | 118 | Top 6% | 0.7%
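
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below uses the first ten percentages from the ranking; the function name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the ranking above.
probabilities = [23.0, 6.5, 4.4, 4.1, 4.1, 3.7, 3.7, 3.7, 3.1, 3.1]


def journals_to_reach(mass_percent, probs):
    """Return (rank, cumulative mass) at the first rank where the running sum
    reaches mass_percent, or None if the listed values never reach it."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= mass_percent:
            return rank, total
    return None


rank, cumulative = journals_to_reach(50.0, probabilities)
print(rank, round(cumulative, 1))
```

The running sum is 49.5% after seven journals and 53.2% after eight, so the cutoff falls at rank 8, matching the marker in the list.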