A novel computational methodology for GWAS multi-locus analysis based on graph theory and machine learning

Saha, S.; Singh, H. N.; Soliman, A.; Rajasekaran, S.

2021-10-26 epidemiology
10.1101/2021.10.22.21265388 medRxiv

Background: The current form of genome-wide association studies (GWAS) is inadequate to accurately explain the genetics of complex traits because it lacks sufficient statistical power. It explores each variant individually, but current studies show that multiple variants with varying effect sizes act in a concerted way to develop a complex disease. To address this issue, we have developed an algorithmic framework that can effectively solve the multi-locus problem in GWAS with a very high level of confidence. Our methodology consists of three novel algorithms based on graph theory and machine learning. It identifies a set of highly discriminating variants that are stable and robust with little (if any) spuriousness. Consequently, these variants are likely to help explain the missing heritability of a complex disease as a whole.

Results: To demonstrate the efficacy of our proposed algorithms, we have considered an astigmatism case-control GWAS dataset. Astigmatism is a common eye condition that causes blurred vision because of an error in the shape of the cornea. The cause of astigmatism is not entirely known, but a sizable heritable component is assumed. Clinical studies show that developmental disorders (such as autism) and astigmatism co-occur in a statistically significant number of individuals. By performing classical GWAS analysis, we did not find any genome-wide statistically significant variants. In contrast, with our methodology we have identified a set of stable, robust, and highly predictive variants that can together explain the genetics of astigmatism. We have performed a set of biological enrichment analyses based on gene ontology (GO) terms, disease ontology (DO) terms, biological pathways, networks of pathways, and so forth to demonstrate the accuracy and novelty of our findings.

Conclusions: Rigorous experimental evaluations show that our proposed methodology can solve the GWAS multi-locus problem effectively and efficiently. It can identify signals with a high level of accuracy from a GWAS dataset with a small number of samples. We believe that the proposed methodology, based on graph theory and machine learning, is the most comprehensive among the machine-learning-based tools in this domain.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS ONE | 4510 papers in training set | Top 6% | 22.9%
2. Scientific Reports | 3102 papers in training set | Top 8% | 9.3%
3. JMIR Formative Research | 32 papers in training set | Top 0.2% | 4.7%
4. BMC Research Notes | 29 papers in training set | Top 0.1% | 4.0%
5. Biology Methods and Protocols | 53 papers in training set | Top 0.3% | 3.6%
6. PeerJ | 261 papers in training set | Top 3% | 3.1%
7. Frontiers in Medicine | 113 papers in training set | Top 2% | 2.8%
50% of probability mass above
8. Journal of Medical Genetics | 28 papers in training set | Top 0.2% | 2.1%
9. American Journal of Medical Genetics Part A | 17 papers in training set | Top 0.1% | 1.9%
10. BMC Bioinformatics | 383 papers in training set | Top 4% | 1.7%
11. Wellcome Open Research | 57 papers in training set | Top 0.8% | 1.7%
12. Genetic Epidemiology | 46 papers in training set | Top 0.4% | 1.7%
13. Journal of Biophotonics | 16 papers in training set | Top 0.4% | 1.2%
14. Open Biology | 95 papers in training set | Top 1% | 1.2%
15. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 1% | 1.2%
16. JMIRx Med | 31 papers in training set | Top 1% | 1.2%
17. Cureus | 67 papers in training set | Top 4% | 1.1%
18. Biology | 43 papers in training set | Top 2% | 1.0%
19. PLOS Computational Biology | 1633 papers in training set | Top 21% | 1.0%
20. Computers in Biology and Medicine | 120 papers in training set | Top 3% | 1.0%
21. Translational Vision Science & Technology | 35 papers in training set | Top 0.5% | 0.9%
22. International Journal of Medical Informatics | 25 papers in training set | Top 1% | 0.9%
23. F1000Research | 79 papers in training set | Top 3% | 0.9%
24. Database | 51 papers in training set | Top 0.8% | 0.8%
25. JMIR Research Protocols | 18 papers in training set | Top 1% | 0.8%
26. Frontiers in Neuroscience | 223 papers in training set | Top 7% | 0.8%
27. Frontiers in Psychiatry | 83 papers in training set | Top 3% | 0.8%
28. Biomedicines | 66 papers in training set | Top 3% | 0.8%
29. Frontiers in Genetics | 197 papers in training set | Top 9% | 0.8%
30. npj Genomic Medicine | 33 papers in training set | Top 0.9% | 0.8%
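
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below assumes the cutoff is a plain running sum over the ranked values; the function name `journals_to_reach` is hypothetical and is not taken from the site.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten top-ranked journals, copied from the list.
probabilities = [22.9, 9.3, 4.7, 4.0, 3.6, 3.1, 2.8, 2.1, 1.9, 1.7]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running sum
    to reach `threshold`, or None if the threshold is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= threshold - 1e-9:
            return count
    return None


print(journals_to_reach(probabilities, 50.0))  # prints 7
```

The running sum after six journals is 47.6% and after seven is 50.4%, which is consistent with the cutoff marker placed after rank 7.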