Data-Driven Symbolic Higher-Order Epistasis Discovery with Kolmogorov-Arnold Networks

Patil, O. R.; Shazand, K.; Marteau, B.; Shen, Y.; Wang, M. D.

2025-11-04 genomics
10.1101/2025.10.31.685894 bioRxiv

Many human diseases are polygenic conditions that arise from a complex interplay between multiple genes at different loci, yet most Genome-Wide Association Studies (GWAS) still largely consider only the main additive effects of single nucleotide polymorphisms (SNPs), resulting in a missing heritability problem in some complex traits. Identifying higher-order non-additive interactions, or epistasis, could help fill this gap, but doing so is computationally difficult because of the massive search space involved. Current epistasis detection approaches struggle with non-Cartesian higher-order interactions and lack inherent explainability. We present a novel deep learning (DL) approach, EPIstasis Discovery with Kolmogorov-Arnold Networks (EPIK), a data-driven, modular, and symbolically representable framework. We also introduce a novel approach for detecting higher-order XOR interactions (a non-Cartesian type), used in EPIK's XOR detection module. EPIK slightly outperforms other DL approaches in average F1 score on a simulated pure-epistasis interaction benchmark. It outperforms other general, traditional epistasis detection approaches on simulated mixed epistasis detection datasets and on real-world GWAS datasets of Arabidopsis thaliana. Finally, EPIK recovers a known gene interaction between MAPT and WNT3 for Parkinson's disease (PD) while also suggesting a more complex interaction among MAPT, WNT3, and another gene, KANSL1.
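
As a toy illustration of why XOR-type interactions are hard for additive single-SNP tests, the sketch below enumerates two binary loci whose phenotype is their exclusive OR. It is not the EPIK method; the binary encoding and the names `snp_a`, `snp_b`, and `phenotype` are assumptions made only for this example.

```python
from itertools import product

def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# All four equally weighted combinations of two binary loci (hypothetical encoding).
combos = list(product([0, 1], repeat=2))
snp_a = [a for a, _ in combos]
snp_b = [b for _, b in combos]

# Pure XOR phenotype: affected only when exactly one locus carries the variant.
phenotype = [a ^ b for a, b in combos]

# Each locus on its own carries no linear signal about the phenotype.
marginal_a = covariance(snp_a, phenotype)
marginal_b = covariance(snp_b, phenotype)

# A joint feature (here the product a*b) does covary with the phenotype.
interaction = covariance([a * b for a, b in combos], phenotype)

print(marginal_a, marginal_b, interaction)
```

Both marginal covariances are zero while the joint term is not, so a scan that tests one locus at a time would miss this interaction entirely.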

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Bioinformatics | 1061 | Top 2% | 18.0% |
| 2 | Bioinformatics Advances | 184 | Top 0.2% | 9.8% |
| 3 | Frontiers in Genetics | 197 | Top 0.4% | 8.8% |
| 4 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.1% | 8.1% |
| 5 | PLOS ONE | 4510 | Top 33% | 4.7% |
| 6 | PLOS Computational Biology | 1633 | Top 8% | 4.2% |
| 7 | PLOS Genetics | 756 | Top 5% | 3.5% |
| 8 | Scientific Reports | 3102 | Top 39% | 3.5% |
| 9 | iScience | 1063 | Top 6% | 3.1% |
| 10 | Genome Research | 409 | Top 1% | 3.0% |
| 11 | NAR Genomics and Bioinformatics | 214 | Top 1% | 2.5% |
| 12 | Nature Communications | 4913 | Top 46% | 2.3% |
| 13 | BMC Bioinformatics | 383 | Top 4% | 2.0% |
| 14 | Genetic Epidemiology | 46 | Top 0.3% | 2.0% |
| 15 | Briefings in Bioinformatics | 326 | Top 4% | 1.6% |
| 16 | Communications Biology | 886 | Top 10% | 1.6% |
| 17 | Nature Machine Intelligence | 61 | Top 2% | 1.3% |
| 18 | Patterns | 70 | Top 2% | 1.2% |
| 19 | Computational and Structural Biotechnology Journal | 216 | Top 7% | 1.1% |
| 20 | BMC Genomics | 328 | Top 4% | 0.9% |
| 21 | Cell Genomics | 162 | Top 5% | 0.9% |
| 22 | Nucleic Acids Research | 1128 | Top 17% | 0.8% |
| 23 | European Journal of Human Genetics | 49 | Top 1% | 0.7% |
| 24 | Genome Biology | 555 | Top 8% | 0.7% |
| 25 | BMC Medical Genomics | 36 | Top 2% | 0.7% |
| 26 | Proceedings of the National Academy of Sciences | 2130 | Top 48% | 0.6% |
| 27 | GENETICS | 189 | Top 2% | 0.6% |
| 28 | Journal of Personalized Medicine | 28 | Top 2% | 0.6% |

Ranks 1 to 6 lie above the 50% probability-mass line.
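
The statement that six journals cover half of the predicted probability mass can be checked from the listed percentages. The following is a minimal sketch of that arithmetic; the helper name `journals_to_reach` is hypothetical, and the inputs are the rounded values shown on this page, so the totals are approximate.

```python
# Predicted probabilities (percent) for the ten highest-ranked journals, as listed.
probabilities = [18.0, 9.8, 8.8, 8.1, 4.7, 4.2, 3.5, 3.5, 3.1, 3.0]

def journals_to_reach(probs, threshold):
    """Return (count, cumulative mass) for the smallest prefix reaching the threshold."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    raise ValueError("threshold not reached by the supplied probabilities")

count, mass = journals_to_reach(probabilities, 50.0)
print(count, round(mass, 1))
```

With these values the first five journals sum to about 49.4% and the sixth brings the total to about 53.6%, which agrees with the stated cutoff.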