Data-Driven Symbolic Higher-Order Epistasis Discovery with Kolmogorov-Arnold Networks
Patil, O. R.; Shazand, K.; Marteau, B.; Shen, Y.; Wang, M. D.
Many human diseases are polygenic conditions that arise from interactions among multiple genes at different loci, yet most Genome-Wide Association Studies (GWAS) still largely consider only the main additive effects of single nucleotide polymorphisms (SNPs), which contributes to the missing heritability problem in some complex traits. Identifying higher-order non-additive interactions, or epistasis, could help fill this gap, but doing so is computationally difficult because of the massive search space involved. Current epistasis detection approaches struggle with non-Cartesian higher-order interactions and lack inherent explainability. We present a novel deep learning (DL) approach, EPIstasis Discovery with Kolmogorov-Arnold Networks (EPIK), a data-driven, modular, and symbolically representable framework. We also introduce a novel approach for detecting higher-order XOR interactions (a non-Cartesian type), which is used in EPIK's XOR detection module. EPIK slightly outperforms other DL approaches in average F1 score on a simulated pure epistasis benchmark. It also outperforms general-purpose traditional epistasis detection approaches on simulated mixed epistasis datasets and on real-world GWAS datasets of Arabidopsis thaliana. Finally, EPIK recovers a known gene interaction between MAPT and WNT3 for Parkinson's disease (PD) while also suggesting a more complex interaction among MAPT, WNT3, and another gene, KANSL1.
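To illustrate why a pure XOR interaction is hard for additive, single-SNP analyses, the following is a minimal sketch and not EPIK's method. It assumes an idealized population in which three loci are coded as binary carrier status (real SNP genotypes are usually coded 0/1/2) and every genotype combination is equally likely. The function names `xor_phenotype` and `max_signal` are hypothetical and introduced only for this example.

```python
from itertools import combinations, product

N_LOCI = 3  # assumption: three interacting loci, binary carrier coding


def xor_phenotype(genotype):
    """Return 1 when an odd number of loci carry the risk allele, else 0."""
    return sum(genotype) % 2


def max_signal(samples, n_loci, order):
    """Largest gap between a conditional phenotype mean and the overall mean,
    taken over every subset of `order` loci and every value assignment."""
    overall = sum(y for _, y in samples) / len(samples)
    best = 0.0
    for loci in combinations(range(n_loci), order):
        for values in product((0, 1), repeat=order):
            ys = [y for g, y in samples
                  if all(g[locus] == v for locus, v in zip(loci, values))]
            best = max(best, abs(sum(ys) / len(ys) - overall))
    return best


# Enumerate every genotype once, i.e. a uniform idealized population.
samples = [(g, xor_phenotype(g)) for g in product((0, 1), repeat=N_LOCI)]

# Signal visible when looking at 1, 2, or all 3 loci at a time.
signal_by_order = {k: max_signal(samples, N_LOCI, k) for k in range(1, N_LOCI + 1)}
print(signal_by_order)
```

In this toy setting, conditioning on any one or two loci leaves the phenotype mean unchanged, and only the full three-way combination carries signal. This is why a search restricted to main effects or lower-order terms cannot find such an interaction, and why the search space grows so quickly with interaction order.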