Knowledge-Guided Learning with Curated Prior Genetic Biomarkers for Robust Model Interpretation
Baek, B.; Jang, E.; Kim, Y.; Kang, M.
Motivation: Knowledge-guided learning offers effective and robust training strategies in data-scarce settings by incorporating established domain knowledge, thereby enhancing generalization, robustness, and interpretability. By contrast, conventional deep learning approaches rely purely on data-driven learning, which can limit robust model interpretation, particularly in high-dimensional settings with limited sample sizes. In computational biology, knowledge-guided learning has primarily leveraged network- and structure-based knowledge, leading to biologically interpretable representations and better predictive performance than conventional approaches. However, curated biomarkers, one of the most accessible forms of biological knowledge, remain largely unexplored within knowledge-guided paradigms.

Results: In this study, we propose Biomarker-driven Explainable Prior-guided Learning (BioExPL), a model-agnostic training paradigm that incorporates curated prior knowledge and can be applied to any neural network. BioExPL constrains neural networks to reflect curated biomarker priors in their latent representations through a novel knowledge-alignment loss. In simulation studies and extensive experiments on multiple cancer datasets, BioExPL consistently and significantly improved predictive performance and enhanced model interpretability with minimal computational overhead. Beyond integrating curated prior knowledge into the model, BioExPL also accurately identifies previously unknown associated signals. Because BioExPL is model-agnostic and domain-independent, it can be integrated into diverse neural network architectures.

Availability and implementation: The open-source code is publicly available at: https://github.com/datax-lab/BioExPL.
Matching journals
The top 3 journals account for 50% of the predicted probability mass.