
Knowledge-Guided Learning with Curated Prior Genetic Biomarkers for Robust Model Interpretation

Baek, B.; Jang, E.; Kim, Y.; Kang, M.

2026-01-28 bioinformatics
10.64898/2026.01.27.702122 bioRxiv

Motivation: Knowledge-guided learning offers effective and robust model training strategies in data-scarce settings by incorporating established domain knowledge, thereby enhancing generalization, robustness, and interpretability. By contrast, conventional deep learning approaches rely purely on data-driven learning, which can limit robust model interpretation, particularly in high-dimensional settings with limited sample sizes. In computational biology, knowledge-guided learning has primarily leveraged network- and structure-based knowledge, yielding biologically interpretable representations and improved predictive performance compared to conventional approaches. However, curated biomarkers, one of the most accessible forms of biological knowledge, remain largely unexplored within knowledge-guided paradigms.

Results: In this study, we propose a model-agnostic training paradigm, Biomarker-driven Explainable Prior-guided Learning (BioExPL), that incorporates curated prior knowledge and can be applied to any neural network. BioExPL encourages neural networks to reflect curated biomarker priors in their latent representations through a novel knowledge-alignment loss. In simulation studies and extensive experiments on multiple cancer datasets, BioExPL consistently delivered significantly improved predictive performance and enhanced model interpretability with minimal computational overhead. BioExPL not only integrates curated prior knowledge into the model but also accurately identifies previously unknown associated signals. Because BioExPL is model-agnostic and domain-independent, it can be integrated into diverse neural network architectures.

Availability and implementation: The open-source implementation is publicly available at: https://github.com/datax-lab/BioExPL.
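
The abstract does not specify the form of the knowledge-alignment loss, so the following is only a rough illustration of the general idea (not the authors' method): penalize the model when curated biomarker genes carry less representational weight than non-biomarker genes. The function name, the `prior_mask` encoding, and the `margin` parameter are all hypothetical.

```python
import numpy as np

def knowledge_alignment_loss(W, prior_mask, margin=0.1):
    """Illustrative alignment penalty (assumed form, not BioExPL's actual loss).

    W          : (n_genes, n_hidden) first-layer weight matrix.
    prior_mask : (n_genes,) binary vector; 1 marks a curated biomarker gene.
    margin     : how much more weight mass biomarker genes should carry.
    """
    # Per-gene importance: L2 norm of each gene's outgoing weights.
    importance = np.linalg.norm(W, axis=1)
    biomarker_mean = importance[prior_mask == 1].mean()
    other_mean = importance[prior_mask == 0].mean()
    # Hinge-style penalty: zero once biomarker genes lead by at least `margin`.
    return max(0.0, margin - (biomarker_mean - other_mean))
```

In a training loop, a term like this would be added to the task loss with a weighting coefficient, so that gradient descent simultaneously fits the labels and pulls the latent representation toward the curated prior; when the representation already respects the prior, the hinge is inactive and the penalty contributes nothing.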

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

| Rank | Journal | Probability | Papers in training set | Percentile |
|------|---------|-------------|------------------------|------------|
| 1 | Bioinformatics | 33.8% | 1061 | top 0.6% |
| 2 | Nature Machine Intelligence | 15.1% | 61 | top 0.1% |
| 3 | Briefings in Bioinformatics | 8.6% | 326 | top 0.5% |
| 4 | BMC Bioinformatics | 4.4% | 383 | top 2% |
| 5 | PLOS Computational Biology | 3.7% | 1633 | top 9% |
| 6 | Nature Communications | 3.7% | 4913 | top 38% |
| 7 | Patterns | 2.7% | 70 | top 0.4% |
| 8 | Genome Biology | 2.5% | 555 | top 3% |
| 9 | Bioinformatics Advances | 2.1% | 184 | top 2% |
| 10 | Genome Medicine | 1.8% | 154 | top 4% |
| 11 | Cell Systems | 1.4% | 167 | top 8% |
| 12 | NAR Genomics and Bioinformatics | 1.3% | 214 | top 3% |
| 13 | Nucleic Acids Research | 1.0% | 1128 | top 14% |
| 14 | Scientific Reports | 1.0% | 3102 | top 69% |
| 15 | BMC Genomics | 1.0% | 328 | top 4% |
| 16 | Nature Biomedical Engineering | 0.9% | 42 | top 1% |
| 17 | npj Systems Biology and Applications | 0.8% | 99 | top 2% |
| 18 | Communications Biology | 0.8% | 886 | top 20% |
| 19 | Proceedings of the National Academy of Sciences | 0.8% | 2130 | top 42% |
| 20 | Nature Methods | 0.7% | 336 | top 6% |
| 21 | Frontiers in Genetics | 0.7% | 197 | top 10% |
| 22 | New Phytologist | 0.7% | 309 | top 5% |
| 23 | Cell Reports Medicine | 0.5% | 140 | top 10% |
| 24 | Cell Genomics | 0.5% | 162 | top 8% |