
Systematic Benchmarking of Kinase Bioactivity Models Across Splitting Strategies and Protein Representations

Abbott, J. M.

bioRxiv preprint · bioinformatics · 2026-04-22 · DOI: 10.64898/2026.04.20.719590

Machine learning models for protein-ligand bioactivity prediction are increasingly used in computational drug discovery. However, reported benchmark performance is often sensitive to evaluation design. To characterize this sensitivity, we present a systematic evaluation of seven machine learning architectures for kinase inhibitor bioactivity prediction, spanning classical baselines (Random Forest, XGBoost, ElasticNet, multi-layer perceptron) and advanced neural approaches (Graph Isomorphism Network, ESM-2 protein embedding MLP, and a GNN-ESM fusion model). Using a curated ChEMBL-derived kinase activity dataset of 352,874 records across 507 human protein kinase targets, we evaluated all models under three splitting strategies of increasing stringency: random, scaffold-based (Bemis-Murcko), and target-held-out. We observed that Random Forest with Morgan fingerprints achieves near-equivalent or superior performance to all neural architectures under scaffold and target-based evaluation. On target-held-out splits, frozen ESM-2 embeddings showed worse generalization, with the ESM-FP MLP exhibiting the largest performance degradation. Learned graph representations (GIN) do not outperform fixed 2048-bit ECFP4 fingerprints at this data scale, and tree-based uncertainty methods outperform the MC-Dropout implementations tested here on calibration and selective prediction metrics. A JAK kinase subfamily case study shows that protein-aware models achieved 79% top-1 selectivity accuracy versus 52% for pooled fingerprint models. However, stronger baselines using explicit target identity achieved 83-84%, indicating that ESM-2 embeddings in this study functioned primarily as an implicit target identifier. These results indicate that evaluation methodology and statistical rigor are major determinants of reported performance in bioactivity prediction.
[Figure 1: Benchmark design overview.] A curated ChEMBL kinase bioactivity dataset (352,874 records, 507 targets) was evaluated under three splitting strategies of increasing stringency. Seven model architectures spanning baselines, protein-aware, and graph neural approaches were each trained under 5-seed replication (105 total runs), with results analyzed across three complementary branches: the main 507-target benchmark, ESM-2 embedding ablation studies on a clean 92-target subset, and a JAK-family selectivity case study with stronger target-conditioned baselines.
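The three splitting strategies differ in what is held out: a random split may leave the same scaffolds and targets in both train and test, while scaffold-based and target-held-out splits withhold entire groups. The sketch below illustrates this distinction with plain Python; it is a minimal illustration, not the paper's pipeline. The record fields and toy SMILES are hypothetical, and a real implementation would compute Bemis-Murcko scaffolds with RDKit (rdkit.Chem.Scaffolds.MurckoScaffold) rather than carry them as precomputed strings.

```python
import random

def random_split(records, frac_test=0.2, seed=0):
    """Plain random split: test rows may share scaffolds and
    targets with training rows (the least stringent setting)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * frac_test)
    return shuffled[n_test:], shuffled[:n_test]

def group_held_out_split(records, key, frac_test=0.2, seed=0):
    """Hold out whole groups: no group present in the test set
    ever appears in training. key=scaffold gives a scaffold split,
    key=target gives the target-held-out split."""
    rng = random.Random(seed)
    groups = sorted({key(r) for r in records})
    rng.shuffle(groups)
    n_test_groups = max(1, int(len(groups) * frac_test))
    test_groups = set(groups[:n_test_groups])
    train = [r for r in records if key(r) not in test_groups]
    test = [r for r in records if key(r) in test_groups]
    return train, test

# Toy records: (compound_id, scaffold_smiles, target_id, pChEMBL)
records = [
    ("c1", "c1ccccc1", "JAK1", 6.2),
    ("c2", "c1ccccc1", "JAK2", 7.1),
    ("c3", "c1ccncc1", "JAK1", 5.4),
    ("c4", "c1ccncc1", "JAK3", 8.0),
    ("c5", "C1CCCCC1", "TYK2", 6.9),
]

# Target-held-out: group by target id (field index 2).
train, test = group_held_out_split(records, key=lambda r: r[2], frac_test=0.25)
assert {r[2] for r in train}.isdisjoint({r[2] for r in test})
```

The same `group_held_out_split` covers both stringent settings by swapping the grouping key, which is why group-disjointness (not row count) is the property that makes these evaluations harder than a random split.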

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers (training set)  Percentile  Probability
1     Journal of Chemical Information and Modeling        207                    Top 0.2%    22.6%
2     Journal of Cheminformatics                          25                     Top 0.1%    18.7%
3     Bioinformatics                                      1061                   Top 4%      6.3%
4     Bioinformatics Advances                             184                    Top 0.7%    4.9%
----- 50% of probability mass above this line -----
5     PLOS Computational Biology                          1633                   Top 7%      4.9%
6     Scientific Reports                                  3102                   Top 36%     3.6%
7     BMC Bioinformatics                                  383                    Top 3%      3.6%
8     Briefings in Bioinformatics                         326                    Top 2%      3.6%
9     PLOS ONE                                            4510                   Top 45%     2.6%
10    Proceedings of the National Academy of Sciences     2130                   Top 29%     1.9%
11    Computational and Structural Biotechnology Journal  216                    Top 4%      1.8%
12    Nature Communications                               4913                   Top 50%     1.8%
13    Cell Systems                                        167                    Top 7%      1.7%
14    Patterns                                            70                     Top 0.9%    1.7%
15    Artificial Intelligence in the Life Sciences        11                     Top 0.1%    1.5%
16    International Journal of Molecular Sciences         453                    Top 9%      1.5%
17    Scientific Data                                     174                    Top 2%      0.8%
18    Nature Machine Intelligence                         61                     Top 3%      0.7%
19    Frontiers in Bioinformatics                         45                     Top 0.9%    0.7%
20    Metabolites                                         50                     Top 1%      0.6%
21    Molecules                                           37                     Top 2%      0.6%