
CardioSafe: Multi-task prediction of cardiac ion channel activity with reverse-leak audited benchmarking

Jovanovic, M.; Weidener, L. S.; Brkic, M.; Ulgac, E.; Meduri, A.

2026-05-12 bioinformatics
10.64898/2026.05.06.723181 bioRxiv

Drug-induced inhibition of the hERG potassium channel is the leading cause of cardiac safety-related drug attrition, but the Comprehensive in vitro Proarrhythmia Assay (CiPA) framework requires activity data on multiple cardiac ion channels to assess proarrhythmic risk. We present CardioSafe, a three-branch multi-task neural network with cross-attention fusion that integrates chemical fingerprints, ChemBERTa embeddings, and predicted L1000 transcriptomic features to predict blocker status and potency for hERG, Nav1.5, and Cav1.2, with an exploratory IKs head. CardioSafe was trained on the largest publicly reported multi-channel cardiac ion channel dataset, combining ChEMBL 36 with the hERGCentral database (331,127 hERG, 3,160 Nav1.5, 1,138 Cav1.2, and 115 IKs compounds), curated under a pharmacology-aware policy that retains censored measurements and inhibition-percentage votes. Under Tanimoto-similarity-controlled splits, CardioSafe outperforms the leading published comparators (CToxPred2 and CardioGenAI) on the data-rich hERG head; on the smaller Nav1.5 and Cav1.2 heads the standard evaluation is statistically inconclusive. A reverse-leak audit revealed that 22% of Nav1.5 and 21% of Cav1.2 test compounds were present in the published comparators' training data (92% as exact compound matches); after removing these contaminated compounds, CardioSafe's lead on Nav1.5 and Cav1.2 also reaches statistical significance, demonstrating that prior cross-publication benchmarks for these channels were inflated by training-data overlap.

Scientific contribution

We present the first multi-task neural network jointly predicting blocker activity for the three primary CiPA cardiac ion channels (hERG, Nav1.5, Cav1.2) within a single architecture. We introduce a reverse-leak audit methodology that reveals systematic test-set contamination in cross-publication cardiac safety benchmarks, establishing a stricter evaluation protocol. We provide an empirical test of predicted L1000 transcriptomic features as auxiliary input for cardiac ion channel prediction and document a well-characterized negative result.

Graphical abstract

CardioSafe encodes each query SMILES with three branches (chemical fingerprints + descriptors, pretrained ChemBERTa, and predicted L1000 transcriptomic signatures), fuses them via a cross-attention block with four learnable per-channel query tokens, and emits binary blocker calls plus pChEMBL regression for hERG, Nav1.5, Cav1.2, and (exploratory) IKs. (Figure 1)
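The reverse-leak audit mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of each compound as a set of fingerprint on-bit indices keyed by a canonical identifier, and the 0.9 near-duplicate threshold are all assumptions made for illustration. The idea is to flag every test compound that appears in a comparator's training set, either as an exact identifier match or as a close structural neighbor, and then re-evaluate on what remains.

```python
from typing import Dict, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def reverse_leak_audit(
    test_set: Dict[str, Set[int]],
    comparator_train: Dict[str, Set[int]],
    threshold: float = 0.9,  # hypothetical near-duplicate cutoff, not from the paper
) -> dict:
    """Flag test compounds that overlap with a comparator's training data.

    Keys are assumed to be canonical compound identifiers (hypothetical),
    values are sets of fingerprint on-bit indices.
    """
    # Exact matches: same identifier appears in both sets.
    exact = {cid for cid in test_set if cid in comparator_train}

    # Near matches: different identifier, but fingerprint similarity >= threshold.
    near = set()
    for cid, fp in test_set.items():
        if cid in exact:
            continue
        if any(tanimoto(fp, other) >= threshold for other in comparator_train.values()):
            near.add(cid)

    leaked = exact | near
    clean = {cid: fp for cid, fp in test_set.items() if cid not in leaked}
    fraction = len(leaked) / len(test_set) if test_set else 0.0
    return {"exact": exact, "near": near, "leaked_fraction": fraction, "clean": clean}


# Toy data (invented for illustration only).
test = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}, "C": {1, 2, 3, 9}, "D": {20, 21}}
train = {"A": {1, 2, 3, 4}, "X": {5, 6, 7}, "Y": {30, 31}}

report = reverse_leak_audit(test, train)
print(sorted(report["exact"]), sorted(report["near"]), report["leaked_fraction"])
```

In this toy case, compound A is an exact match and B is a near duplicate of X, so half of the test set would be removed before re-scoring the comparator on the clean subset (C and D). A real audit would derive identifiers and fingerprints from the structures with a cheminformatics toolkit rather than using hand-written sets.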

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Communications; 4913 papers in training set; Top 11%; 14.2%
2. Advanced Science; 249 papers in training set; Top 0.8%; 12.4%
3. Nature Machine Intelligence; 61 papers in training set; Top 0.2%; 10.0%
4. Patterns; 70 papers in training set; Top 0.1%; 10.0%
5. Bioinformatics; 1061 papers in training set; Top 3%; 9.1%

50% of probability mass above

6. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 19%; 3.6%
7. Briefings in Bioinformatics; 326 papers in training set; Top 2%; 3.2%
8. iScience; 1063 papers in training set; Top 7%; 2.7%
9. Journal of Chemical Information and Modeling; 207 papers in training set; Top 2%; 2.1%
10. Nature Biomedical Engineering; 42 papers in training set; Top 0.7%; 1.9%
11. Artificial Intelligence in the Life Sciences; 11 papers in training set; Top 0.1%; 1.7%
12. Chemical Science; 71 papers in training set; Top 1.0%; 1.7%
13. npj Digital Medicine; 97 papers in training set; Top 2%; 1.5%
14. Nature; 575 papers in training set; Top 13%; 1.3%
15. Cell Genomics; 162 papers in training set; Top 4%; 1.3%
16. Nucleic Acids Research; 1128 papers in training set; Top 14%; 1.2%
17. Cell Systems; 167 papers in training set; Top 9%; 1.2%
18. Nature Methods; 336 papers in training set; Top 5%; 1.2%
19. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 6%; 1.2%
20. Communications Biology; 886 papers in training set; Top 15%; 1.2%
21. Communications Chemistry; 39 papers in training set; Top 1.0%; 0.8%
22. Genome Medicine; 154 papers in training set; Top 8%; 0.7%
23. Cell Reports Medicine; 140 papers in training set; Top 8%; 0.7%
24. GigaScience; 172 papers in training set; Top 3%; 0.7%
25. Scientific Reports; 3102 papers in training set; Top 76%; 0.7%
26. Nature Chemical Biology; 104 papers in training set; Top 4%; 0.6%