
BioMADE: Predicting Torsades de Pointes from molecular structures through biologically informed representations

Acitores Cortina, J. M.; Schut, M. C.; Tatonetti, N. P.

2026-05-11 bioinformatics
10.64898/2026.05.06.723121 bioRxiv

Drug-induced arrhythmias, particularly Torsades de Pointes (TdP), pose a significant risk to patient safety and can be life-threatening. They remain a major concern in drug development and regulation. Machine learning (ML) has become a powerful tool for analyzing complex biological and chemical datasets, enabling researchers to identify subtle patterns that distinguish safe compounds from those likely to cause dangerous cardiac effects. However, most existing in silico approaches do not sufficiently incorporate biological information, relying heavily on chemical and structural properties or on computationally expensive simulations. Here, we introduce BioMADE, a novel ML framework that harnesses small-molecule-protein activity profiles from publicly available datasets to predict TdP risk without requiring exhaustive mechanistic annotation. Activity data from ChEMBL were used to train an individual model for each gene that predicts activity values for any given compound. A curated set of arrhythmia-relevant genes was then used to construct a latent biological embedding (the BioMADE embedding) for each molecule. We validated these features on biologically meaningful labels such as ATC3 class, where they achieved better classification performance than Molformer, which lacks biological information, and MACCS, which captures a limited set of chemical properties (AUROC 0.85 vs. 0.81 and 0.73, respectively). BioMADE representations then served as input to a support vector machine (SVM) classifier that discriminates TdP-inducing drugs from safe compounds, achieving an AUROC of 0.89 in internal validation and indicating strong predictive performance. On the validation set of ADMEThyst, a state-of-the-art model, BioMADE achieved an AUROC of 0.74 versus 0.72 for ADMEThyst itself, and combining both approaches raised the AUROC to 0.77.
These results demonstrate that BioMADE provides a scalable, biology-informed, and generalizable approach for predicting drug-induced toxicities. By integrating protein activity profiles into toxicology modeling, our framework highlights the critical role of human biology in adverse drug reaction prediction, an aspect often overshadowed by purely chemical or structural descriptors.
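To make the pipeline described above more concrete, here is a minimal, dependency-free Python sketch under stated assumptions. It shows three generic pieces the abstract names: per-gene activity predictors whose outputs are stacked over a curated gene list into one feature vector per molecule, AUROC computed as the probability that a randomly chosen positive outscores a randomly chosen negative (the Mann-Whitney formulation, ties counted as 0.5), and a combination of two models' scores. Everything here is hypothetical: the toy linear gene models, the helper names (`make_linear_model`, `build_embedding`, `auroc`, `average_scores`), the gene symbols in the demo, and the rule of averaging scores to combine BioMADE with ADMEThyst (the abstract does not say how the two were combined). The embedding is taken to be the raw vector of predictions, whereas the paper describes a latent embedding whose construction the abstract does not spell out. The SVM step is omitted to keep the sketch self-contained; a library classifier such as scikit-learn's `SVC` could be fitted on these vectors instead.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for one per-gene activity model (the real ChEMBL-trained
# models are not described in detail): maps a binary fingerprint to a predicted activity.
GeneModel = Callable[[Sequence[int]], float]


def make_linear_model(weights: Sequence[float], bias: float) -> GeneModel:
    """Return a toy linear model: bias + sum(weight * bit) over the fingerprint."""
    def predict(fingerprint: Sequence[int]) -> float:
        return bias + sum(w * x for w, x in zip(weights, fingerprint))
    return predict


def build_embedding(fingerprint: Sequence[int],
                    models: Dict[str, GeneModel],
                    genes: List[str]) -> List[float]:
    """Stack each gene model's prediction, in the order of the curated gene list.

    Assumption: the embedding is the raw prediction vector; any further 'latent'
    transformation used by BioMADE is not reproduced here.
    """
    return [models[g](fingerprint) for g in genes]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_scores(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Combine two models' scores by simple averaging (an assumed combination rule)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


if __name__ == "__main__":
    # Illustrative gene symbols (hERG, Nav1.5, Cav1.2 channel genes); not the paper's curated list.
    genes = ["KCNH2", "SCN5A", "CACNA1C"]
    models = {
        "KCNH2": make_linear_model([0.8, 0.1, 0.3], -0.2),
        "SCN5A": make_linear_model([0.2, 0.5, 0.0], 0.1),
        "CACNA1C": make_linear_model([0.0, 0.4, 0.6], 0.0),
    }
    fingerprints = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
    embeddings = [build_embedding(fp, models, genes) for fp in fingerprints]
    # Toy risk score: the first embedding dimension stands in for a trained classifier's output.
    scores = [emb[0] for emb in embeddings]
    labels = [1, 0, 1, 0]
    print(embeddings)
    print("AUROC:", auroc(scores, labels))
```

Averaging is only one possible way to combine two predictors; it assumes both produce scores on comparable scales (for example, probabilities), which would need checking in practice.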

Matching journals

The top 9 journals account for 50% of the predicted probability mass.
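As a quick check of this cutoff, the sketch below (the helper name `journals_for_mass` is hypothetical) sorts the rounded probabilities shown in the ranking and counts how many are needed for the running total to reach 50%. With the displayed values, the top eight sum to about 49.7% and the top nine to about 53.3%, so nine is the smallest such count; the underlying unrounded probabilities may differ slightly.

```python
from typing import Optional, Sequence

# Rounded predicted probabilities (%) of the ten highest-ranked journals, as displayed.
PROBS_PCT = [12.3, 7.2, 6.4, 6.3, 4.8, 4.8, 4.3, 3.6, 3.6, 3.6]


def journals_for_mass(probs_pct: Sequence[float], target_pct: float = 50.0) -> Optional[int]:
    """Smallest k such that the k largest probabilities sum to at least target_pct.

    Returns None if the listed probabilities never reach the target.
    """
    total = 0.0
    for k, p in enumerate(sorted(probs_pct, reverse=True), start=1):
        total += p
        if total >= target_pct:
            return k
    return None


if __name__ == "__main__":
    print(journals_for_mass(PROBS_PCT))
```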

1. Journal of Chemical Information and Modeling (207 papers in training set; Top 0.5%; predicted probability 12.3%)
2. Genome Medicine (154 papers in training set; Top 0.7%; predicted probability 7.2%)
3. Advanced Science (249 papers in training set; Top 3%; predicted probability 6.4%)
4. Bioinformatics (1061 papers in training set; Top 4%; predicted probability 6.3%)
5. Scientific Reports (3102 papers in training set; Top 24%; predicted probability 4.8%)
6. Clinical Pharmacology & Therapeutics (25 papers in training set; Top 0.1%; predicted probability 4.8%)
7. Briefings in Bioinformatics (326 papers in training set; Top 1%; predicted probability 4.3%)
8. npj Digital Medicine (97 papers in training set; Top 1%; predicted probability 3.6%)
9. Computational and Structural Biotechnology Journal (216 papers in training set; Top 2%; predicted probability 3.6%)

50% of probability mass above

10. Nature Communications (4913 papers in training set; Top 40%; predicted probability 3.6%)
11. Journal of Cheminformatics (25 papers in training set; Top 0.2%; predicted probability 3.2%)
12. PLOS Computational Biology (1633 papers in training set; Top 11%; predicted probability 3.1%)
13. Computers in Biology and Medicine (120 papers in training set; Top 1%; predicted probability 2.7%)
14. Patterns (70 papers in training set; Top 0.4%; predicted probability 2.4%)
15. PLOS ONE (4510 papers in training set; Top 46%; predicted probability 2.4%)
16. Bioinformatics Advances (184 papers in training set; Top 3%; predicted probability 1.8%)
17. BMC Bioinformatics (383 papers in training set; Top 4%; predicted probability 1.7%)
18. Artificial Intelligence in the Life Sciences (11 papers in training set; Top 0.1%; predicted probability 1.7%)
19. npj Systems Biology and Applications (99 papers in training set; Top 1%; predicted probability 1.5%)
20. Nature Machine Intelligence (61 papers in training set; Top 2%; predicted probability 1.3%)
21. iScience (1063 papers in training set; Top 19%; predicted probability 1.3%)
22. eBioMedicine (130 papers in training set; Top 3%; predicted probability 0.9%)
23. BioData Mining (15 papers in training set; Top 0.7%; predicted probability 0.9%)
24. Frontiers in Molecular Biosciences (100 papers in training set; Top 4%; predicted probability 0.9%)
25. Communications Biology (886 papers in training set; Top 19%; predicted probability 0.9%)
26. Cell Reports Medicine (140 papers in training set; Top 7%; predicted probability 0.8%)
27. Nucleic Acids Research (1128 papers in training set; Top 18%; predicted probability 0.7%)
28. Communications Chemistry (39 papers in training set; Top 1%; predicted probability 0.7%)
29. International Journal of Molecular Sciences (453 papers in training set; Top 18%; predicted probability 0.6%)
30. IEEE Journal of Biomedical and Health Informatics (34 papers in training set; Top 2%; predicted probability 0.6%)