
Learning Chirality-Aware Representations to Predict Drug Side Effect Frequencies

Galeano, A.; Dutra, I.; Ferreyra, S.; Paccanaro, A.

2026-05-18 bioinformatics
10.64898/2026.05.14.725209 bioRxiv

Ab initio prediction of side effect frequencies is important for assessing the risk-benefit profile of drugs and for identifying potential adverse effects early in development. A key challenge is chirality: many drugs exist as enantiomers, pairs of molecules with the same atoms and bond connectivity but different three-dimensional arrangements. Although chemically similar, enantiomers can interact differently with biological targets and therefore exhibit distinct efficacy and adverse-effect profiles. Here we introduce F2S (Features to Signatures), a method to predict the frequencies of drug side effects while explicitly accounting for chirality. Drug representations are learned directly from chemical structure using a directed-bond message-passing graph neural network that captures stereochemical configurations. Side effect representations are derived from curated textual descriptions encoded with a frozen PubMedBERT model. Side effect frequencies are predicted from the dot product between drug and side effect signatures together with biases for drugs and side effects. We evaluated F2S extensively across multiple settings, including cold-start and warm-start prediction, prospective evaluation, and scenarios controlling for chemical similarity between training and test drugs. Across these evaluations, F2S achieves performance comparable to state-of-the-art methods for general side-effect frequency prediction while producing fewer false positives, and it substantially improves the prediction of frequency differences between enantiomer pairs. Finally, F2S learns compact 10-dimensional signatures that support interpretability: drug signatures reflect therapeutic class and shared targets, side-effect signatures capture phenotype similarity, and the learned bias terms correlate with the popularity of drugs and side effects.
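The scoring rule described in the abstract (dot product of the two signatures plus a drug bias and a side-effect bias) can be sketched in a few lines. This is only an illustrative sketch: the function name `predict_frequency`, the random signature values, and the bias values are hypothetical, and the abstract does not say whether the raw score is further transformed or clipped to a frequency scale. In F2S the signatures come from the graph neural network and the PubMedBERT encoder, which are not modeled here.

```python
import random

SIGNATURE_DIM = 10  # the abstract reports 10-dimensional signatures


def predict_frequency(drug_sig, effect_sig, drug_bias, effect_bias):
    """Raw score: dot(drug signature, side-effect signature) + both biases.

    Assumption: no link function or clipping is applied afterwards.
    """
    if len(drug_sig) != len(effect_sig):
        raise ValueError("signatures must have the same dimension")
    dot = sum(d * e for d, e in zip(drug_sig, effect_sig))
    return dot + drug_bias + effect_bias


# Hypothetical stand-ins for learned signatures and biases.
rng = random.Random(0)
drug_sig = [rng.uniform(0.0, 1.0) for _ in range(SIGNATURE_DIM)]
effect_sig = [rng.uniform(0.0, 1.0) for _ in range(SIGNATURE_DIM)]
score = predict_frequency(drug_sig, effect_sig, drug_bias=0.2, effect_bias=-0.1)
print(f"predicted score: {score:.3f}")
```

Under this form, two enantiomers can only receive different scores for the same side effect if their drug signatures or drug biases differ, which is why the drug encoder has to distinguish stereochemical configurations.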

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 2%; 17.4%
2. Journal of Chemical Information and Modeling; 207 papers in training set; Top 0.8%; 6.8%
3. Nature Communications; 4913 papers in training set; Top 29%; 6.3%
4. Journal of Cheminformatics; 25 papers in training set; Top 0.1%; 4.8%
5. Briefings in Bioinformatics; 326 papers in training set; Top 1%; 4.8%
6. Cell Systems; 167 papers in training set; Top 3%; 4.1%
7. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 18%; 3.9%
8. Nature Machine Intelligence; 61 papers in training set; Top 1.0%; 3.6%
50% of probability mass above
9. Advanced Science; 249 papers in training set; Top 7%; 3.1%
10. PLOS Computational Biology; 1633 papers in training set; Top 12%; 2.7%
11. Bioinformatics Advances; 184 papers in training set; Top 2%; 2.6%
12. Genome Medicine; 154 papers in training set; Top 4%; 2.1%
13. Nature Methods; 336 papers in training set; Top 4%; 1.9%
14. Scientific Reports; 3102 papers in training set; Top 53%; 1.9%
15. PLOS ONE; 4510 papers in training set; Top 51%; 1.8%
16. Nature Biotechnology; 147 papers in training set; Top 5%; 1.7%
17. Clinical Pharmacology & Therapeutics; 25 papers in training set; Top 0.4%; 1.7%
18. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 4%; 1.7%
19. iScience; 1063 papers in training set; Top 16%; 1.7%
20. Patterns; 70 papers in training set; Top 2%; 1.2%
21. BMC Bioinformatics; 383 papers in training set; Top 5%; 1.2%
22. Nucleic Acids Research; 1128 papers in training set; Top 14%; 1.2%
23. Science Advances; 1098 papers in training set; Top 27%; 0.9%
24. Communications Chemistry; 39 papers in training set; Top 0.8%; 0.9%
25. Cell Reports Medicine; 140 papers in training set; Top 7%; 0.8%
26. Communications Biology; 886 papers in training set; Top 21%; 0.8%
27. eLife; 5422 papers in training set; Top 58%; 0.7%
28. Chemical Science; 71 papers in training set; Top 2%; 0.7%
29. Nature Biomedical Engineering; 42 papers in training set; Top 2%; 0.7%
30. Artificial Intelligence in the Life Sciences; 11 papers in training set; Top 0.3%; 0.7%
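The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The helper name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages shown on the page, so the cumulative total is approximate.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Count how many top-ranked entries are needed for the running sum
    of percentages to reach the threshold; also return that running sum."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total  # threshold never reached


# Rounded percentages for the ten highest-ranked journals listed above.
top_probs = [17.4, 6.8, 6.3, 4.8, 4.8, 4.1, 3.9, 3.6, 3.1, 2.7]
count, mass = journals_to_reach(top_probs)
print(count, round(mass, 1))
```

With these values the running sum is about 48.1% after seven journals and about 51.7% after eight, consistent with the stated cutoff.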