
OnSIDES (ON-label SIDE effectS resource) Database: Extracting Adverse Drug Events from Drug Labels using Natural Language Processing Models

Tanaka, Y.; Chen, H. Y.; Belloni, P.; Gisladottir, U.; Kefeli, J.; Patterson, J.; Srinivasan, A.; Zeitz, M.; Sirdeshmukh, G.; Berkowitz, J.; LaRow Brown, K.; Tatonetti, N. P.

2024-03-24 pharmacology and therapeutics
10.1101/2024.03.22.24304724 medRxiv

Adverse drug events (ADEs) are the fourth leading cause of death in the US and cost billions of dollars annually in additional healthcare spending. However, few machine-readable databases of ADEs exist, limiting the opportunity to study drug safety on a broader, systematic scale. Recent advances in Natural Language Processing methods, such as BERT models, present an opportunity to accurately extract relevant information from unstructured biomedical text. We therefore fine-tuned a PubMedBERT model to extract ADE terms from descriptive text in FDA Structured Product Labels for prescription drugs. With this model, we achieve an F1 score of 0.90, AUROC of 0.92, and AUPR of 0.95 when extracting ADEs from the "Adverse Reactions" section of the labels. We further use this method to extract serious ADEs from the "Boxed Warnings" section and ADEs specifically noted for pediatric patients. Here, we present OnSIDES (ON-label SIDE effectS resource), a compiled, computable database of drug-ADE pairs generated with this method. OnSIDES contains more than 3.6 million drug-ADE pairs for 3,233 unique drug ingredient combinations extracted from 47,211 labels. Additionally, we extend this method to extract ADEs from drug labels from other major nations and regions (Japan, the UK, and the EU) to build a complementary OnSIDES-INTL database. To demonstrate potential applications, we used OnSIDES to predict novel drug targets and indications, analyze enrichment of ADEs across drug classes, and predict novel ADEs from chemical compound structures. We conclude that OnSIDES can be used as a comprehensive resource to study and enhance drug safety.

One Sentence Summary: OnSIDES is a large, comprehensive database of adverse drug events extracted from drug labels using natural language processing methods.
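The abstract reports three metrics (F1, AUROC, AUPR) for the fine-tuned extractor. As a hedged illustration of how these are commonly computed for a binary "is this candidate term an ADE?" classifier, here is a minimal standard-library Python sketch. It is not the authors' evaluation code: the toy labels and scores are invented, the 0.5 decision threshold is an assumption, and AUPR is estimated here as average precision, which is one common estimator among several.

```python
from typing import Sequence

# Toy data (invented, NOT OnSIDES results): 1 = true ADE term, 0 = not an ADE.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.70, 0.65, 0.40, 0.30, 0.20, 0.10]


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 of the positive class after thresholding scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR estimated as average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


print(f"F1={f1_score(labels, scores):.3f} "
      f"AUROC={auroc(labels, scores):.3f} "
      f"AUPR={average_precision(labels, scores):.3f}")
# prints: F1=0.750 AUROC=0.750 AUPR=0.830
```

F1 depends on the chosen threshold, while AUROC and AUPR summarize ranking quality across all thresholds, which is why papers like this one typically report all three.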

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. BioData Mining (15 papers in training set, Top 0.1%): 22.2%
2. Journal of Biomedical Informatics (45 papers in training set, Top 0.1%): 12.2%
3. Clinical Pharmacology & Therapeutics (25 papers in training set, Top 0.1%): 6.7%
4. Clinical and Translational Science (21 papers in training set, Top 0.1%): 6.3%
5. Frontiers in Pharmacology (100 papers in training set, Top 0.7%): 3.9%

50% of probability mass above

6. Bioinformatics (1061 papers in training set, Top 5%): 3.9%
7. Scientific Reports (3102 papers in training set, Top 36%): 3.6%
8. BMC Bioinformatics (383 papers in training set, Top 3%): 3.5%
9. PLOS ONE (4510 papers in training set, Top 40%): 3.5%
10. Computational and Structural Biotechnology Journal (216 papers in training set, Top 2%): 2.8%
11. Nature (575 papers in training set, Top 8%): 2.6%
12. Scientific Data (174 papers in training set, Top 0.8%): 2.1%
13. eLife (5422 papers in training set, Top 44%): 1.6%
14. npj Systems Biology and Applications (99 papers in training set, Top 1%): 1.3%
15. npj Digital Medicine (97 papers in training set, Top 3%): 1.2%
16. Heliyon (146 papers in training set, Top 4%): 1.2%
17. Pharmacoepidemiology and Drug Safety (13 papers in training set, Top 0.4%): 0.9%
18. Journal of the American Medical Informatics Association (61 papers in training set, Top 2%): 0.8%
19. Proceedings of the National Academy of Sciences (2130 papers in training set, Top 43%): 0.8%
20. Bioinformatics Advances (184 papers in training set, Top 4%): 0.8%
21. Database (51 papers in training set, Top 0.9%): 0.8%
22. Nature Machine Intelligence (61 papers in training set, Top 3%): 0.8%
23. JMIRx Med (31 papers in training set, Top 2%): 0.7%
24. BMC Medical Informatics and Decision Making (39 papers in training set, Top 3%): 0.7%
25. Frontiers in Psychiatry (83 papers in training set, Top 3%): 0.7%
26. Nucleic Acids Research (1128 papers in training set, Top 19%): 0.7%
27. npj Genomic Medicine (33 papers in training set, Top 1%): 0.6%
28. Journal of Medical Internet Research (85 papers in training set, Top 5%): 0.6%
29. Nature Communications (4913 papers in training set, Top 66%): 0.6%
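The statement that the top 5 journals account for 50% of the predicted probability mass can be checked directly from the listed percentages. The short Python sketch below copies the first seven percentages from the list above and finds the smallest ranked prefix whose cumulative mass reaches 50%. The helper name `smallest_prefix_covering` is hypothetical, and because the displayed percentages are rounded to one decimal, the cumulative total is approximate.

```python
# Predicted probabilities (%) for the top-ranked journals, copied from the list above.
probs = [
    ("BioData Mining", 22.2),
    ("Journal of Biomedical Informatics", 12.2),
    ("Clinical Pharmacology & Therapeutics", 6.7),
    ("Clinical and Translational Science", 6.3),
    ("Frontiers in Pharmacology", 3.9),
    ("Bioinformatics", 3.9),
    ("Scientific Reports", 3.6),
]


def smallest_prefix_covering(entries, target_pct):
    """Return (k, cumulative %) for the shortest ranked prefix whose mass reaches target_pct.

    Returns None if the entries never reach the target.
    """
    cumulative = 0.0
    for k, (_, pct) in enumerate(entries, start=1):
        cumulative += pct
        if cumulative >= target_pct:
            # Round to one decimal to hide floating-point noise from summing rounded values.
            return k, round(cumulative, 1)
    return None


k, mass = smallest_prefix_covering(probs, 50.0)
print(f"top {k} journals cover {mass}% of the predicted mass")
# prints: top 5 journals cover 51.3% of the predicted mass
```

The first four entries sum to 47.4%, so the fifth entry (Frontiers in Pharmacology) is the one that pushes the cumulative mass just past the 50% line.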