
Detecting Medication Mentions in Social Media Data Using Large Language Models

Lopez-Garcia, G.; Xu, D.; Gonzalez-Hernandez, G.

2025-05-18 health informatics
10.1101/2025.05.16.25327791 medRxiv

The automatic extraction of medication mentions from social media data is critical for pharmacovigilance and public health monitoring. In this study, we present an end-to-end generative approach based on instruction-tuned large language models (LLMs) for medication mention extraction from Twitter. Reformulating the task as a text-to-text generation problem, our models achieve state-of-the-art results on both fine-grained span extraction and coarse-grained tweet-level classification, surpassing traditional sequence labeling baselines and previous best-performing systems. We demonstrate that fine-tuning Flan-T5 models enables efficient and accurate extraction while simplifying the architecture by eliminating complex multi-stage pipelines. Additionally, we show that lexicon-based filtering further improves performance by reducing false positives. Our models are publicly available, providing high-performing and efficient tools for large-scale pharmacological analysis of social media data.
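The abstract does not say how lexicon-based filtering is applied, or how tweet-level labels are derived from the extracted spans. One plausible reading is sketched here in Python, purely as an assumption. The model's generated text is split into candidate spans. Spans absent from a medication lexicon are dropped as likely false positives, and a tweet is labeled positive if any span survives. The output convention (spans joined by ";", or "none"), the function names, and the toy lexicon are all hypothetical and not taken from the paper.

```python
from typing import Iterable, List

# Hypothetical output convention (an assumption, not from the paper):
# the generative model emits medication spans separated by ";",
# or the literal string "none" when it finds no mention.
SEPARATOR = ";"
EMPTY_OUTPUT = "none"


def parse_generated_spans(generated: str) -> List[str]:
    """Split a generated output string into candidate medication spans."""
    text = generated.strip()
    if not text or text.lower() == EMPTY_OUTPUT:
        return []
    return [span.strip() for span in text.split(SEPARATOR) if span.strip()]


def lexicon_filter(spans: Iterable[str], lexicon: Iterable[str]) -> List[str]:
    """Keep only spans whose lower-cased form appears in the lexicon."""
    known = {term.lower() for term in lexicon}
    return [span for span in spans if span.lower() in known]


def tweet_mentions_medication(generated: str, lexicon: Iterable[str]) -> bool:
    """Assumed tweet-level label: positive if at least one span survives filtering."""
    return bool(lexicon_filter(parse_generated_spans(generated), lexicon))


if __name__ == "__main__":
    toy_lexicon = ["ibuprofen", "adderall", "tylenol"]  # toy lexicon, not a real resource
    output = "Tylenol; my couch"  # toy model output containing one false positive
    print(lexicon_filter(parse_generated_spans(output), toy_lexicon))  # ['Tylenol']
    print(tweet_mentions_medication(output, toy_lexicon))  # True
```

Exact-match lookup is the simplest choice. A real system would likely need normalization for misspellings and brand/generic variants, which this sketch does not attempt.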

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Format: rank. journal: papers in training set; percentile; predicted probability

1. npj Digital Medicine: 97 papers in training set; Top 0.1%; 26.6%
2. Med: 38 papers in training set; Top 0.1%; 8.6%
3. Journal of Biomedical Informatics: 45 papers in training set; Top 0.2%; 7.0%
4. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.6%; 4.1%
5. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 0.4%; 4.1%

50% of probability mass above

6. JAMIA Open: 37 papers in training set; Top 0.4%; 3.8%
7. Scientific Reports: 3102 papers in training set; Top 33%; 3.8%
8. Nature Biomedical Engineering: 42 papers in training set; Top 0.3%; 3.2%
9. Bioinformatics: 1061 papers in training set; Top 6%; 3.0%
10. Nature Communications: 4913 papers in training set; Top 46%; 2.1%
11. Science Translational Medicine: 111 papers in training set; Top 2%; 1.9%
12. PLOS ONE: 4510 papers in training set; Top 51%; 1.8%
13. Journal of Medical Internet Research: 85 papers in training set; Top 2%; 1.7%
14. Communications Medicine: 85 papers in training set; Top 0.2%; 1.7%
15. International Journal of Medical Informatics: 25 papers in training set; Top 0.9%; 1.5%
16. Patterns: 70 papers in training set; Top 1%; 1.5%
17. Advanced Science: 249 papers in training set; Top 12%; 1.5%
18. Nature Machine Intelligence: 61 papers in training set; Top 2%; 1.5%
19. BMC Bioinformatics: 383 papers in training set; Top 6%; 1.1%
20. Nature Computational Science: 50 papers in training set; Top 1%; 1.0%
21. iScience: 1063 papers in training set; Top 26%; 0.9%
22. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 2%; 0.9%
23. Frontiers in Digital Health: 20 papers in training set; Top 1%; 0.8%
24. eBioMedicine: 130 papers in training set; Top 4%; 0.8%
25. Science Advances: 1098 papers in training set; Top 29%; 0.8%
26. Briefings in Bioinformatics: 326 papers in training set; Top 7%; 0.7%
27. JCO Clinical Cancer Informatics: 18 papers in training set; Top 0.9%; 0.7%
28. Computers in Biology and Medicine: 120 papers in training set; Top 5%; 0.7%
29. Communications Biology: 886 papers in training set; Top 28%; 0.7%
30. The Lancet Digital Health: 25 papers in training set; Top 1%; 0.7%
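The statement that the top five journals account for 50% of the predicted probability mass can be checked against the rounded percentages listed. The first four journals sum to 26.6 + 8.6 + 7.0 + 4.1 = 46.3%, and adding the fifth gives 50.4%. A minimal Python sketch of that cumulative-sum check follows. The function name `smallest_k_for_mass` is hypothetical, and the inputs are the rounded figures from the list, not the underlying model scores.

```python
from typing import Sequence


def smallest_k_for_mass(probabilities: Sequence[float], threshold: float) -> int:
    """Return the smallest k such that the k largest probabilities sum to at least threshold."""
    total = 0.0
    for k, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= threshold:
            return k
    raise ValueError("threshold exceeds the total probability mass provided")


# Rounded predicted probabilities (in percent) for the five top-ranked journals.
top_five = [26.6, 8.6, 7.0, 4.1, 4.1]
print(smallest_k_for_mass(top_five, 50.0))  # 5: 46.3% after four journals, 50.4% after five
```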