
Using Large Language Models for sentiment analysis of health-related social media data: empirical evaluation and practical tips

He, L.; Omranian, S.; McRoy, S.; Zheng, K.

2024-03-20 · health informatics
medRxiv · doi:10.1101/2024.03.19.24304544
Abstract

Health-related social media data generated by patients and the public provide valuable insights into patient experiences and opinions toward health issues such as vaccination and medical treatments. Using Natural Language Processing (NLP) methods to analyze such data, however, often requires high-quality annotations that are difficult to obtain. Recently emerged Large Language Models (LLMs) such as the Generative Pre-trained Transformers (GPTs) have shown promising performance on a variety of NLP tasks in the health domain with little to no annotated data, yet their potential for analyzing health-related social media data remains underexplored. In this paper, we report empirical evaluations of LLMs (GPT-3.5-Turbo, FLAN-T5, and BERT-based models) on a common NLP task for health-related social media data: sentiment analysis for identifying opinions toward health issues. We explored how different prompting and fine-tuning strategies affect the performance of LLMs on social media datasets spanning diverse health topics, including Healthcare Reform, vaccination, mask wearing, and healthcare service quality. We found that LLMs outperformed VADER, a widely used off-the-shelf sentiment analysis tool, but were still far from producing consistently accurate sentiment labels. Their performance improved, however, with data-specific prompts that include information about the context, task, and targets. The highest-performing models were BERT-based models fine-tuned on aggregated data. We provide practical tips for researchers using LLMs on health-related social media data to obtain optimal outcomes. We also discuss future work needed to further improve the performance of LLMs for analyzing health-related social media data with minimal annotations.
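
For illustration only (this is not the authors' pipeline), the sketch below pairs a VADER baseline with a zero-shot GPT-3.5-Turbo prompt that supplies context, task, and target information, the kind of data-specific prompting the abstract describes. The example post, prompt wording, and label mapping are assumptions; the code needs the vaderSentiment and openai packages and a configured API key.

    # Minimal sketch (not the authors' method): VADER baseline vs. a zero-shot
    # LLM prompt that includes context, task, and target information.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from openai import OpenAI

    post = "Finally got my second dose today. So relieved and grateful!"  # hypothetical example

    # Baseline: map VADER's compound score to a three-way label using the
    # conventional +/-0.05 thresholds.
    vader = SentimentIntensityAnalyzer()
    compound = vader.polarity_scores(post)["compound"]
    vader_label = (
        "positive" if compound >= 0.05
        else "negative" if compound <= -0.05
        else "neutral"
    )

    # Zero-shot LLM with a data-specific prompt (context, task, target).
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = (
        "Context: The following is a public social media post about COVID-19 vaccination.\n"
        "Task: Classify the author's sentiment toward the target.\n"
        "Target: COVID-19 vaccination.\n"
        f"Post: {post}\n"
        "Answer with exactly one word: positive, negative, or neutral."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    llm_label = response.choices[0].message.content.strip().lower()

    print(f"VADER: {vader_label}  |  LLM: {llm_label}")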

Matching journals

The top 4 journals account for just over 50% of the predicted probability mass (33.5% + 8.5% + 4.9% + 4.9% ≈ 51.8%).

 1. Journal of Biomedical Informatics: 33.5% (45 papers in training set; top 0.1%)
 2. Journal of Medical Internet Research: 8.5% (85 papers in training set; top 0.5%)
 3. BMC Medical Informatics and Decision Making: 4.9% (39 papers in training set; top 0.5%)
 4. Journal of the American Medical Informatics Association: 4.9% (61 papers in training set; top 0.5%)
    (50% of probability mass above this point)
 5. International Journal of Medical Informatics: 4.9% (25 papers in training set; top 0.2%)
 6. npj Digital Medicine: 4.4% (97 papers in training set; top 1.0%)
 7. IEEE Journal of Biomedical and Health Informatics: 4.0% (34 papers in training set; top 0.4%)
 8. JAMIA Open: 3.1% (37 papers in training set; top 0.5%)
 9. Scientific Reports: 3.1% (3102 papers in training set; top 41%)
10. Frontiers in Digital Health: 2.9% (20 papers in training set; top 0.3%)
11. Artificial Intelligence in Medicine: 2.6% (15 papers in training set; top 0.2%)
12. PLOS Digital Health: 1.7% (91 papers in training set; top 1%)
13. Patterns: 1.5% (70 papers in training set; top 1%)
14. Bioinformatics: 1.3% (1061 papers in training set; top 8%)
15. Computers in Biology and Medicine: 1.2% (120 papers in training set; top 3%)
16. BMC Bioinformatics: 1.0% (383 papers in training set; top 6%)
17. Biology Methods and Protocols: 0.9% (53 papers in training set; top 2%)
18. JMIR Medical Informatics: 0.9% (17 papers in training set; top 1%)
19. Frontiers in Psychiatry: 0.9% (83 papers in training set; top 3%)
20. Healthcare: 0.9% (16 papers in training set; top 1%)
21. BMC Medical Research Methodology: 0.9% (43 papers in training set; top 1%)
22. GigaScience: 0.8% (172 papers in training set; top 3%)
23. Cureus: 0.8% (67 papers in training set; top 5%)
24. BioData Mining: 0.8% (15 papers in training set; top 0.9%)
25. Computer Methods and Programs in Biomedicine: 0.8% (27 papers in training set; top 0.9%)
26. PLOS ONE: 0.8% (4510 papers in training set; top 67%)
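
As a rough check on the 50% cutoff noted above, the sketch below accumulates the listed probabilities in rank order until the running total reaches 0.5. The figures are copied from the list; the threshold logic is only an assumption about how the cutoff is drawn.

    # Cumulative probability over the ranked journals (values from the list above).
    probs = [0.335, 0.085, 0.049, 0.049, 0.049, 0.044, 0.040, 0.031, 0.031, 0.029]
    cumulative = 0.0
    for rank, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= 0.5:
            print(f"Top {rank} journals cover {cumulative:.1%} of the probability mass")
            break
    # Prints: Top 4 journals cover 51.8% of the probability mass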