Harnessing AI and social media to understand real-world patient experiences in systemic lupus erythematosus

Yang, S.; Hawryluk, C.; Liu, J.; Eckert, N.; Otoo, J.; Vina, E. R.; Yao, L.

2026-02-22 rheumatology
10.64898/2026.02.20.26346724 medRxiv

Objective: To apply large language models (LLMs) to Reddit posts referencing systemic lupus erythematosus (SLE) to identify patient-expressed unmet medical needs, symptom experiences, and healthcare challenges, demonstrating how AI-enabled social media listening complements traditional patient-experience research.

Methods: We extracted 4,633 posts from ten SLE-related or health-focused Reddit communities using the public Reddit API (October-November 2025). After removing duplicates, promotional content, and posts with insufficient information, 2,603 posts remained. A thematic codebook was developed through manual review of 300 posts and iteratively refined. Two LLMs (Gemini 3.0 and GPT-5.2) were evaluated for automated thematic labeling using percent agreement, Cohen's κ, and a human-annotated reference set (n=100). The best-performing model was used to quantify theme prevalence, followed by qualitative review of representative narratives.

Results: GPT-5.2 demonstrated higher performance (F1=0.844) than Gemini 3.0 (F1=0.811), with substantial inter-model agreement across main themes (mean κ=0.71). Posts reflected multidimensional experiences. The most frequent subtheme was Advice Seeking (84.1%), followed by Emotional Coping (55.6%). Common symptom-related themes included Pain (37.2%), Other Symptom Presentations (37.6%), Fatigue (24.7%), and Acute or Worsening Flares (30.2%). Diagnostic uncertainty was prominent, including confusion about laboratory results (24.0%) and emotional impact of uncertainty (33.0%). Qualitative review highlighted emotional distress, reliance on peer communities for interpretation of symptoms and labs, and difficulty managing complex treatment regimens.

Conclusion: LLM-enabled social media listening offers a scalable method for synthesizing large volumes of unstructured patient narratives, providing timely insights into lived experiences and unmet needs among individuals discussing lupus online. Findings align with established qualitative literature while highlighting persistent gaps in patient education, communication, and care coordination. This analytical framework can be applied across disease areas to support patient-centered care, measurement development, and evidence generation relevant to therapeutic and health-services research.

What is already known on this topic
- People living with systemic lupus erythematosus (SLE) experience substantial unmet needs related to diagnostic uncertainty, symptom burden, emotional distress, medication challenges, and healthcare system barriers.
- Traditional qualitative methods (e.g., interviews, focus groups, surveys) capture valuable patient perspectives but are limited by small sample sizes, recall bias, and restricted question frameworks.
- Social media listening has emerged as a promising way to collect real-time patient insights, and recent regulatory guidance acknowledges its value as patient experience data. However, systematic, scalable analysis of large patient-generated datasets has historically been constrained by analytic burden and variability.

What this study adds
- This study is among the first to apply state-of-the-art large language models (LLMs) to a large corpus of SLE-related social media posts, enabling scalable thematic analysis of thousands of patient narratives.
- It provides a validated methodological framework for using dual-LLM agreement, human-annotated references, and performance benchmarking (precision, recall, F1) to ensure reliability in automated thematic labeling.
- Findings reveal a multidimensional patient burden consistent with prior studies while uncovering persistent gaps in patient education, confusion around laboratory testing, care coordination challenges, and heavy reliance on peer communities for advice.
- The approach demonstrates that LLM-enabled social media listening can generate timely, granular, patient-prioritized insights at a scale unattainable by traditional methods.

How this study might affect research, practice, or policy
- Research: Establishes a reproducible, scalable framework for integrating LLM-based thematic analysis into patient-focused evidence generation, accelerating insight extraction from large unstructured datasets across disease areas.
- Clinical practice: Highlights actionable gaps in patient education, communication, and care coordination, informing interventions to improve clinical encounters, shared decision-making, and symptom management support.
- Policy and regulatory science: Demonstrates how social media-derived patient experience data, when paired with rigorous quality controls, can complement formal qualitative studies and support patient-focused drug development, measurement development, and health-services planning.
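
The evaluation metrics named in the abstract (percent agreement, Cohen's κ, and F1 against a human-annotated reference) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy binary labels below are hypothetical, chosen only to show how each metric is computed for a single theme (1 = theme present, 0 = absent).

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = counts_a.keys() | counts_b.keys()
    # Chance agreement from each rater's marginal label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def f1_score(reference, predicted, positive=1):
    """F1 (harmonic mean of precision and recall) for one positive label."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for one theme across eight posts (not study data).
model_a = [1, 1, 0, 0, 1, 0, 1, 0]
model_b = [1, 0, 0, 0, 1, 0, 1, 1]
human = [1, 1, 0, 0, 1, 0, 0, 0]

agreement = percent_agreement(model_a, model_b)
kappa = cohens_kappa(model_a, model_b)
f1_a = f1_score(human, model_a)
print(f"agreement={agreement:.3f} kappa={kappa:.3f} F1(model_a)={f1_a:.3f}")
```

In a multi-theme setting such as the one described, these functions would presumably be applied per theme and then averaged, which is one way a "mean κ" could be obtained; the abstract does not specify the averaging scheme.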

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | npj Digital Medicine | 97 | Top 0.2% | 23.3% |
| 2 | Patterns | 70 | Top 0.1% | 23.3% |
| 3 | Genome Medicine | 154 | Top 1% | 5.0% |
| 4 | Journal of Medical Internet Research | 85 | Top 1% | 3.7% |
| 5 | PLOS ONE | 4510 | Top 37% | 3.7% |
| 6 | Rheumatology | 21 | Top 0.2% | 3.2% |
| 7 | BMJ Open | 554 | Top 6% | 3.2% |
| 8 | Computers in Biology and Medicine | 120 | Top 1% | 2.7% |
| 9 | Frontiers in Public Health | 140 | Top 3% | 2.5% |
| 10 | PLOS Digital Health | 91 | Top 1% | 2.2% |
| 11 | Journal of the American Medical Informatics Association | 61 | Top 1% | 2.2% |
| 12 | Scientific Reports | 3102 | Top 52% | 2.0% |
| 13 | Frontiers in Digital Health | 20 | Top 0.7% | 1.5% |
| 14 | Frontiers in Psychiatry | 83 | Top 3% | 0.9% |
| 15 | Wellcome Open Research | 57 | Top 2% | 0.9% |
| 16 | BMC Medical Informatics and Decision Making | 39 | Top 2% | 0.9% |
| 17 | iScience | 1063 | Top 28% | 0.8% |
| 18 | BMC Medical Research Methodology | 43 | Top 1% | 0.8% |
| 19 | Pharmacoepidemiology and Drug Safety | 13 | Top 0.4% | 0.8% |
| 20 | Metabolites | 50 | Top 1% | 0.8% |
| 21 | Clinical Pharmacology & Therapeutics | 25 | Top 0.7% | 0.8% |
| 22 | Healthcare | 16 | Top 2% | 0.8% |
| 23 | International Journal of Environmental Research and Public Health | 124 | Top 7% | 0.8% |
| 24 | eBioMedicine | 130 | Top 5% | 0.7% |
| 25 | Journal of Biomedical Informatics | 45 | Top 2% | 0.5% |
| 26 | Frontiers in Medicine | 113 | Top 8% | 0.5% |
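
The 50% cutoff stated for the matching-journal list can be checked from the listed probabilities with a short sketch. The helper name `smallest_top_k` is hypothetical, and this is only an illustration of the arithmetic, not how the site itself computes the cutoff.

```python
def smallest_top_k(probabilities, threshold):
    """Return the smallest k whose k largest values sum to at least threshold.

    Returns None if the threshold is never reached.
    """
    total = 0.0
    for k, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= threshold:
            return k
    return None


# Predicted probabilities (in percent) for the five highest-ranked journals,
# copied from the list above.
top_probabilities = [23.3, 23.3, 5.0, 3.7, 3.7]

k = smallest_top_k(top_probabilities, 50.0)
print(f"Top {k} journals reach at least 50% of the predicted probability mass")
```

The first two journals sum to 46.6%, so the third (5.0%) is needed to cross the 50% mark.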