Harnessing AI and social media to understand real-world patient experiences in systemic lupus erythematosus
Yang, S.; Hawryluk, C.; Liu, J.; Eckert, N.; Otoo, J.; Vina, E. R.; Yao, L.
Show abstract
ObjectiveTo apply large language models (LLMs) to Reddit posts referencing systemic lupus erythematosus (SLE) to identify patient-expressed unmet medical needs, symptom experiences, and healthcare challenges, demonstrating how AI-enabled social media listening complements traditional patient-experience research. MethodsWe extracted 4,633 posts from ten SLE-related or health-focused Reddit communities using the public Reddit API (October-November 2025). After removing duplicates, promotional content, and posts with insufficient information, 2,603 posts remained. A thematic codebook was developed through manual review of 300 posts and iteratively refined. Two LLMs (Gemini 3.0 and GPT-5.2) were evaluated for automated thematic labeling using percent agreement, Cohens {kappa}, and a human-annotated reference set (n=100). The best-performing model was used to quantify theme prevalence, followed by qualitative review of representative narratives. ResultsGPT-5.2 demonstrated higher performance (F1=0.844) than Gemini 3.0 (F1=0.811), with substantial inter-model agreement across main themes (mean {kappa}=0.71). Posts reflected multidimensional experiences. The most frequent subtheme was Advice Seeking (84.1%), followed by Emotional Coping (55.6%). Common symptom-related themes included Pain (37.2%), Other Symptom Presentations (37.6%), Fatigue (24.7%), and Acute or Worsening Flares (30.2%). Diagnostic uncertainty was prominent, including confusion about laboratory results (24.0%) and emotional impact of uncertainty (33.0%). Qualitative review highlighted emotional distress, reliance on peer communities for interpretation of symptoms and labs, and difficulty managing complex treatment regimens. ConclusionLLM-enabled social media listening offers a scalable method for synthesizing large volumes of unstructured patient narratives, providing timely insights into lived experiences and unmet needs among individuals discussing lupus online. Findings align with established qualitative literature while highlighting persistent gaps in patient education, communication, and care coordination. This analytical framework can be applied across disease areas to support patient-centered care, measurement development, and evidence generation relevant to therapeutic and health-services research. What is already known on this topicO_LIPeople living with systemic lupus erythematosus (SLE) experience substantial unmet needs related to diagnostic uncertainty, symptom burden, emotional distress, medication challenges, and healthcare system barriers. C_LIO_LITraditional qualitative methods (e.g., interviews, focus groups, surveys) capture valuable patient perspectives but are limited by small sample sizes, recall bias, and restricted question frameworks. C_LIO_LISocial media listening has emerged as a promising way to collect real-time patient insights, and recent regulatory guidance acknowledges its value as patient experience data. However, systematic, scalable analysis of large patient-generated datasets has historically been constrained by analytic burden and variability. C_LI What this study addsO_LIThis study is among the first to apply state-of-the-art large language models (LLMs) to a large corpus of SLE-related social media posts, enabling scalable thematic analysis of thousands of patient narratives. C_LIO_LIIt provides a validated methodological framework for using dual-LLM agreement, human-annotated references, and performance benchmarking (precision, recall, F1) to ensure reliability in automated thematic labeling. C_LIO_LIFindings reveal a multidimensional patient burden consistent with prior studies while uncovering persistent gaps in patient education, confusion around laboratory testing, care coordination challenges, and heavy reliance on peer communities for advice. C_LIO_LIThe approach demonstrates that LLM-enabled social media listening can generate timely, granular, patient-prioritized insights at a scale unattainable by traditional methods. C_LI How this study might affect research, practice, or policyO_LIResearch: Establishes a reproducible, scalable framework for integrating LLM-based thematic analysis into patient-focused evidence generation, accelerating insight extraction from large unstructured datasets across disease areas. C_LIO_LIClinical practice: Highlights actionable gaps in patient education, communication, and care coordination, informing interventions to improve clinical encounters, shared decision-making, and symptom management support. C_LIO_LIPolicy and regulatory science: Demonstrates how social media-derived patient experience data, when paired with rigorous quality controls, can complement formal qualitative studies and support patient-focused drug development, measurement development, and health-services planning. C_LI
Matching journals
The top 3 journals account for 50% of the predicted probability mass.