
Assessing Large Language Model Utility and Limitations in Diabetes Education: A Cross-Sectional Study of Patient Interactions and Specialist Evaluations

Mustafa, G.; Ong, J.; Shaikh, M. Z.; Askari, S.; Anjum, S.; Adhi, M. I.; Memon, A. S.; Abdul Rauf, M. U.; Rizvi, A.; Iqbal, I.; Basit, S.; Khan, M. F.; Masood, M. Q.

2025-06-24 endocrinology

10.1101/2025.06.24.25329401 medRxiv


Objectives: To assess the value of an AI-powered conversational agent in supporting diabetes self-management among adults with diabetic retinopathy and limited educational backgrounds.

Methods: In this cross-sectional study, 51 adults with Type II diabetes and diabetic retinopathy took part in moderated question-and-answer sessions with ChatGPT. Participants who did not speak English or were visually impaired interacted with the agent through trained human support. Each question-response pair was assigned to one of seven thematic categories and independently evaluated by endocrinologists and ophthalmologists using the 3C + 2 framework (clarity, completeness, correctness, safety, recency). Inter-rater reliability was calculated with intraclass correlation coefficients (ICC) and Fleiss' kappa.

Results: The cohort generated 137 questions, and 98% of the conversational agent's answers were judged informative and empathetic. Endocrinologists awarded high mean scores for clarity (4.66/5) and completeness (4.52/5) but showed limited agreement (ICC = 0.13 and 0.27). Ophthalmologists gave lower mean scores for clarity (3.09/5) and completeness (2.94/5) yet demonstrated stronger agreement (ICC = 0.70 and 0.52). Reviewers detected occasional inaccuracies and hallucinations. Participants valued the agent for sensitive discussions but deferred to physicians for complex medical issues.

Conclusions: An AI conversational agent can help bridge communication gaps in diabetes care by providing accurate, easy-to-understand answers to individuals facing language, literacy, or vision-related barriers. Nonetheless, hallucinations and variable specialist ratings underscore the need for continuous physician oversight and iterative refinement of AI outputs.

Practice implications: Introducing conversational AI into resource-limited clinics could enhance patient education and engagement, provided that clinicians review and contextualise the advice to ensure safety, accuracy, and personalisation. Future development should prioritise reducing hallucinations and strengthening domain-specific reliability so that the tool complements, rather than replaces, professional care.
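The abstract reports inter-rater agreement as ICCs and Fleiss' kappa but does not state which ICC form was used or which ratings kappa was computed on. The sketch below shows how the two statistics are calculated, under stated assumptions. It assumes ICC(2,1) (two-way random effects, absolute agreement, single rater) and complete data, meaning every item is rated by every reviewer. The function names `fleiss_kappa` and `icc_2_1` are hypothetical, and this is not the authors' code. The demo data are a toy table and the classic Shrout and Fleiss (1979) example, not study data.

```python
"""Minimal inter-rater agreement sketch (illustrative; not the study's code)."""
from statistics import mean


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for categorical ratings.

    counts[i][j] is the number of raters who assigned item i to category j.
    Assumes every item was rated by the same number of raters (at least 2).
    """
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_items = len(counts)
    n_categories = len(counts[0])
    # Share of all assignments that landed in each category.
    p = [sum(row[j] for row in counts) / (n_items * n_raters)
         for j in range(n_categories)]
    # Per-item observed agreement: fraction of rater pairs that agree.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = mean(p_items)
    p_e = sum(pj * pj for pj in p)  # agreement expected by chance alone
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][r] is the score rater r gave item i; assumes complete data.
    """
    n = len(scores)      # items, e.g. question-response pairs
    k = len(scores[0])   # raters
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[r] for row in scores) for r in range(k)]
    # Partition total variability into item, rater, and residual parts.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Toy data: 4 items, 3 raters, 2 categories.
    print(round(fleiss_kappa([[3, 0], [2, 1], [1, 2], [0, 3]]), 3))  # 0.333
    # Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
    sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2_1(sf), 2))  # 0.29
```

In this sketch, kappa fits categorical judgments, and the ICC fits scale scores such as the 1 to 5 ratings reported above. For real analyses, maintained implementations such as statsmodels' `fleiss_kappa` and pingouin's `intraclass_corr` report all ICC forms. They are preferable to hand-rolled code.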
