
Assessing Large Language Model Utility and Limitations in Diabetes Education: A Cross-Sectional Study of Patient Interactions and Specialist Evaluations

Mustafa, G.; Ong, J.; Shaikh, M. Z.; Askari, S.; Anjum, S.; Adhi, M. I.; Memon, A. S.; Abdul Rauf, M. U.; Rizvi, A.; Iqbal, I.; Basit, S.; Khan, M. F.; Masood, M. Q.

2025-06-24 endocrinology
10.1101/2025.06.24.25329401 medRxiv

Objectives: To assess the value of an AI-powered conversational agent in supporting diabetes self-management among adults with diabetic retinopathy and limited educational backgrounds.

Methods: In this cross-sectional study, 51 adults with Type II diabetes and diabetic retinopathy took part in moderated Q-and-A sessions with ChatGPT. Non-English-speaking and visually impaired participants interacted with the agent through trained human support. Each question-response pair was assigned to one of seven thematic categories and evaluated independently by endocrinologists and ophthalmologists using the 3C + 2 framework (clarity, completeness, correctness, safety, and recency). Inter-rater reliability was calculated with intraclass correlation coefficients (ICC) and Fleiss' kappa.

Results: The cohort generated 137 questions, and 98% of the conversational agent's answers were judged informative and empathetic. Endocrinologists awarded high mean scores for clarity (4.66/5) and completeness (4.52/5) but showed limited agreement (ICC = 0.13 and 0.27, respectively). Ophthalmologists gave lower mean scores for clarity (3.09/5) and completeness (2.94/5) yet showed stronger agreement (ICC = 0.70 and 0.52, respectively). Reviewers detected occasional inaccuracies and hallucinations. Participants valued the agent for discussing sensitive topics but deferred to physicians on complex medical issues.

Conclusions: An AI conversational agent can help bridge communication gaps in diabetes care by providing accurate, easy-to-understand answers for individuals facing language, literacy, or vision-related barriers. Nonetheless, hallucinations and variable specialist ratings underscore the need for continuous physician oversight and iterative refinement of AI outputs.

Practice implications: Introducing conversational AI into resource-limited clinics could enhance patient education and engagement, provided that clinicians review and contextualise the advice to ensure safety, accuracy, and personalisation. Future development should prioritise reducing hallucinations and strengthening domain-specific reliability so that the tool complements, rather than replaces, professional care.
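The abstract reports inter-rater agreement as ICCs and Fleiss' kappa but does not say which ICC form was used. The Python sketch below assumes ICC(2,1) in the Shrout and Fleiss naming (two-way random effects, absolute agreement, single rater) and the standard Fleiss' kappa for a fixed number of raters per item. The function names `icc_2_1` and `fleiss_kappa` and the toy ratings are hypothetical; this is not the authors' analysis code.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the rating of item i by rater j; every rater scores every item.
    """
    n = len(scores)      # items, e.g. question-response pairs
    k = len(scores[0])   # raters
    values = [x for row in scores for x in row]
    grand = sum(values) / len(values)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Rater variance (ms_cols) enters the denominator, so systematic offsets lower the ICC
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa; ratings[i] lists the category each rater assigned to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: how many raters put item i in category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement per item, averaged over items
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall proportion of ratings in each category
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: reviewer 2 always scores one point higher than reviewer 1
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
# Hypothetical toy data: two raters, three items, categories 1 and 2
print(round(fleiss_kappa([[1, 1], [1, 2], [2, 2]], [1, 2]), 3))  # 0.333
```

Under a consistency definition such as ICC(3,1), the first toy example would score 1.0 because both reviewers rank the items identically; the absolute-agreement form also penalises the constant one-point offset. Which definition the study applied would need to be confirmed from the full paper.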

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Journal of General Internal Medicine: 22.6% (20 papers in training set; Top 0.1%)
2. BMJ Open: 8.4% (554 papers in training set; Top 2%)
3. Journal of Medical Internet Research: 6.3% (85 papers in training set; Top 0.8%)
4. BMJ Open Diabetes Research & Care: 4.9% (15 papers in training set; Top 0.2%)
5. Pilot and Feasibility Studies: 4.9% (12 papers in training set; Top 0.1%)
6. PLOS Digital Health: 4.9% (91 papers in training set; Top 0.5%)

50% of probability mass above

7. npj Digital Medicine: 4.3% (97 papers in training set; Top 1.0%)
8. PLOS ONE: 4.2% (4510 papers in training set; Top 35%)
9. Journal of the American Medical Informatics Association: 3.6% (61 papers in training set; Top 0.8%)
10. British Journal of General Practice: 3.6% (22 papers in training set; Top 0.1%)
11. JMIR Public Health and Surveillance: 2.4% (45 papers in training set; Top 1%)
12. JAMIA Open: 1.9% (37 papers in training set; Top 0.7%)
13. Scientific Reports: 1.7% (3102 papers in training set; Top 58%)
14. JMIR Medical Informatics: 1.3% (17 papers in training set; Top 0.9%)
15. The Journal of Pediatrics: 1.3% (15 papers in training set; Top 0.4%)
16. JMIR Formative Research: 1.2% (32 papers in training set; Top 1%)
17. DIGITAL HEALTH: 1.2% (12 papers in training set; Top 0.5%)
18. BMJ Health & Care Informatics: 1.1% (13 papers in training set; Top 0.6%)
19. Cancer Medicine: 0.9% (24 papers in training set; Top 1%)
20. Bioengineering: 0.9% (24 papers in training set; Top 1%)
21. Journal of Affective Disorders Reports: 0.9% (10 papers in training set; Top 0.2%)
22. Frontiers in Endocrinology: 0.8% (53 papers in training set; Top 2%)
23. JMIR Research Protocols: 0.8% (18 papers in training set; Top 1%)
24. Pharmacoepidemiology and Drug Safety: 0.7% (13 papers in training set; Top 0.5%)
25. iScience: 0.6% (1063 papers in training set; Top 37%)
26. Archives of Disease in Childhood: 0.6% (15 papers in training set; Top 0.5%)
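The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running total over the ranked list. The Python sketch below assumes the threshold means "the smallest number of top-ranked journals whose probabilities sum to at least 50%"; the page does not define it, so that interpretation and the helper name `journals_for_mass` are assumptions. With the rounded percentages listed, the first five journals sum to 47.1% and the first six to 52.0%.

```python
# Predicted probabilities (%) for the ten highest-ranked journals, as listed above (rounded)
predicted = [22.6, 8.4, 6.3, 4.9, 4.9, 4.9, 4.3, 4.2, 3.6, 3.6]


def journals_for_mass(probs, target=50.0):
    """Return the smallest k such that the k largest probabilities sum to >= target."""
    total = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= target:
            return k
    raise ValueError("target mass not reached by the listed probabilities")


print(journals_for_mass(predicted))  # 6
```

Because the listed values are rounded to one decimal place, a threshold falling very close to a cumulative sum could give a different count than the underlying unrounded predictions would.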