
A Blinded Comparative Evaluation of Clinical and AI-Generated Responses to Otologic Patient Queries

Akinniyi, S.; Jain-Poster, K.; Evangelista, E.; Yoshikawa, N.; Rivero, A.

2026-04-15 | otolaryngology
medRxiv | DOI: 10.64898/2026.04.14.26350677

Objective: To assess the quality, empathy, and readability of large language model (LLM) responses to otologic patient questions, compared with verified physician responses in patient-driven forums, and to gauge the potential utility of LLMs in patient-centered communication.

Study Design: Comparative study.

Setting: Internet.

Methods: A sample of 49 otology-related questions posted on Reddit r/AskDocs between January 2020 and June 2025 was selected using search terms including "hearing loss," "ear infection," "tinnitus," "ear pain," and "vertigo." Posts were retrieved using Reddit's "Top" filter. Each question had been answered by a verified physician on Reddit and was also answered by three LLMs (ChatGPT-4o, ClaudeAI, Google Gemini). Responses were scored by five evaluators.

Results: The most common otologic concerns in patient questions were otalgia (38.7%), vertigo (28.6%), tinnitus (24.5%), hearing loss (22.4%), and aural fullness (20.4%). LLM responses were longer than physician responses (mean 145 vs 67 words; p < .05) and were rated higher in quality (10.95 vs 9.58), empathy (7.26 vs 5.18), and readability (4.00 vs 3.73) (all p < .05). Evaluators correctly identified AI versus physician responses in 89.4% of cases, with higher sensitivity for detecting physician responses (93.5%). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean 7.25), while ClaudeAI responses were more complex (11.86; p < .05).

Conclusion: LLM responses received higher ratings in quality, empathy, and readability than physician responses across a variety of otologic concerns. When appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care.
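The Flesch-Kincaid grade level used in the Results is a standard formula over sentence length and syllable density. A minimal sketch of that computation is below; the vowel-group syllable counter is a rough heuristic of my own (real readability tools use dictionary-based syllable counts), so scores will only approximate those reported in the study.

```python
import re

def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Naive heuristic: count runs of vowels, minimum one per word.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * total_syllables / len(words)
            - 15.59)
```

Short, monosyllabic sentences score at or below early primary-school grade levels, while long words and long sentences push the grade upward, which is how ChatGPT's mean of 7.25 versus ClaudeAI's 11.86 should be read.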

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                                     Papers (training set)  Percentile  Probability
1     PLOS ONE                                                                     4510  Top 9%           19.0%
2     npj Digital Medicine                                                           97  Top 0.3%         17.9%
3     Vaccine                                                                       189  Top 0.4%          8.6%
4     Ear & Hearing                                                                  15  Top 0.1%          4.9%
----- 50% of probability mass above -----
5     Journal of Clinical Medicine                                                   91  Top 1%            3.7%
6     Scientific Reports                                                           3102  Top 35%           3.7%
7     iScience                                                                     1063  Top 6%            3.3%
8     Journal of Medical Internet Research                                           85  Top 2%            2.8%
9     Ophthalmology Science                                                          20  Top 0.1%          2.1%
10    Applied Sciences                                                               24  Top 0.2%          2.1%
11    Frontiers in Neurology                                                         91  Top 2%            1.9%
12    Brain Sciences                                                                 52  Top 0.6%          1.7%
13    Frontiers in Oncology                                                          95  Top 2%            1.7%
14    Cureus                                                                         67  Top 3%            1.7%
15    British Journal of Ophthalmology                                               14  Top 0.2%          1.7%
16    The Journal of Pain                                                            26  Top 0.4%          1.7%
17    JMIR Formative Research                                                        32  Top 1%            1.2%
18    Communications Medicine                                                        85  Top 0.5%          1.2%
19    Trends in Hearing                                                              12  Top 0.1%          1.0%
20    ERJ Open Research                                                              44  Top 0.7%          0.9%
21    American Journal of Respiratory and Critical Care Medicine                     39  Top 0.7%          0.9%
22    Nature Communications                                                        4913  Top 60%           0.8%
23    Cells                                                                         232  Top 7%            0.7%
24    ACS Chemical Neuroscience                                                      60  Top 3%            0.7%
25    Hearing Research                                                               49  Top 0.3%          0.7%
26    Frontiers in Immunology                                                       586  Top 9%            0.7%