
A Blinded Comparative Evaluation of Clinical and AI-Generated Responses to Otologic Patient Queries

Akinniyi, S.; Jain-Poster, K.; Evangelista, E.; Yoshikawa, N.; Rivero, A.

2026-04-15 | otolaryngology
medRxiv | DOI: 10.64898/2026.04.14.26350677

Objective: To assess the quality, empathy, and readability of large language model (LLM) responses to otologic patient questions, compared with verified physician responses in patient-driven forums, and to gauge the potential utility of LLMs in patient-centered communication.

Study Design: Comparative study.

Setting: Internet.

Methods: A sample of 49 otology-related questions posted on Reddit r/AskDocs between January 2020 and June 2025 was selected using search terms including "hearing loss," "ear infection," "tinnitus," "ear pain," and "vertigo." Posts were retrieved using Reddit's "Top" filter. Each question had been answered by a verified physician on Reddit and was also answered by three LLMs (ChatGPT-4o, ClaudeAI, Google Gemini). Responses were scored by five evaluators.

Results: The most common otologic concerns in patient questions were otalgia (38.7%), vertigo (28.6%), tinnitus (24.5%), hearing loss (22.4%), and aural fullness (20.4%). LLM responses were longer than physician responses (mean 145 vs 67 words; p < .05) and were rated higher in quality (10.95 vs 9.58), empathy (7.26 vs 5.18), and readability (4.00 vs 3.73) (all p < .05). Evaluators correctly identified AI versus physician responses in 89.4% of cases, with higher sensitivity for detecting physician responses (93.5%). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean 7.25), while ClaudeAI responses were more complex (11.86; p < .05).

Conclusion: LLM responses received higher ratings in quality, empathy, and readability than physician responses across a variety of otologic concerns. When appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care.
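The Flesch-Kincaid grade level used in the Results is a standard formula over sentence length and syllable density. A minimal sketch of that computation is below; the vowel-group syllable counter is a rough heuristic of my own (real readability tools use dictionary-based syllable counts), so scores will only approximate those reported in the study.

```python
import re

def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Naive heuristic: count runs of vowels, minimum one per word.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * total_syllables / len(words)
            - 15.59)
```

Short, monosyllabic sentences score at or below early primary-school grade levels, while long words and long sentences push the grade upward, which is how ChatGPT's mean of 7.25 versus ClaudeAI's 11.86 should be read.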

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                                     Papers (training set)  Percentile  Probability
1     PLOS ONE                                                                     4510  Top 9%           19.0%
2     npj Digital Medicine                                                           97  Top 0.3%         17.9%
3     Vaccine                                                                       189  Top 0.4%          8.6%
4     Ear & Hearing                                                                  15  Top 0.1%          4.9%
----- 50% of probability mass above -----
5     Journal of Clinical Medicine                                                   91  Top 1%            3.7%
6     Scientific Reports                                                           3102  Top 35%           3.7%
7     iScience                                                                     1063  Top 6%            3.3%
8     Journal of Medical Internet Research                                           85  Top 2%            2.8%
9     Ophthalmology Science                                                          20  Top 0.1%          2.1%
10    Applied Sciences                                                               24  Top 0.2%          2.1%
11    Frontiers in Neurology                                                         91  Top 2%            1.9%
12    Brain Sciences                                                                 52  Top 0.6%          1.7%
13    Frontiers in Oncology                                                          95  Top 2%            1.7%
14    Cureus                                                                         67  Top 3%            1.7%
15    British Journal of Ophthalmology                                               14  Top 0.2%          1.7%
16    The Journal of Pain                                                            26  Top 0.4%          1.7%
17    JMIR Formative Research                                                        32  Top 1%            1.2%
18    Communications Medicine                                                        85  Top 0.5%          1.2%
19    Trends in Hearing                                                              12  Top 0.1%          1.0%
20    ERJ Open Research                                                              44  Top 0.7%          0.9%
21    American Journal of Respiratory and Critical Care Medicine                     39  Top 0.7%          0.9%
22    Nature Communications                                                        4913  Top 60%           0.8%
23    Cells                                                                         232  Top 7%            0.7%
24    ACS Chemical Neuroscience                                                      60  Top 3%            0.7%
25    Hearing Research                                                               49  Top 0.3%          0.7%
26    Frontiers in Immunology                                                       586  Top 9%            0.7%