AI-Generated Clinical Summaries: Errors and Susceptibility to Speech and Speaker Variability

Draper, T. C.; Leake, J.; Cox, T.; Lamb-Riddell, K.; Johns, B. E.; McCormick, J.; Trowell, S.; Kiely, J.; Luxton, R.

2025-10-30 health informatics
10.1101/2025.10.29.25339041 medRxiv

Summary Box

What is already known on this topic
- Clinical AI Scribe outputs can contain errors, and the impact of human factors (e.g. communication style, accents, speech impairments) in clinical contexts remains under-characterised.

What this study adds
- In controlled simulations, patient personality and accent did not significantly alter total CAIS errors, with omissions predominating and hallucinations/inaccuracies remaining low.
- Speech-impairment effects were highly varied, with near-perfect recognition for cleft palate and vowel disorders, whereas phonological impairment substantially reduced accuracy.

How this study might affect research, practice or policy
- Supports clinician-in-the-loop deployment with local validation across representative accents and impairment profiles, prioritising detection of clinically critical errors.
- Routine governance should include subgroup performance reporting (accents, impairments) and ongoing audit of error rates.

Objective
The study aims to evaluate whether variability in patients' communication style (personality, international English accents, and speech impairments) affects the accuracy of a Clinical AI Scribe (CAIS), and to identify where performance degrades.

Method and Analysis
We conducted simulated primary-care consultations in a purpose-built lab using trained actors. To investigate personality types, four scenarios were enacted, each with five patient-personality types. For accents, human-verified transcripts of consultations were used to generate all doctor/patient combinations of seven different accents (including a synthetic reference voice) across five scenarios. The CAIS produced SOAP-structured summaries that were compared with the transcripts. Errors were classified as omissions, factual inaccuracies, or hallucinations. For speech impairments, public recordings representing five profiles were transcribed and word-recognition accuracy was calculated.

Results
Personality types showed no statistically significant differences in errors (all p>0.05). Extraversion had the highest total errors (median 3.5), while conscientiousness and agreeableness were lower (1.5 and 2.0, respectively). Across accents, both pairwise tests and group comparisons were non-significant for both patient and doctor voices (patients: p=0.851; doctors: p=0.98). Omissions predominated, with low rates of hallucinations and factual inaccuracies. Omissions were slightly higher for Chinese- and Indian-accented doctors (both medians 3.0). In contrast, recognition differed across speech impairments: it was near-perfect for cleft palate and vowel disorders, whereas phonological impairment markedly reduced it (p<0.001).

Conclusions
Under controlled conditions, CAIS performance was broadly stable across communication styles and most accents but remained vulnerable to specific speech characteristics, particularly phonological impairment. Future evaluations using real-world, multi-speaker clinical audio are needed to confirm performance.
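
The word-recognition accuracy reported for the speech-impairment recordings can be illustrated with a minimal sketch. The abstract does not state the exact formula, so this assumes accuracy is defined as 1 minus the word error rate (word-level edit distance divided by the number of reference words), floored at 0. The function names `normalise`, `word_edit_distance`, and `word_recognition_accuracy` are hypothetical and not taken from the study.

```python
import re


def normalise(text):
    """Lowercase and split into word tokens, dropping punctuation (an assumed convention)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_edit_distance(ref, hyp):
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_recognition_accuracy(reference, hypothesis):
    """Assumed definition: 1 - WER, floored at 0."""
    ref = normalise(reference)
    hyp = normalise(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    reference = "The patient reports chest pain."
    hypothesis = "the patient report chest"
    print(word_recognition_accuracy(reference, hypothesis))
```

In this illustrative pair, one word is substituted and one is dropped out of five reference words, so the sketch would score the transcript at 60% under the assumed definition. A real evaluation would need to fix the text-normalisation rules in advance, since they can shift the score.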

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 0.1% | 17.4%
2 | Frontiers in Digital Health | 20 papers in training set | Top 0.1% | 14.3%
3 | BMJ Open | 554 papers in training set | Top 2% | 10.0%
4 | PLOS ONE | 4510 papers in training set | Top 24% | 7.1%
5 | Journal of Medical Internet Research | 85 papers in training set | Top 1.0% | 4.8%
50% of probability mass above
6 | Scientific Reports | 3102 papers in training set | Top 28% | 4.3%
7 | JMIR Formative Research | 32 papers in training set | Top 0.4% | 3.6%
8 | PLOS Digital Health | 91 papers in training set | Top 1% | 2.1%
9 | BMC Research Notes | 29 papers in training set | Top 0.1% | 1.7%
10 | BMJ Health & Care Informatics | 13 papers in training set | Top 0.5% | 1.5%
11 | Computers in Biology and Medicine | 120 papers in training set | Top 3% | 1.5%
12 | JAMA Pediatrics | 10 papers in training set | Top 0.1% | 1.5%
13 | Orphanet Journal of Rare Diseases | 18 papers in training set | Top 0.4% | 1.3%
14 | Healthcare | 16 papers in training set | Top 1% | 1.2%
15 | DIGITAL HEALTH | 12 papers in training set | Top 0.5% | 1.2%
16 | Journal of Alzheimer’s Disease | 39 papers in training set | Top 1.0% | 0.9%
17 | Emergency Medicine Journal | 20 papers in training set | Top 0.5% | 0.9%
18 | Hearing Research | 49 papers in training set | Top 0.3% | 0.9%
19 | Journal of Clinical Medicine | 91 papers in training set | Top 5% | 0.9%
20 | Brain Sciences | 52 papers in training set | Top 2% | 0.9%
21 | Journal of NeuroEngineering and Rehabilitation | 28 papers in training set | Top 1.0% | 0.7%
22 | Trials | 25 papers in training set | Top 2% | 0.7%
23 | Frontiers in Neurology | 91 papers in training set | Top 5% | 0.7%
24 | Journal of Personalized Medicine | 28 papers in training set | Top 1% | 0.7%
25 | eClinicalMedicine | 55 papers in training set | Top 2% | 0.7%
26 | Journal of General Internal Medicine | 20 papers in training set | Top 1% | 0.6%
27 | iScience | 1063 papers in training set | Top 38% | 0.6%