Accents Still Confuse AI: Systematic Errors in Speech Transcription and LLM-Based Remedies

Fatapour, Y.; Samaan, J. S.; Kuchi, A.; Srinivasan, A. P.; Fatapour, S.; Liu, H.; Berkowitz, J. S.; Tsang, K.; Zietz, M.; Friedrich, N.; Srinivasan, N.; Thangaratnam, S.; King, R.; Czarny, R.; Nguyen, T.; Yeo, Y. H. S.; Kim, H.; Lee, Y.-T.; Wongjarupong, N.; Abiri, A.; Tatonetti, N. P.

2025-09-02 health systems and quality improvement
10.1101/2025.08.29.25333548 medRxiv
Accurate and timely documentation in the electronic health record (EHR) is essential for delivering safe and effective patient care. AI-enabled medical tools powered by automatic speech recognition (ASR) promise to streamline this process by transcribing clinical conversations directly into structured notes. However, a critical challenge in deploying these technologies at scale is their variable performance across speakers with diverse accents, which leads to transcription inaccuracies, misinterpretation, and downstream clinical risks. We measured the transcription accuracy of Whisper and WhisperX on clinical texts across native and non-native English speakers and found that both models have significantly higher error rates for non-native speakers. We also found that post-processing the transcripts with GPT-4o recovers the lost accuracy. Our findings indicate that a chained model approach, WhisperX-GPT, can significantly improve transcription quality and reduce errors associated with accented speech. We make all code, models, and pipelines freely available.
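
The abstract does not say which accuracy metric was used. As an illustration only, the sketch below computes word error rate (WER), a common measure for ASR output, between a reference text and a transcript. The function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made for this sketch, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared after lowercasing and whitespace
    splitting; the paper's own normalization is not described here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
raw_wer = word_error_rate("patient denies chest pain",
                          "patient denies chess pain")
```

Computing the same score on a raw transcript and on its post-processed version, against the same reference, gives a simple before-and-after comparison of the kind the abstract describes.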

Matching journals

The top 6 journals account for 50% of the predicted probability mass.
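
The "top 6" figure can be checked with a running sum over the listed probabilities. The sketch below is an illustration, assuming the percentages for the ten highest-ranked journals as they appear in this listing; the helper name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the ten highest-ranked journals,
# copied from this listing in rank order.
probabilities = [15.2, 12.8, 9.4, 7.0, 5.0, 4.1, 3.2, 3.0, 2.7, 2.4]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the cumulative
    probability to reach the threshold, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


top_k = journals_to_reach(probabilities, 50.0)
```

With these values the first five journals sum to about 49.4% and the first six to about 53.5%, so six journals are needed to pass the 50% mark.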

1. Journal of Biomedical Informatics: 45 papers in training set, Top 0.1%, 15.2%
2. Physiological Measurement: 12 papers in training set, Top 0.1%, 12.8%
3. PLOS ONE: 4510 papers in training set, Top 20%, 9.4%
4. Frontiers in Digital Health: 20 papers in training set, Top 0.1%, 7.0%
5. PLOS Digital Health: 91 papers in training set, Top 0.4%, 5.0%
6. Scientific Reports: 3102 papers in training set, Top 30%, 4.1%

50% of probability mass above

7. JMIRx Med: 31 papers in training set, Top 0.2%, 3.2%
8. Journal of Medical Internet Research: 85 papers in training set, Top 2%, 3.0%
9. Journal of Personalized Medicine: 28 papers in training set, Top 0.1%, 2.7%
10. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 1%, 2.4%
11. Nature: 575 papers in training set, Top 10%, 1.8%
12. European Heart Journal - Digital Health: 15 papers in training set, Top 0.3%, 1.7%
13. iScience: 1063 papers in training set, Top 17%, 1.5%
14. JAMIA Open: 37 papers in training set, Top 0.9%, 1.5%
15. Computer Methods and Programs in Biomedicine: 27 papers in training set, Top 0.4%, 1.4%
16. Journal of the American Medical Informatics Association: 61 papers in training set, Top 1%, 1.4%
17. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 36%, 1.4%
18. Sensors: 39 papers in training set, Top 1%, 1.3%
19. npj Digital Medicine: 97 papers in training set, Top 3%, 1.0%
20. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 2%, 0.9%
21. BioData Mining: 15 papers in training set, Top 0.6%, 0.9%
22. JMIR Formative Research: 32 papers in training set, Top 2%, 0.8%
23. F1000Research: 79 papers in training set, Top 4%, 0.8%
24. Heliyon: 146 papers in training set, Top 7%, 0.7%
25. PLOS Biology: 408 papers in training set, Top 20%, 0.7%
26. Medical Decision Making: 10 papers in training set, Top 0.3%, 0.7%
27. Healthcare: 16 papers in training set, Top 2%, 0.7%
28. Philosophical Transactions of the Royal Society B: 51 papers in training set, Top 7%, 0.5%
29. International Journal of Medical Informatics: 25 papers in training set, Top 2%, 0.5%
30. Patterns: 70 papers in training set, Top 3%, 0.5%