Detection of Type 2 Diabetes from 20-second Speech Recordings: A Large-Scale Validation Study

Brann, E.; Polle, R.; Cepukaityte, G.; Georgescu, A. L.; Parsons, O.; Molimpakis, E.; Goria, S.

2026-03-17 endocrinology
10.64898/2026.03.16.26348468 medRxiv

Accessible screening for type 2 diabetes (T2D) is critical, with millions of cases remaining undiagnosed globally. Here, we present the largest known real-world validation study of a speech-based T2D prediction model, trained on speech data from over 21,000 individuals, that operates on features extracted from 20-second speech recordings. The model was evaluated in two stages: 1) against self-reported diagnoses in 7,319 English-speaking participants using AUC, and 2) against HbA1c blood tests in a subset of 801 participants drawn from the full cohort. Performance was also compared against QDiabetes and assessed in the presence of key confounding variables. The model demonstrated clinically useful predictive capacity on self-reported data (AUC = 0.80 ± 0.03), approaching QDiabetes (AUC = 0.86 ± 0.03). It was robust to most demographic confounds (e.g., age and sex) and to medication use, with reduced performance in the presence of comorbidities (e.g., cardiovascular disease and hypertension). At the diabetes threshold of HbA1c ≥ 48 mmol/mol, the model achieved an AUC of 0.75 (± 0.07). This biomarker-validated speech-based tool demonstrates potential to complement existing methods through accessible, scalable screening requiring only a 20-second speech sample.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Scientific Reports; 3102 papers in training set; Top 2%; 14.6%
2. eBioMedicine; 130 papers in training set; Top 0.1%; 9.3%
3. The Journal of Clinical Endocrinology & Metabolism; 35 papers in training set; Top 0.2%; 4.9%
4. Advanced Science; 249 papers in training set; Top 3%; 4.9%
5. eLife; 5422 papers in training set; Top 24%; 3.7%
6. PLOS ONE; 4510 papers in training set; Top 38%; 3.7%
7. npj Digital Medicine; 97 papers in training set; Top 1%; 3.1%
8. Communications Medicine; 85 papers in training set; Top 0.1%; 3.1%
9. Diabetologia; 36 papers in training set; Top 0.4%; 2.9%
50% of probability mass above
10. Frontiers in Endocrinology; 53 papers in training set; Top 0.7%; 2.7%
11. Metabolites; 50 papers in training set; Top 0.3%; 2.1%
12. JMIR Public Health and Surveillance; 45 papers in training set; Top 1%; 2.1%
13. Expert Systems with Applications; 11 papers in training set; Top 0.1%; 1.9%
14. Cell Reports Medicine; 140 papers in training set; Top 3%; 1.8%
15. Genome Medicine; 154 papers in training set; Top 4%; 1.8%
16. Nature Communications; 4913 papers in training set; Top 49%; 1.8%
17. Nature Medicine; 117 papers in training set; Top 2%; 1.7%
18. Molecular Systems Biology; 142 papers in training set; Top 0.6%; 1.7%
19. European Respiratory Journal; 54 papers in training set; Top 0.9%; 1.7%
20. EMBO Molecular Medicine; 85 papers in training set; Top 2%; 1.7%
21. Diabetes; 53 papers in training set; Top 0.5%; 1.4%
22. PLOS Digital Health; 91 papers in training set; Top 2%; 1.2%
23. iScience; 1063 papers in training set; Top 23%; 1.1%
24. Schizophrenia; 19 papers in training set; Top 0.3%; 1.1%
25. Molecular Metabolism; 105 papers in training set; Top 1%; 1.0%
26. Biology; 43 papers in training set; Top 2%; 1.0%
27. Diabetes, Obesity and Metabolism; 17 papers in training set; Top 0.4%; 0.9%
28. Human Brain Mapping; 295 papers in training set; Top 4%; 0.9%
29. JAMIA Open; 37 papers in training set; Top 1%; 0.9%
30. Diabetes Care; 12 papers in training set; Top 0.2%; 0.9%
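
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The snippet below is a minimal sketch of that arithmetic, not the listing site's actual method; the function name `journals_needed` is hypothetical, and the listed percentages are rounded, so the running total is only approximate.

```python
# Predicted probabilities (percent) for the 30 journals listed above, in rank order.
# Values are copied from the listing and are rounded to one decimal place.
probs = [
    14.6, 9.3, 4.9, 4.9, 3.7, 3.7, 3.1, 3.1, 2.9,            # ranks 1-9
    2.7, 2.1, 2.1, 1.9, 1.8, 1.8, 1.8, 1.7, 1.7, 1.7, 1.7,   # ranks 10-20
    1.4, 1.2, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,        # ranks 21-30
]


def journals_needed(probabilities, target=50.0):
    """Return (count, cumulative_mass) for the smallest top-ranked prefix
    whose summed probability reaches `target` percent.

    Returns (None, total) if the target is never reached.
    Hypothetical helper for illustration only.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        # Small tolerance guards against floating-point rounding at the boundary.
        if total >= target - 1e-9:
            return count, total
    return None, total


count, mass = journals_needed(probs)
print(count, round(mass, 1))
```

With these rounded values, the running total first passes 50% at the ninth journal, which agrees with the cutoff marker in the list.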