Automated detection of adult autism from vowel acoustics using machine learning

Georgiou, G. P.; Paphiti, M.

2026-04-04 health informatics
10.64898/2026.04.03.26350102 medRxiv

Autism spectrum disorder (ASD) is a neurodevelopmental condition for which timely and accurate detection remains a major clinical priority. Early and reliable identification is important because it can facilitate access to assessment, diagnosis, and appropriate support; however, current diagnostic pathways still rely largely on behavioural evaluation and clinical judgement. In this context, machine-learning (ML) approaches have attracted growing interest because they can identify subtle and complex patterns in speech data that may not be easily captured through conventional methods. The current study capitalizes on this potential by developing and evaluating ML models for distinguishing autistic individuals from neurotypical individuals based on speech features. More specifically, acoustic features of vowels, including fundamental frequency (F0), the first three formants (F1, F2, F3), duration, jitter, shimmer, harmonics-to-noise ratio (HNR), and intensity, were extracted from the speech of 18 autistic adults and 18 neurotypical adults, elicited through a controlled production task. Four supervised ML models were then trained and evaluated on these features: LightGBM, Random Forest, Support Vector Machine, and XGBoost. All models demonstrated good classification performance, with the best-performing model achieving a strong discriminability of 89%. The explainability analysis identified F0 as the most influential predictor by a substantial margin, followed by intensity, F3, and F1, while duration, shimmer, HNR, jitter, and F2 contributed more modestly. These findings demonstrate that vowel acoustics contain clinically relevant information for distinguishing autistic from neurotypical adult speech and highlight the potential of interpretable, speech-based ML as a transparent and scalable aid for ASD screening and assessment.
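
To make the setup described in the abstract concrete, the following is a minimal sketch of training and cross-validating one of the four named model families (Random Forest) on a table of the nine vowel features. It is not the authors' pipeline: the data are random placeholders, and the token count, hyperparameters, five-fold split, and accuracy metric are all assumptions chosen for illustration. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# The nine feature names come from the abstract; everything else is illustrative.
FEATURES = ["F0", "F1", "F2", "F3", "duration", "jitter", "shimmer", "HNR", "intensity"]


def make_synthetic_tokens(n_per_group=90, seed=0):
    """Return random placeholder data, NOT the study's measurements."""
    rng = np.random.default_rng(seed)
    x_neurotypical = rng.normal(0.0, 1.0, size=(n_per_group, len(FEATURES)))
    x_autistic = rng.normal(0.0, 1.0, size=(n_per_group, len(FEATURES)))
    # Arbitrary shift in one column so the classifier has something to learn.
    x_autistic[:, 0] += 1.0
    X = np.vstack([x_neurotypical, x_autistic])
    y = np.array([0] * n_per_group + [1] * n_per_group)  # 0 = neurotypical, 1 = autistic
    return X, y


X, y = make_synthetic_tokens()

# Hyperparameters and the 5-fold stratified split are assumptions, not from the paper.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print(f"mean accuracy over {len(scores)} folds: {scores.mean():.2f}")
```

If each speaker contributes several vowel tokens, a grouped split such as scikit-learn's `GroupKFold` would keep one speaker's tokens within a single fold; the abstract does not say how the authors partitioned their data.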

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Scientific Reports | 3102 papers in training set | Top 0.4% | 22.5%
2. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 0.1% | 9.1%
3. NeuroImage: Clinical | 132 papers in training set | Top 0.7% | 6.3%
4. Computers in Biology and Medicine | 120 papers in training set | Top 0.3% | 6.3%
5. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 0.7% | 4.9%
6. PLOS ONE | 4510 papers in training set | Top 32% | 4.9%
50% of probability mass above
7. Frontiers in Digital Health | 20 papers in training set | Top 0.2% | 3.6%
8. Sensors | 39 papers in training set | Top 0.5% | 3.6%
9. Journal of Medical Internet Research | 85 papers in training set | Top 2% | 2.1%
10. NeuroImage | 813 papers in training set | Top 3% | 2.1%
11. PLOS Digital Health | 91 papers in training set | Top 2% | 1.5%
12. Cognitive Neurodynamics | 15 papers in training set | Top 0.2% | 1.5%
13. BMC Bioinformatics | 383 papers in training set | Top 5% | 1.3%
14. Frontiers in Psychiatry | 83 papers in training set | Top 2% | 1.2%
15. Physiological Measurement | 12 papers in training set | Top 0.3% | 1.1%
16. Journal of Personalized Medicine | 28 papers in training set | Top 0.9% | 0.9%
17. Communications Biology | 886 papers in training set | Top 21% | 0.8%
18. Translational Psychiatry | 219 papers in training set | Top 4% | 0.8%
19. iScience | 1063 papers in training set | Top 29% | 0.8%
20. Biomedical Signal Processing and Control | 18 papers in training set | Top 0.4% | 0.8%
21. Heliyon | 146 papers in training set | Top 7% | 0.7%
22. International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.7%
23. Autism Research | 32 papers in training set | Top 0.4% | 0.7%
24. Human Brain Mapping | 295 papers in training set | Top 4% | 0.7%
25. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.8% | 0.6%
26. Biology Methods and Protocols | 53 papers in training set | Top 3% | 0.6%
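
The 50% cutoff marked in the ranking above can be checked with a short cumulative sum. The sketch below copies the predicted probabilities of the ten top-ranked journals from the list; the function name `journals_to_reach` is a hypothetical helper introduced here, not part of any tool behind this page.

```python
# Predicted probabilities (%) of the ten top-ranked journals, copied from the ranking.
probs = [22.5, 9.1, 6.3, 6.3, 4.9, 4.9, 3.6, 3.6, 2.1, 2.1]


def journals_to_reach(probabilities, threshold=50.0):
    """Return (rank, cumulative mass) at the first rank where the mass reaches threshold."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total  # threshold never reached within the given list


rank, mass = journals_to_reach(probs)
print(rank, round(mass, 1))  # prints: 6 54.0
```

The first five journals sum to 49.1%, so the sixth is the first to carry the cumulative mass past 50%, which matches the cutoff placed after PLOS ONE.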