
The richness of little voices: using artificial intelligence to understand early language development

Petrache, M.; Carvallo, A.; Silva, V.; Barcelo, P.; Pena, M.

2026-01-31 neuroscience
10.64898/2026.01.30.702650 bioRxiv

How informative are preschoolers' speech vocalizations? Preschoolers' speech is often imprecise, highly variable, and hard for both humans and machines to interpret; consequently, its predictive value for later developmental outcomes remains underexplored. Here, we analyzed 6,595 brief vocalizations (0.5-5 s) from 127 preschoolers aged 3-4 years, including 74 children with diagnosed language delay, recorded in naturalistic environments. The vocalization models robustly distinguished children with and without language delay (ROC-AUC: 0.90), beyond what the acoustic properties of the recordings alone achieved (ROC-AUC: 0.62), and outperformed similar models analyzing metadata that the literature reports as predictive factors for early language development (ROC-AUC: < 0.69 [95% CI: 0.08 - 0.15 to 0.48 - 0.73], P < 0.001). This indicates that neural networks applied to foundation-model audio vectorizations can extract meaningful developmental markers from brief samples of immature speech to classify speech status, offering a promising, scalable approach to early screening of language abilities.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Scientific Reports | 3102 papers in training set | Top 0.7% | 19.1%
2. Nature Human Behaviour | 85 papers in training set | Top 0.3% | 7.4%
3. PLOS ONE | 4510 papers in training set | Top 24% | 7.0%
4. PLOS Computational Biology | 1633 papers in training set | Top 7% | 5.0%
5. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 0.9% | 4.3%
6. Journal of Neural Engineering | 197 papers in training set | Top 0.6% | 4.1%
7. eLife | 5422 papers in training set | Top 24% | 3.8%
50% of probability mass above
8. Computers in Biology and Medicine | 120 papers in training set | Top 1% | 3.0%
9. iScience | 1063 papers in training set | Top 9% | 2.4%
10. Advanced Science | 249 papers in training set | Top 8% | 2.4%
11. Communications Biology | 886 papers in training set | Top 5% | 2.1%
12. Frontiers in Neuroscience | 223 papers in training set | Top 3% | 2.1%
13. Journal of The Royal Society Interface | 189 papers in training set | Top 2% | 1.8%
14. NeuroImage | 813 papers in training set | Top 4% | 1.7%
15. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 32% | 1.7%
16. Trends in Hearing | 12 papers in training set | Top 0.1% | 1.3%
17. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 1% | 1.3%
18. Journal of Medical Internet Research | 85 papers in training set | Top 3% | 1.1%
19. Nature Communications | 4913 papers in training set | Top 58% | 1.0%
20. Translational Psychiatry | 219 papers in training set | Top 3% | 1.0%
21. Heliyon | 146 papers in training set | Top 4% | 0.9%
22. Frontiers in Psychiatry | 83 papers in training set | Top 3% | 0.9%
23. Sensors | 39 papers in training set | Top 2% | 0.8%
24. Imaging Neuroscience | 242 papers in training set | Top 3% | 0.8%
25. Frontiers in Computational Neuroscience | 53 papers in training set | Top 2% | 0.8%
26. eNeuro | 389 papers in training set | Top 8% | 0.8%
27. Science Advances | 1098 papers in training set | Top 27% | 0.8%
28. BMC Bioinformatics | 383 papers in training set | Top 7% | 0.8%
29. Frontiers in Human Neuroscience | 67 papers in training set | Top 2% | 0.8%
30. Biomedical Signal Processing and Control | 18 papers in training set | Top 0.5% | 0.7%
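
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by summing the listed percentages in rank order. The snippet below is a minimal sketch of that arithmetic; the helper name `rank_reaching` is hypothetical and not part of the page, and the values are copied from the first seven entries of the list.

```python
# Predicted probabilities (in percent) for ranks 1-7, copied from the list.
top_probs = [19.1, 7.4, 7.0, 5.0, 4.3, 4.1, 3.8]


def rank_reaching(probs, threshold):
    """Return (rank, running_total) at the first 1-based rank where the
    cumulative sum reaches the threshold, or (None, total) if it never does.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total


rank, total = rank_reaching(top_probs, 50.0)
print(rank, round(total, 1))
```

With these values the running total after rank 6 is 46.9% and first reaches 50% at rank 7 (50.7%), which agrees with where the "50% of probability mass above" marker is placed.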