
Psychiatric Voice Biomarkers: Methodological flaws in pediatric populations

Hamoudi, H. J. A. S.; Wu, M.-J.; Sanches, M.; Soutullo, C. A.; Olmos, C.; Taylor, L. K.; Zunta-Soares, G.; Soares, J. C.; Mwangi, B.

2025-10-15 psychiatry and clinical psychology

10.1101/2025.10.13.25337901 medRxiv


Introduction: Psychiatric assessments rely on patient self-reports, clinician observations, and standardized scales, while objective technological tools are currently not reliable enough to be used in clinical settings. Voice may serve as a biomarker in several scenarios, including differential diagnosis, assessment of symptom severity, and prediction of suicidality. However, its use depends on accurate automatic speech recognition (ASR). Current gold-standard open-source ASR systems are trained mainly on adult speech and perform poorly on children's speech, limiting their application in pediatric psychiatry.

Methods: We benchmarked two open-source ASR models, NVIDIA Parakeet and Whisper-small, on the Ohio Child Speech Corpus (303 children, ages 4-9), using the reference human transcripts provided with the dataset. Audio was standardized to each model's expected sampling rate. No model fine-tuning or adaptation was performed. For each utterance, we computed word error rate (WER) and character error rate (CER), and assessed semantic fidelity using Sentence Mover's Distance (SMD) and BERTScore F1. Metrics were summarized overall, stratified by single-year age bins (4, 5, 6, 7, 8, 9), and also grouped into two broader categories: younger children (ages 4-6) and older children (ages 7-9). We compared WER, CER, SMD, and BERTScore F1 across both age groups and evaluated age effects as trends using nonparametric statistical tests.

Results: Both models showed significant age effects: younger children had markedly higher word error rates (WER >40%) and character error rates (CER >30%) than older children (WER ~30%, CER ~20%). Sentence Mover's Distance improved with age, while BERTScore F1 remained stable. Despite age-related improvements, overall transcription accuracy was low.

Discussion: Commonly used open-source ASR systems are currently inadequate for pediatric audio transcription, particularly in younger children. To build clinically translatable tools, collecting child-specific data and fine-tuning models through structured speech paradigms is essential.
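
For readers unfamiliar with the error metrics named in the Methods, the following is a minimal sketch of how per-utterance WER and CER can be computed and averaged within the two age groups (4-6 and 7-9). It is not the authors' pipeline: the function names, the lowercasing and whitespace tokenization, and the treatment of spaces as characters in CER are all assumptions made for illustration, and the abstract does not state which text normalization was applied.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words.
    Lowercasing and whitespace splitting are illustrative assumptions."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Spaces are counted as characters here (an assumption)."""
    ref_chars = list(reference.lower())
    hyp_chars = list(hypothesis.lower())
    if not ref_chars:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def age_group(age):
    """Map an age in years to the two groups described in the Methods."""
    if 4 <= age <= 6:
        return "younger"
    if 7 <= age <= 9:
        return "older"
    raise ValueError(f"age {age} is outside the 4-9 range")


def mean_wer_by_group(rows):
    """rows: iterable of (age, reference, hypothesis) tuples (hypothetical layout).
    Returns the mean per-utterance WER for each age group."""
    scores = defaultdict(list)
    for age, reference, hypothesis in rows:
        scores[age_group(age)].append(wer(reference, hypothesis))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so averaged values should be read as error rates relative to reference length rather than as bounded proportions.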

