AI-based Speech Error Detection to Differentiate Primary Progressive Aphasia Variants
Vonk, J. M. J.; Lian, J.; Cho, C. J.; Antonicelli, G.; Ezzes, Z.; Wauters, L. D.; Keegan-Rodewald, W.; Kurteff, G. L.; Rodriguez, D. A.; Dronkers, N.; Henry, M. L.; Miller, Z. A.; Mandelli, M. L.; Anumanchipalli, G. K.; Gorno-Tempini, M. L.
Background: Artificial intelligence (AI)-based approaches to speech analysis have the potential to support objective speech error analysis in aphasia, but off-the-shelf tools often fail to detect speech errors because they prioritize "fluent transcription." Speech production errors (dysfluencies) are hallmark diagnostic features of the nonfluent (nfvPPA) and logopenic (lvPPA) variants of primary progressive aphasia (PPA), yet they can be challenging to detect and characterize, even for expert clinicians. This study evaluated whether the novel automated lightweight Scalable Speech Dysfluency Modeling system (SSDM-L), specifically designed to detect dysfluencies, could accurately distinguish PPA variants using voice recordings of individuals reading a brief passage.
Method: Participants were 104 individuals: 40 with nfvPPA, 40 with lvPPA (matched on disease severity), and 24 healthy controls. All read aloud the Grandfather Passage as part of a widely used motor speech evaluation (MSE). We automatically extracted ten speech error (dysfluency) variables using SSDM-L, including insertions, replacements, and deletions at both the phoneme and word levels, and phoneme-level prolongations and repetitions. Group differences were assessed via ANCOVAs controlling for age, education, and disease severity (MMSE, CDR sum-of-boxes). To test clinical relevance, we performed correlation analyses within the nfvPPA group against MSE ratings provided by experienced speech-language pathologists (i.e., the gold standard). Classification performance was assessed by training random forest and XGBoost machine-learning models with 5-fold cross-validation.
Results: All individuals read the entire passage in less than five minutes. SSDM-L detected eight of the ten predefined dysfluency features at sufficient frequency to include them in subsequent analyses. All eight features distinguished PPA from controls (p<.006). Individuals with nfvPPA made more errors than the lvPPA group on every feature (all p<.023). Each feature showed a moderate positive correlation with a global MSE apraxia/dysarthria score (r=.31-.56; p<.001-.053). Together, the eight features classified nfvPPA versus lvPPA at AUC=.806 (random forest) and AUC=.776 (XGBoost).
Discussion: AI-based automated speech error analysis accurately distinguished the nfvPPA and lvPPA variants using a brief reading task. This quick, error-sensitive, scalable AI system has the potential to provide a practical tool to aid diagnosis in aphasia and motor speech disorders.
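
The evaluation scheme named in the abstract (5-fold cross-validation scored by AUC) can be illustrated with a small, dependency-free sketch. It is not the authors' pipeline: the study used random forest and XGBoost models, whereas this example swaps in a simple nearest-centroid scorer, and the data are synthetic (40 + 40 samples with 8 made-up features chosen only to mirror the group sizes and feature count). All function names are hypothetical.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample to one of k folds, balancing classes across folds."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            fold_of[i] = position % k
    return fold_of

def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_validated_auc(X, y, k=5, seed=0):
    """Mean test-fold AUC of a nearest-centroid scorer (stand-in model)."""
    fold_of = stratified_folds(y, k, seed)
    fold_aucs = []
    for f in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != f]
        test = [i for i in range(len(y)) if fold_of[i] == f]
        c1 = centroid([X[i] for i in train if y[i] == 1])
        c0 = centroid([X[i] for i in train if y[i] == 0])
        # Higher score = closer to the class-1 centroid than to class-0.
        scores = [sq_dist(X[i], c0) - sq_dist(X[i], c1) for i in test]
        fold_aucs.append(auc(scores, [y[i] for i in test]))
    return mean(fold_aucs)

# Synthetic data only (not study data): class 1 has higher feature means.
rng = random.Random(42)
y = [1] * 40 + [0] * 40
X = [[rng.gauss(1.0 if label else 0.0, 1.5) for _ in range(8)] for label in y]
print(round(cross_validated_auc(X, y), 3))
```

Stratifying the folds keeps each test fold at 8 samples per class, so every fold's AUC is computed over the same number of positive/negative pairs. Averaging per-fold AUCs is one common convention; pooling all held-out scores before computing a single AUC is another, and the abstract does not say which was used.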
Matching journals
The top 11 journals account for 50% of the predicted probability mass.