Predicting Post-Stroke Aphasia Speech Performance from Multimodal Data with Explainable Machine Learning
Parchure, S.; Gupta, A.; Kelkar, A.; Vnenchak, L.; Faseyitan, O.; Medaglia, J. D.; Harvey, D. Y.; Coslett, H. B.; Hamilton, R. H.
Aphasia, an acquired language deficit, is the most common post-stroke focal cognitive impairment, and roughly 60% of cases become chronic (duration >6 months). Aphasia therapies could be optimized if clinicians could make personalized predictions of how individual persons with aphasia (PWA) are likely to perform on particular language tasks. However, current approaches relying on imaging, lesion volume, patient demographics, and clinical scores achieve less than 50% accuracy in predicting performance in PWA. Research algorithms using complex imaging and fMRI can make binary predictions about the presence or absence of aphasia but do not provide more clinically relevant information. We aim to predict word-by-word speech accuracy in PWA to better enable personalized speech therapies. To be clinically informative, machine learning models developed for this purpose should use clinically available inputs, explain the key features behind a prediction, and generalize to new PWA and previously unseen words. This study combines multimodal input features from clinical testing scores and structural MRI neuroimaging with a novel data source: word-by-word linguistic difficulty. We computed metrics of cognitive burden, such as semantic selection and recall demands, and of articulatory burden, such as word length in phonemes and syllables, using naturalistic corpora containing over a billion words of English text. Retrospective training, ten-fold cross-validation, and 500-run bootstrapping of different machine learning models with various combinations of input features were conducted on 4620 trials. A simplified version of the best model using widely available inputs was deployed clinically through a web app, and prospective generalization was tested on 570 trials with unseen words and different naming tasks in new PWA.
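The evaluation scheme described above (ten-fold cross-validation of a classifier, with bootstrapping to estimate uncertainty around the AUROC) can be sketched as follows. This is a minimal illustration on synthetic data: the feature dimensions, sample size, and signal structure are placeholders, not the study's actual clinical, imaging, or linguistic inputs.

```python
# Sketch of ten-fold cross-validation plus bootstrapped AUROC estimation,
# using a random forest on synthetic data (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_trials = 500                       # stand-in for the study's 4620 trials
X = rng.normal(size=(n_trials, 6))   # e.g. clinical scores + word difficulty
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_trials)) > 0

clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# Out-of-fold predicted probabilities for the positive class.
probs = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

# Bootstrap the AUROC over resampled trials to get mean and spread.
aucs = []
for _ in range(500):
    idx = rng.integers(0, n_trials, n_trials)
    if len(set(y[idx])) < 2:         # need both classes to score
        continue
    aucs.append(roc_auc_score(y[idx], probs[idx]))
print(f"AUROC = {np.mean(aucs):.2f} +/- {np.std(aucs):.2f}")
```

The bootstrap standard deviation here plays the role of the abstract's reported SEM-style uncertainty; the study's actual pipeline additionally compared multiple model families and input combinations.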
We found the best performance with random forest classifiers using linguistic difficulty combined with either clinical information (AUROC ± SEM = 0.87 ± 0.07) or all inputs together with structural imaging connectivity (0.90 ± 0.04). Classifiers using multimodal inputs significantly outperformed those employing single inputs (range 0.66-0.85, p<0.05). Extracting feature importances from the best model showed that Western Aphasia Battery scores, semantic demands, and numbers of phonemes and syllables were predictive of PWA speech accuracy. Structural integrity in peri-lesional brain regions predicted better language performance, whereas higher connectivity of select contralateral homotopes contributed to prediction of worse speech. Without the inclusion of MRI data, lesion volume was also a key predictor of PWA speech. A simplified, clinically ready, explainable model (publicly available as the AphasiaLENS web application) predicted PWA accuracy for any user-entered word, not restricted to a standardized battery. Its prospective generalization performance was not significantly different from that of the best model using full inputs (AUROC range 0.81-0.89, p>0.05). Thus, our research can help inform individualized treatment planning for PWA, while also suggesting research targets through a better understanding of brain-behavior relationships.
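The feature-importance step described above (ranking predictors of speech accuracy from the fitted random forest) can be sketched as below. The feature names are hypothetical labels echoing the abstract's reported predictors, and the synthetic data is constructed so that only some features carry signal; this is not the study's model or data.

```python
# Sketch of extracting and ranking feature importances from a fitted
# random forest, on synthetic data with illustrative feature names.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
features = ["WAB_score", "semantic_demand", "n_phonemes", "n_syllables"]
X = rng.normal(size=(400, len(features)))
# Synthetic outcome depends on columns 0 and 2 only.
y = (X[:, 0] + X[:, 2] + rng.normal(size=400)) > 0

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Impurity-based importances sum to 1 across features.
ranked = sorted(zip(features, clf.feature_importances_),
                key=lambda t: -t[1])
for name, imp in ranked:
    print(f"{name:16s} {imp:.3f}")
```

In this toy setup, the two signal-carrying features dominate the ranking; in practice, permutation importance is a common, less biased alternative to impurity-based importances when features differ in scale or cardinality.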