Detection of Novel Acoustic Biomarkers for Parkinson's Disease through a Machine Learning-Based Composite Spectrogram Analysis
Tsutsumi, K.; Chang, P. D.; Isfahani, S. A.
Background: Speech abnormalities are common in Parkinson's disease (PD). Machine learning (ML) offers potential for objective and scalable speech-based diagnostics. This study introduces an explainable ML pipeline that leverages a novel vowel articulation-based composite input to detect PD and identify phoneme-level biomarkers.

Methods: Two publicly available datasets of PD speech recordings were analyzed. Sustained vowel articulations were converted into log-mel spectrograms, either individually or as a composite image formed by vertically concatenating a set of vowels per subject. Processed spectrograms were used to train ML models, with performance assessed using five-fold cross-validation and bootstrapped area under the receiver operating characteristic curve (AUROC). Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to quantify model attention across vowel regions.

Results: A total of 150 patients (49.3% PD) were included. Acoustic analysis revealed significant group differences in cepstral peak prominence and harmonics-to-noise ratio, particularly for vowel /u/ (p < 0.05). The ML model achieved an average AUROC of 0.805 using individual vowels, which improved to 0.928 with the composite input (p < 0.001). Grad-CAM showed the highest activation for vowel /u/ (p < 0.001), consistent with the acoustic findings.

Conclusion: The proposed explainable composite spectrogram approach enabled both high classification performance and identification of a vowel biomarker. The concordance between ML and acoustic analyses highlights the translational potential of explainable ML in PD speech assessment and its ability to reveal underlying pathophysiological insights.
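The composite-input idea described in the Methods can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' code): each sustained vowel is converted to a log-mel spectrogram, and the per-vowel spectrograms are vertically concatenated along the frequency axis into one composite image per subject. All parameters (sample rate, FFT size, mel band count) and the mel filterbank construction are illustrative assumptions; the random signals stand in for real vowel recordings.

```python
import numpy as np

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Simplified log-mel spectrogram (illustrative parameters, not the paper's)."""
    # Frame the signal and apply a Hann window (basic STFT).
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)

    # Minimal triangular mel filterbank (hypothetical construction).
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, center, hi = bins[m - 1], bins[m], bins[m + 1]
        if center > lo:
            fb[m - 1, lo:center] = (np.arange(lo, center) - lo) / (center - lo)
        if hi > center:
            fb[m - 1, center:hi] = (hi - np.arange(center, hi)) / (hi - center)

    mel = power @ fb.T                      # (n_frames, n_mels)
    return np.log(mel + 1e-10).T            # (n_mels, n_frames)

# Composite input: vertically stack one spectrogram per sustained vowel.
rng = np.random.default_rng(0)
vowels = {v: rng.standard_normal(16000) for v in ("a", "i", "u")}  # placeholder audio
specs = [log_mel_spectrogram(x) for x in vowels.values()]
composite = np.concatenate(specs, axis=0)   # stacked along the frequency axis
```

The composite array (here 3 vowels x 40 mel bands = 120 rows) would then be treated as a single image input to the classifier, which is what allows Grad-CAM attention to be attributed back to individual vowel bands of the composite.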