Linguistic and Acoustic Biomarkers from Simulated Speech Reveal Early Cognitive Impairment Patterns in Alzheimer's Disease

Debnath, A.; Sarkar, S.

2026-04-08 neuroscience
10.64898/2026.04.08.717162 bioRxiv
Background: Alzheimer's disease (AD) causes progressive decline in language and cognition. Automated speech analysis has emerged as a promising screening tool, yet the scarcity of clinical data limits progress. To address this, we generated a large-scale simulated speech dataset to model linguistic and acoustic deterioration across three cognitive stages: Control, Mild Cognitive Impairment (MCI), and AD.

Methods: Using Monte Carlo simulations, we emulated the Pitt DementiaBank "Cookie Theft" narratives. Acoustic features (speech rate, pause duration, jitter, shimmer) and linguistic features (type-token ratio, unique-word count, filler usage) were synthetically sampled from real-world DementiaBank distributions. We trained an XGBoost classifier to distinguish the diagnostic groups and applied SHAP (SHapley Additive exPlanations) to assess feature importance.

Results: The model achieved high discriminative performance (AUC ≈ 0.94; accuracy ≈ 85%). Compared with controls, the simulated MCI and AD groups showed progressive declines in fluency and lexical diversity, and increases in disfluencies and voice instability. SHAP analysis identified reduced type-token ratio, higher pause and filler rates, and elevated jitter and shimmer as key predictors. Classification was most accurate for Control vs. AD; misclassifications of MCI reflected its intermediate profile.

Interpretation: Our framework, FMN (Forget Me Not), captures clinically relevant speech changes using simulated data, offering an explainable and scalable approach to cognitive screening. While not a substitute for real datasets, FMN validates a pipeline that mirrors known AD markers and can guide future real-world deployments. External validation remains a key next step for translational impact.
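The Monte Carlo step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature subset, the per-group Gaussian parameters in `GROUP_PARAMS`, and the helper names `simulate` and `group_means` are all hypothetical placeholders chosen for illustration, and none of the numbers come from DementiaBank. The sketch only shows how per-group feature sampling can reproduce a progressive trend across Control, MCI, and AD.

```python
import random
import statistics

# Hypothetical (mean, std) per feature and group. These are illustrative
# placeholders, NOT values estimated from DementiaBank or taken from the paper.
GROUP_PARAMS = {
    "Control": {"type_token_ratio": (0.60, 0.05), "pause_duration_s": (0.40, 0.10)},
    "MCI": {"type_token_ratio": (0.52, 0.05), "pause_duration_s": (0.60, 0.12)},
    "AD": {"type_token_ratio": (0.44, 0.05), "pause_duration_s": (0.85, 0.15)},
}
FEATURES = ["type_token_ratio", "pause_duration_s"]


def simulate(n_per_group, seed=0):
    """Draw n_per_group synthetic speakers per group from independent Gaussians."""
    rng = random.Random(seed)
    rows = []
    for group, params in GROUP_PARAMS.items():
        for _ in range(n_per_group):
            sample = {f: rng.gauss(*params[f]) for f in FEATURES}
            sample["group"] = group
            rows.append(sample)
    return rows


def group_means(rows):
    """Average each feature within each diagnostic group."""
    means = {}
    for group in GROUP_PARAMS:
        members = [r for r in rows if r["group"] == group]
        means[group] = {f: statistics.fmean(r[f] for r in members) for f in FEATURES}
    return means


if __name__ == "__main__":
    for group, feats in group_means(simulate(2000)).items():
        print(group, {f: round(v, 3) for f, v in feats.items()})
```

In a fuller pipeline, the sampled rows would feed a classifier such as XGBoost and an explainer such as SHAP, as the abstract describes; those steps are omitted here to keep the sketch dependency-free.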

Matching journals

The top 14 journals account for 50% of the predicted probability mass.

1. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring (38 papers in training set; Top 0.1%; 8.5%)
2. Scientific Reports (3102 papers in training set; Top 17%; 6.4%)
3. Alzheimer's Research & Therapy (52 papers in training set; Top 0.4%; 4.9%)
4. Journal of Alzheimer’s Disease (39 papers in training set; Top 0.1%; 4.9%)
5. Alzheimer's & Dementia (143 papers in training set; Top 1%; 4.9%)
6. PLOS ONE (4510 papers in training set; Top 38%; 3.6%)
7. Translational Psychiatry (219 papers in training set; Top 2%; 3.1%)
8. eNeuro (389 papers in training set; Top 4%; 2.6%)
9. BMC Bioinformatics (383 papers in training set; Top 4%; 2.4%)
10. Imaging Neuroscience (242 papers in training set; Top 2%; 2.1%)
11. Brain Communications (147 papers in training set; Top 1%; 2.1%)
12. Neurobiology of Language (28 papers in training set; Top 0.1%; 2.1%)
13. Journal of Alzheimer's Disease (43 papers in training set; Top 0.6%; 2.1%)
14. Computers in Biology and Medicine (120 papers in training set; Top 2%; 1.8%)
50% of probability mass above
15. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 32%; 1.7%)
16. Brain (154 papers in training set; Top 3%; 1.7%)
17. Nature Human Behaviour (85 papers in training set; Top 2%; 1.7%)
18. Alzheimer's & Dementia: Translational Research & Clinical Interventions (16 papers in training set; Top 0.4%; 1.7%)
19. PLOS Computational Biology (1633 papers in training set; Top 17%; 1.5%)
20. eBioMedicine (130 papers in training set; Top 2%; 1.5%)
21. npj Digital Medicine (97 papers in training set; Top 2%; 1.5%)
22. Communications Biology (886 papers in training set; Top 11%; 1.5%)
23. Annals of Neurology (57 papers in training set; Top 1%; 1.3%)
24. Bioinformatics (1061 papers in training set; Top 8%; 1.2%)
25. GeroScience (97 papers in training set; Top 1%; 1.2%)
26. Nature Communications (4913 papers in training set; Top 56%; 1.2%)
27. Neurobiology of Disease (134 papers in training set; Top 3%; 1.1%)
28. Frontiers in Psychiatry (83 papers in training set; Top 3%; 1.1%)
29. Frontiers in Aging Neuroscience (67 papers in training set; Top 3%; 1.0%)
30. NeuroImage: Clinical (132 papers in training set; Top 3%; 0.9%)
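
The statement that the top 14 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below is an illustration only: the function name `journals_to_reach` is hypothetical, and the input is the rounded percentages displayed for the first 15 ranks, so the result is approximate with respect to whatever unrounded values the site uses.

```python
def journals_to_reach(probabilities, threshold):
    """Return the smallest k such that the k largest values sum to at least
    threshold, or None if the threshold is never reached."""
    total = 0.0
    for k, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= threshold - 1e-9:
            return k
    return None


# Rounded percentages as displayed for ranks 1 to 15 in the list above.
listed = [8.5, 6.4, 4.9, 4.9, 4.9, 3.6, 3.1, 2.6, 2.4,
          2.1, 2.1, 2.1, 2.1, 1.8, 1.7]

if __name__ == "__main__":
    print(journals_to_reach(listed, threshold=50.0))  # prints 14
```

The running total reaches 49.7% after rank 13 and 51.5% after rank 14, which agrees with the cutoff marked in the list.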