Automated Macrolinguistic Discourse Analysis for Transdiagnostic Detection of Language Impairments

Lee, S. H.; Wang, S.; Varkanitsa, M.; Kiran, S.

2026-05-21 neurology
10.64898/2026.05.19.26353614 medRxiv
Macrolinguistic discourse analysis offers valuable insight into how patients with neurogenic communication disorders organize and produce informative speech, yet it remains a largely manual and labor-intensive process. We report an automated pipeline for macrolinguistic discourse analysis for individuals with aphasia and dementia that integrates automatic speech recognition (ASR), utterance segmentation, sentence-level embeddings, centroid-based main-concept matching, and rule-based coherence error classification. These algorithms were applied to Cinderella story retellings from 309 participants (113 controls, 102 post-stroke aphasia (PWA), and 94 dementia). The algorithm reliably identified main concepts (83% accuracy against human labels) and derived interpretable features such as semantic distance to a main concept centroid, main concept coverage, and coherence error rates. Crucially, diagnostic classification results showed that logistic-regression classifiers trained on 10 macrolinguistic features distinguished aphasia from controls with high accuracy (AUC ≈ 0.94) but showed weaker separation for dementia (controls vs dementia AUC ≈ 0.66; aphasia vs dementia AUC ≈ 0.58). Semantic distance to the centroid emerged as a robust, informative predictor for diagnostic classification, demonstrating that the ability to produce narrative-aligned speech is clinically important. The automated pipeline enables scalable macrolinguistic discourse analysis that could support screening and longitudinal monitoring of discourse impairments across neurogenic populations.
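
The centroid-based main-concept matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine distance, the 0.3 threshold, the concept names, and the 3-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a sentence-embedding model, which is omitted here.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def match_main_concepts(utterances, concept_refs, threshold=0.3):
    """Assign each utterance to its nearest concept centroid.

    Returns one (concept_name_or_None, distance) pair per utterance.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    centroids = {name: centroid(vecs) for name, vecs in concept_refs.items()}
    matches = []
    for u in utterances:
        name, dist = min(
            ((n, cosine_distance(u, c)) for n, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        matches.append((name if dist <= threshold else None, dist))
    return matches

def coverage(matches, n_concepts):
    """Fraction of main concepts matched by at least one utterance."""
    covered = {name for name, _ in matches if name is not None}
    return len(covered) / n_concepts

# Toy reference embeddings for two hypothetical main concepts.
concept_refs = {
    "glass_slipper": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "midnight": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
# Two toy utterance embeddings: one on-topic, one off-topic.
utterances = [[1.0, 0.05, 0.0], [0.0, 0.0, 1.0]]

matches = match_main_concepts(utterances, concept_refs)
concept_coverage = coverage(matches, len(concept_refs))
```

In this toy setup the first utterance lands close to the "glass_slipper" centroid and the second stays unmatched, so coverage is one of two concepts. The per-utterance distances are the kind of quantity the abstract calls semantic distance to a main concept centroid, and they could serve as inputs to a downstream classifier.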

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 0.5% | 19.8%
2 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 7% | 8.6%
3 | Nature Medicine | 117 papers in training set | Top 0.4% | 4.9%
4 | Nature Communications | 4913 papers in training set | Top 32% | 4.9%
5 | Med | 38 papers in training set | Top 0.1% | 4.4%
6 | Nature Computational Science | 50 papers in training set | Top 0.1% | 4.0%
7 | Nucleic Acids Research | 1128 papers in training set | Top 5% | 3.7%
50% of probability mass above
8 | PLOS ONE | 4510 papers in training set | Top 41% | 3.1%
9 | Scientific Data | 174 papers in training set | Top 0.8% | 2.1%
10 | Brain | 154 papers in training set | Top 2% | 2.1%
11 | Advanced Science | 249 papers in training set | Top 8% | 2.1%
12 | Communications Biology | 886 papers in training set | Top 6% | 1.9%
13 | Brain Communications | 147 papers in training set | Top 1% | 1.9%
14 | Frontiers in Digital Health | 20 papers in training set | Top 0.6% | 1.8%
15 | Journal of Neural Engineering | 197 papers in training set | Top 1% | 1.7%
16 | Journal of Speech, Language, and Hearing Research | 10 papers in training set | Top 0.1% | 1.7%
17 | Nature | 575 papers in training set | Top 12% | 1.5%
18 | Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.5%
19 | Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 4% | 1.0%
20 | Nature Biomedical Engineering | 42 papers in training set | Top 2% | 0.9%
21 | npj Digital Medicine | 97 papers in training set | Top 3% | 0.9%
22 | eBioMedicine | 130 papers in training set | Top 4% | 0.8%
23 | NeuroImage: Clinical | 132 papers in training set | Top 3% | 0.8%
24 | IEEE Transactions on Biomedical Engineering | 38 papers in training set | Top 0.8% | 0.8%
25 | Journal of Medical Internet Research | 85 papers in training set | Top 4% | 0.8%
26 | PLOS Digital Health | 91 papers in training set | Top 3% | 0.8%
27 | Frontiers in Neuroscience | 223 papers in training set | Top 8% | 0.7%
28 | Journal of NeuroEngineering and Rehabilitation | 28 papers in training set | Top 1.0% | 0.7%
29 | Nature Neuroscience | 216 papers in training set | Top 7% | 0.7%
30 | Artificial Intelligence in Medicine | 15 papers in training set | Top 0.9% | 0.5%