Spoken language biomarkers in Turkish-speaking schizophrenia patients: Evidence from linguistic analysis and word embeddings
Cınar Bozdag, M.; Kumcu, A.; Senel, L. K.; Temizkan, H. N.; Özil, O.; Arslanyürek, I.; Ertekin, P. N.; Candansayar, S.
Background and Hypothesis: Schizophrenia (SZ) is considered a "thought disorder". Therefore, language assessment is crucial in diagnosing SZ. Linguistic analysis and emerging computational language models provide objective biomarkers for diagnosis. Against this background, the main hypothesis is that the language patterns of SZ patients are significantly different from those of healthy controls (HCs) in Turkish, as has previously been shown in other languages.

Methods: Speech characteristics of 50 native Turkish-speaking SZ patients were compared with 50 HCs matched for age, sex, length of education, and right/left-handedness. Speech data were collected in 15-minute interviews. The interview recordings were transcribed and analysed for various lexical, syntactic and phonological measures in CLAN and compared for discourse measures using fastText word embedding models.

Results: The number of words produced per minute, the number of different words, mean length of utterance, average word frequency, the number of filled pauses, discourse coherence and question-response similarity were lower in the patient group than in the control group. On the other hand, content words/function words ratio, sentence prediction loss, different words/total words ratio, the number of silent pauses, and silent pauses/total speech ratio were higher in the patient group than in the control group.

Conclusion: The hypothesis is confirmed. The results from Turkish-speaking SZ patients show similarities with results from other languages from other language families. The findings are important as Turkish is a low-resource and relatively under-researched language in the literature.

The manuscript is under peer review. Please do not cite this preprint.
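
The abstract does not say how the different words/total words ratio or the embedding-based discourse measures were computed. As an illustration only, the sketch below shows one common way such measures are defined: a type-token ratio over tokens, and discourse coherence as the mean cosine similarity between averaged word vectors of consecutive utterances. The function names and the two-dimensional `TOY_EMBEDDINGS` table are hypothetical stand-ins for a real fastText model, not the authors' pipeline.

```python
import math

# Hypothetical 2-D vectors standing in for fastText embeddings (assumption).
TOY_EMBEDDINGS = {
    "kedi": [1.0, 0.0],   # "cat"
    "köpek": [0.9, 0.1],  # "dog"
    "ev": [0.0, 1.0],     # "house"
}


def type_token_ratio(tokens):
    """Different words divided by total words; 0.0 for an empty list."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens found in `embeddings`, or None."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def discourse_coherence(utterances, embeddings):
    """Mean cosine similarity between consecutive utterance vectors.

    Utterances with no known tokens are skipped, so their neighbours
    are compared with each other instead (a simplifying assumption).
    """
    vectors = [mean_vector(u, embeddings) for u in utterances]
    vectors = [v for v in vectors if v is not None]
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims) if sims else 0.0


related = discourse_coherence([["kedi"], ["köpek"]], TOY_EMBEDDINGS)
unrelated = discourse_coherence([["kedi"], ["ev"]], TOY_EMBEDDINGS)
print(related > unrelated)
```

Under the same assumptions, a question-response similarity score could be obtained by applying `cosine` to the `mean_vector` of an interviewer question and the `mean_vector` of the reply that follows it.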