Predicting PANSS symptoms in schizophrenia spectrum disorders using speech only: an international, multi-centre, retrospective, computational study across multiple languages

He, R.; Kirdun, M.; Palominos, C.; Navarrete Orejudo, L.; Barthelemy, S.; Bhola, S.; Ciampelli, S.; Decker, A.; Demirlek, C.; Fusaroli, R.; Garcia-Molina, J. T.; Gimenez, G.; Huppi, R.; Koelkebeck, K.; Lecomte, A.; Qiu, R.; Simonsen, A.; Tourneur, V.; Verim, B.; Wang, H.; Yalincetin, B.; Yin, S.; Zhou, Y.; Amblard, M.; Ayesa Arriola, R.; Bora, E.; de Boer, J.; Figueroa-Barra, A. I.; Koops, S.; Musiol, M.; Palaniyappan, L.; Parola, A.; Spaniel, F.; Tang, S. X.; Sommer, I. E.; Homan, P.; Hinzen, W.

2026-02-28 psychiatry and clinical psychology

10.64898/2026.02.20.26345632 medRxiv

Background: Speech carries cues to variation in mental state in schizophrenia spectrum or psychotic disorders, which is typically indexed with clinician-rated scales such as the Positive and Negative Syndrome Scale (PANSS). Progress in automating speech-based symptom modelling has been constrained by data scale and the underrepresentation of low-resource languages. In this study, we aggregate multi-centre recordings into a large corpus and assess symptom-prediction models at scale, with the aim of enabling more objective and efficient assessments and the early detection of relapse-related signals from speech.

Methods: We compiled data from 453 patients with schizophrenia spectrum disorders, recruited from ten sites worldwide, and clipped their speech recordings into 6,664 segments. Across three feature sets (an acoustic-prosodic profile, pretrained multilingual embeddings, and their concatenation), we compared 16 algorithms for predicting eight relapse-related PANSS items: three positive (P1, P2, P3), three negative (N1, N4, N6), and two general (G5, G9). Models were evaluated on speaker-disjoint splits (80% train, 10% test, and 10% validation). Performance was assessed by root-mean-squared error (RMSE) at both the segment and participant (median aggregation) levels. The best model for each item underwent bias checks for age, sex, education, and symptom severity.

Outcomes: The best-performing models predicted symptoms with a segment-level RMSE of 1·5 PANSS points or lower and a participant-level RMSE of at most 1·542 (segment/participant): P1 1·494/1·527, P2 1·318/1·107, P3 1·407/1·542, N1 1·029/1·030, N4 1·452/1·430, N6 0·860/0·855, G5 0·850/0·882, G9 1·213/1·282. Pretrained multilingual embeddings outperformed both the acoustic-prosodic features and the concatenation of the two. Results were comparable in low-resource languages (e.g., Czech).
We found no bias by age, sex, or education, aside from reduced N4 accuracy in males; however, performance degraded with higher symptom severity.

Interpretation: Speech can support automatic assessment of schizophrenia symptoms using pretrained multilingual embeddings, even without the use of transcripts. Such models show promise as clinically meaningful, efficient, and low-burden tools for real-time monitoring of symptom trajectories.

Funding: EU Horizon research and innovation programme.

Research in context

Evidence before this study: Automatic assessment of disease severity is a key issue in schizophrenia research, for which spontaneous speech offers a cost-effective, automatable solution. To evaluate existing evidence for speech-based symptom assessment, two reviewers (RHe, MK) searched PubMed, IEEE Xplore, arXiv, bioRxiv, and medRxiv for publications from inception to Aug 25, 2025, using the terms: ("symptom" OR "PANSS" OR "Positive and Negative Syndrome Scale") AND ("psychosis" OR "schizophrenia") AND ("language" OR "speech" OR "spontaneous speech") AND ("prediction" OR "machine learning" OR "deep learning" OR "algorithm" OR "neural network" OR "AI" OR "artificial intelligence"). Fourteen studies on symptom-level modelling were identified. Ten studies dichotomized clinical scores (e.g., PANSS) into low vs high for classification: five used conventional machine learning (e.g., random forests) and five used neural networks, with F1 scores ranging from 0·60 to 0·85. The remaining four studies, together with two of the ten classification studies, modelled raw scores directly as regression tasks. Of these six, two relied solely on conventional regressors and the rest used neural networks, with errors ranging from 0·487 for single items (scale 1-7) to 8·04 for summed scores (scale 18-126). All studies used free speech for elicitation, except one, which used a reading task.
Three studies incorporated additional tasks, such as picture description and immediate recall. None were multilingual: nine were in English, three in Chinese, one in Swiss German, and one in Brazilian Portuguese. Features spanned a wide range, including acoustic-prosodic profiles, morpho-syntactic structure, semantic organization, pragmatics (including sentiment), and even visual features capturing movement during talking. Representations from pretrained language models were also widely employed. Sample sizes (counting patients with schizophrenia) were generally small: eleven studies enrolled <50 patients, one had 65, and only two exceeded 100 patients. Some increased their effective sample size via multiple recordings per patient or by adding healthy controls and/or patients with other psychiatric disorders (e.g., depression).

Added value of this study: To our knowledge, this is the first multilingual, speech-based study modelling schizophrenia symptom severity with a machine learning approach, and it includes the largest cohort of patients with schizophrenia to date. We further increased the effective sample size by using diverse elicitation tasks and segmenting recordings into clips. This multilingual corpus enables the use of complex models and supports transfer learning from high-resource languages (e.g., English) to low-resource ones (e.g., Czech). For each of the eight selected relapse-related PANSS items, the best audio-only model achieved a segment-level RMSE < 1·5, underscoring clinical relevance. We assessed potential biases: no effects were found for age, sex, or education (except poorer N4 performance in males), though performance declined at higher symptom severity. The trained models are released for use.

Implications of all the available evidence: We show that speech is a powerful signal for automatic assessment of schizophrenia symptom severity and holds promise for relapse prediction, even without transcripts.
The approach readily extends to incorporate textual features (from manual or automatic transcripts) and more advanced models. Prospective studies with repeated recordings across relapse episodes are needed to validate the utility of our models for relapse prediction, with the goal of supporting precision psychiatry while reducing clinician burden.
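
The evaluation scheme described in the abstract (speaker-disjoint splits, with RMSE reported at the segment level and at the participant level after median aggregation) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy speaker IDs, and the toy scores are hypothetical, and the sketch assumes one clinician rating per participant for a given PANSS item.

```python
import math
import random
from collections import defaultdict
from statistics import median


def speaker_disjoint_split(speaker_ids, train=0.8, test=0.1, seed=0):
    """Assign whole speakers (never single segments) to a partition."""
    speakers = sorted(set(speaker_ids))
    random.Random(seed).shuffle(speakers)
    n_train = round(train * len(speakers))
    n_test = round(test * len(speakers))
    return {
        "train": set(speakers[:n_train]),
        "test": set(speakers[n_train:n_train + n_test]),
        "validation": set(speakers[n_train + n_test:]),
    }


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def participant_level(speaker_ids, y_true, y_pred):
    """Collapse segment predictions to one value per participant (median)."""
    preds, truth = defaultdict(list), {}
    for spk, t, p in zip(speaker_ids, y_true, y_pred):
        preds[spk].append(p)
        truth[spk] = t  # assumption: one rating per participant and item
    order = sorted(preds)
    return [truth[s] for s in order], [median(preds[s]) for s in order]


# Hypothetical corpus: 10 speakers with 3 segments each.
corpus_ids = [f"spk{i}" for i in range(10) for _ in range(3)]
split = speaker_disjoint_split(corpus_ids)

# Hypothetical held-out segments: speaker, true item score, predicted score.
seg_ids = ["a", "a", "a", "b", "b", "c"]
seg_true = [3, 3, 3, 1, 1, 5]
seg_pred = [2.0, 3.0, 7.0, 1.0, 2.0, 4.0]

seg_rmse = rmse(seg_true, seg_pred)
part_true, part_pred = participant_level(seg_ids, seg_true, seg_pred)
part_rmse = rmse(part_true, part_pred)
print(f"segment RMSE={seg_rmse:.3f}, participant RMSE={part_rmse:.3f}")
```

Splitting by speaker rather than by segment keeps clips from the same person out of both training and evaluation data, which would otherwise inflate apparent accuracy. Median aggregation damps the effect of a single outlying segment prediction, which is why the two RMSE values can differ for the same model.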
