Disentangling Symptom Heterogeneity in Large-Scale Psychiatric Text: Domain-Adapted vs. Instruction-Tuned Transformers
Varone, G.; Kumar, P.; Brown, J.; Boulila, W.
The assessment of psychiatric disorders is fundamentally challenged by symptom heterogeneity, high comorbidity, and the absence of objective biomarkers, which together produce substantial variability in clinical evaluation and treatment selection. Patient-generated language captures rich information about subjective experience and symptom severity that can be systematically encoded and analyzed with computational models, making it a scalable signal for psychiatric assessment. We compare two approaches: (i) a domain-specialized transformer fine-tuned on clinical language, based on the Bio-ClinicalBERT encoder architecture, and (ii) a large-scale instruction-tuned generalist encoder (Instructor-XL) used as a frozen feature extractor with a shallow classification head. A corpus of N = 151,228 de-identified texts was compiled from five public sources, covering four psychiatric phenotypes: anxiety, depression, schizophrenia, and suicidal intention. Models were evaluated using stratified 10-fold cross-validation with cost-sensitive training, prioritizing imbalance-aware metrics, including Macro-F1 and the Matthews Correlation Coefficient (MCC), over accuracy. Bio-ClinicalBERT achieved superior overall performance (Macro-F1 = 0.78, MCC = 0.6752), indicating more reliable separation of diagnostically overlapping affective categories. In contrast, Instructor-XL achieved its highest class-specific performance for schizophrenia (F1 = 0.798). Explainability analyses suggest that the domain-specialized model places greater weight on clinically relevant terms, whereas the generalist model relies on a broader set of lexical features.
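The evaluation protocol described above (frozen encoder features fed to a shallow classifier, stratified 10-fold cross-validation, cost-sensitive training, and imbalance-aware metrics) can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic features stand in for transformer embeddings, a class-weighted logistic regression stands in for the shallow classification head, and all parameter choices are assumptions.

```python
# Hedged sketch of the evaluation protocol: stratified 10-fold CV with a
# cost-sensitive shallow classifier, scored with Macro-F1 and MCC.
# Synthetic data stands in for frozen transformer embeddings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold

# Imbalanced 4-class problem mimicking the four phenotype labels
# (anxiety, depression, schizophrenia, suicidal intention).
X, y = make_classification(
    n_samples=1000, n_features=32, n_informative=16, n_classes=4,
    weights=[0.5, 0.3, 0.15, 0.05], random_state=0,
)

macro_f1, mcc = [], []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # class_weight="balanced" supplies the cost-sensitive training step:
    # losses are reweighted inversely to class frequency in each fold.
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    macro_f1.append(f1_score(y[test_idx], pred, average="macro"))
    mcc.append(matthews_corrcoef(y[test_idx], pred))

print(f"Macro-F1: {np.mean(macro_f1):.3f}  MCC: {np.mean(mcc):.3f}")
```

Macro-F1 averages per-class F1 scores with equal weight, so minority classes count as much as majority ones, while MCC summarizes the full confusion matrix in a single chance-corrected coefficient; both are preferred to raw accuracy under the class imbalance described in the abstract.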