Disentangling Symptom Heterogeneity in Large-Scale Psychiatric Text: Domain-Adapted vs. Instruction-Tuned Transformers
Varone, G.; Kumar, P.; Brown, J.; Boulila, W.
The assessment of psychiatric disorders is fundamentally challenged by symptom heterogeneity, high comorbidity, and the absence of objective biomarkers, which together produce substantial variability in clinical evaluation and treatment selection. Patient-generated language captures rich information about subjective experience and symptom severity that can be systematically encoded and analyzed with computational models, making it a scalable signal for psychiatric assessment. We compare two approaches: (i) a domain-specialized transformer fine-tuned on clinical language, based on the Bio-ClinicalBERT encoder architecture, and (ii) a large-scale instruction-tuned generalist encoder (Instructor-XL) used as a frozen feature extractor with a shallow classification head. A corpus of N = 151,228 de-identified texts was compiled from five public sources, covering four psychiatric phenotypes: anxiety, depression, schizophrenia, and suicidal intention. Models were evaluated using stratified 10-fold cross-validation with cost-sensitive training, prioritizing imbalance-aware metrics, including Macro-F1 and the Matthews Correlation Coefficient (MCC), over accuracy. Bio-ClinicalBERT achieved superior overall performance (Macro-F1 = 0.78, MCC = 0.6752), indicating more reliable separation of diagnostically overlapping affective categories. In contrast, Instructor-XL achieved its highest class-specific performance for schizophrenia (F1 = 0.798). Explainability analyses suggest that the domain-specialized model places greater weight on clinically relevant terms, whereas the generalist model relies on a broader set of lexical features.
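The evaluation protocol described above (frozen encoder features fed to a shallow classifier, stratified 10-fold cross-validation, cost-sensitive training, and imbalance-aware metrics) can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic features stand in for transformer embeddings, a class-weighted logistic regression stands in for the shallow classification head, and all parameter choices are assumptions.

```python
# Hedged sketch of the evaluation protocol: stratified 10-fold CV with a
# cost-sensitive shallow classifier, scored with Macro-F1 and MCC.
# Synthetic data stands in for frozen transformer embeddings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold

# Imbalanced 4-class problem mimicking the four phenotype labels
# (anxiety, depression, schizophrenia, suicidal intention).
X, y = make_classification(
    n_samples=1000, n_features=32, n_informative=16, n_classes=4,
    weights=[0.5, 0.3, 0.15, 0.05], random_state=0,
)

macro_f1, mcc = [], []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # class_weight="balanced" supplies the cost-sensitive training step:
    # losses are reweighted inversely to class frequency in each fold.
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    macro_f1.append(f1_score(y[test_idx], pred, average="macro"))
    mcc.append(matthews_corrcoef(y[test_idx], pred))

print(f"Macro-F1: {np.mean(macro_f1):.3f}  MCC: {np.mean(mcc):.3f}")
```

Macro-F1 averages per-class F1 scores with equal weight, so minority classes count as much as majority ones, while MCC summarizes the full confusion matrix in a single chance-corrected coefficient; both are preferred to raw accuracy under the class imbalance described in the abstract.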