
Reproducible symptom subtypes of depression identified using unsupervised machine learning

Howard, D. M.; Rabelo-da-Ponte, F. D.; Viejo-Romero, M.; Vassos, E.; Lewis, C. M.

2026-02-16 psychiatry and clinical psychology
10.64898/2026.02.13.26346271 medRxiv

Depression is a heterogeneous disorder, often diagnosed based on symptom co-occurrence. However, individuals may present with markedly different symptom profiles, potentially reflecting distinct underlying mechanisms. Identifying common patterns of symptoms using data-driven approaches could help clarify the heterogeneity of depression. Furthermore, examining the sociodemographic and lifestyle characteristics, health status, and polygenic scores of individuals with specific symptom profiles may offer insights into underlying risk factors. Unsupervised machine learning models were applied to large-scale data from the UK Biobank. Independent groups of individuals were assessed at two time points (Q1, the Mental Health Questionnaire; Q2, the Mental Well-being Questionnaire) and reported on historical or current episodes of depression. Two machine learning models, multivariate Bernoulli mixtures and agglomerative hierarchical clustering, were used to identify common sets of symptoms and to cluster individuals by symptom similarity. Consistency of results was examined between Q1 and Q2 and between clustering models. Associations of cluster membership probabilities with sociodemographic and lifestyle factors (sex, age, body mass index, smoking status, ethnicity, and deprivation), eight health conditions, and polygenic scores for bipolar disorder, schizophrenia, and attention-deficit/hyperactivity disorder (ADHD) were examined using regression models. Symptom clusters were highly consistent across Q1 and Q2 (mean correlation > 0.81) and between machine learning models (Rand Index > 0.83). Clusters aligned with the established clinical subtypes of atypical and melancholic depression, alongside other potentially novel clusters reflecting a range of symptom profiles. Atypical clusters (hypersomnia with weight gain) appeared in both Q1 and Q2 and were associated with younger age and higher body mass index. Distinct clusters combining insomnia, weight gain, and thoughts of death were associated with asthma, suggesting potential inflammatory dysregulation. Further clusters were characterised by psychomotor changes and showed strong associations with Parkinson's disease, both before and after the mental health questionnaire was conducted. These findings highlight robust and clinically meaningful symptom subtypes within depression and support the use of data-driven approaches to refine diagnosis and inform personalised treatment strategies.
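The abstract reports agreement between a multivariate Bernoulli mixture and agglomerative hierarchical clustering using the Rand Index. The sketch below illustrates that kind of comparison. It is not the authors' pipeline. It fits a Bernoulli mixture by EM to 0/1 symptom indicators, runs a naive average-linkage clustering with Hamming distance, and computes the Rand Index between the two labelings. The synthetic data, the two toy profiles, k = 2, Hamming distance, average linkage and all function names are hypothetical choices made for illustration only.

```python
import numpy as np
from itertools import combinations


def fit_bernoulli_mixture(X, k, n_iter=200, seed=0):
    """Fit a mixture of k multivariate Bernoulli distributions by EM.

    X is an (n, d) array of 0/1 symptom indicators. Returns mixing weights (k,),
    per-cluster symptom probabilities (k, d) and the posterior membership
    probabilities (n, k) from the last E-step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    eps = 1e-9  # guards against log(0)
    pi = np.full(k, 1.0 / k)
    theta = rng.uniform(0.25, 0.75, size=(k, d))
    for _ in range(n_iter):
        # E-step: symptoms are treated as independent within a cluster.
        log_lik = X @ np.log(theta + eps).T + (1 - X) @ np.log(1 - theta + eps).T
        log_post = log_lik + np.log(pi + eps)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and per-cluster endorsement rates.
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X) / (nk[:, None] + eps)
    return pi, theta, resp


def agglomerative_average(X, k):
    """Naive average-linkage agglomerative clustering on Hamming distance."""
    n = X.shape[0]
    # Pairwise Hamming distance: share of symptoms on which two people differ.
    D = (X[:, None, :] != X[None, :, :]).mean(axis=2)
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, pair = np.inf, (0, 1)
        for a, b in combinations(range(len(clusters)), 2):
            dist = D[np.ix_(clusters[a], clusters[b])].mean()
            if dist < best:
                best, pair = dist, (a, b)
        a, b = pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels


def rand_index(a, b):
    """Share of pairs placed together in both labelings or apart in both."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy endorsement probabilities: columns 0-1 mostly endorsed in profile A,
    # columns 2-3 in profile B; columns 4-5 are uninformative noise.
    profiles = np.array([
        [0.9, 0.9, 0.1, 0.1, 0.5, 0.5],
        [0.1, 0.1, 0.9, 0.9, 0.5, 0.5],
    ])
    true_labels = np.repeat([0, 1], 30)
    X = (rng.random((60, profiles.shape[1])) < profiles[true_labels]).astype(int)

    _, _, resp = fit_bernoulli_mixture(X, k=2)
    bmm_labels = resp.argmax(axis=1)
    hc_labels = agglomerative_average(X, k=2)
    print("Rand Index, mixture vs hierarchical:",
          round(rand_index(bmm_labels, hc_labels), 3))
```

Because the Rand Index counts agreement over pairs of individuals, it does not depend on how cluster labels are numbered. The clustering loop above is cubic in the number of individuals and is only meant for small toy inputs; for real data, `scipy.cluster.hierarchy.linkage` followed by `fcluster` is the usual route.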

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Translational Psychiatry (219 papers in training set; Top 0.1%; 27.5%)
2. Journal of Affective Disorders (81 papers in training set; Top 0.2%; 12.3%)
3. BJPsych Open (25 papers in training set; Top 0.1%; 4.8%)
4. Psychological Medicine (74 papers in training set; Top 0.4%; 4.3%)
5. Molecular Psychiatry (242 papers in training set; Top 0.7%; 4.3%)
(50% of probability mass above)
6. Biological Psychiatry (119 papers in training set; Top 1.0%; 3.6%)
7. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics (22 papers in training set; Top 0.1%; 3.2%)
8. Acta Neuropsychiatrica (12 papers in training set; Top 0.2%; 3.2%)
9. Biological Psychiatry Global Open Science (54 papers in training set; Top 0.3%; 3.2%)
10. Frontiers in Psychiatry (83 papers in training set; Top 1%; 2.6%)
11. Psychiatry Research (35 papers in training set; Top 0.6%; 2.6%)
12. European Psychiatry (10 papers in training set; Top 0.2%; 2.3%)
13. BMC Medicine (163 papers in training set; Top 3%; 1.9%)
14. Acta Psychiatrica Scandinavica (10 papers in training set; Top 0.1%; 1.9%)
15. The British Journal of Psychiatry (21 papers in training set; Top 0.5%; 1.7%)
16. PLOS ONE (4510 papers in training set; Top 62%; 1.1%)
17. BMJ Mental Health (15 papers in training set; Top 0.3%; 0.9%)
18. European Neuropsychopharmacology (15 papers in training set; Top 0.5%; 0.9%)
19. Scientific Reports (3102 papers in training set; Top 71%; 0.9%)
20. American Journal of Psychiatry (20 papers in training set; Top 0.4%; 0.9%)
21. Journal of Psychiatric Research (28 papers in training set; Top 0.6%; 0.9%)
22. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging (62 papers in training set; Top 1%; 0.8%)
23. Nature Communications (4913 papers in training set; Top 63%; 0.7%)
24. Neuropsychopharmacology (134 papers in training set; Top 3%; 0.7%)
25. Journal of Neurology, Neurosurgery & Psychiatry (29 papers in training set; Top 1%; 0.7%)
26. Disease Models & Mechanisms (119 papers in training set; Top 3%; 0.6%)
27. npj Digital Medicine (97 papers in training set; Top 4%; 0.6%)