
Deep learning-based stratification of Schizophrenia Spectrum Disorder from real-world data reveals distinct profiles of common and rare variant genetic signal

Cobuccio, L.; Pielies Avelli, M.; Webel, H.; Hernandez Medina, R.; Vaez, M.; Georgii Hellberg, K.-L.; Hsu, Y.-H. H.; Pintacuda, G.; iPSYCH Study Consortium; Rosengren, A.; Werge, T.; Lage, K.; Rasmussen, S.

2026-04-04 psychiatry and clinical psychology
10.64898/2026.03.30.26349393 medRxiv

Schizophrenia spectrum disorder (SSD) is a clinically and genetically heterogeneous condition, yet few studies have integrated real-world clinical data with both common and rare genetic variation to explore this complexity. In this study, we analyzed real-world data from 22,092 individuals in the Danish iPSYCH cohort (11,046 SSD cases and 11,046 matched population controls), leveraging nationwide registry data on diagnoses, hospitalizations, and parental history. Using a variational autoencoder (VAE), we compressed these features into a latent space and identified ten clinically distinct SSD subgroups that varied in comorbidity, parental diagnoses, hospital burden, and early-life adversity. Polygenic scores (PGSs) for five psychiatric disorders showed subgroup-specific enrichment, highlighting potential links between complex clinical profiles and common variant liability. In a subset with exome data (N=5,969), we assessed rare deleterious variant burden across schizophrenia (SCZ)-informed gene sets and protein-protein interaction (PPI) networks, observing suggestive network-specific trends. This framework for integrating real-world-based stratification with genetic evidence is scalable and transferable across cohorts, offering a path toward biologically informed patient classification.
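To make the VAE step concrete, the sketch below computes the standard VAE objective (the evidence lower bound, ELBO) for binary registry features. It is a minimal illustration under stated assumptions, not the authors' model: the abstract does not describe the architecture, so a single linear encoder and decoder, a diagonal Gaussian posterior, a Bernoulli decoder for 0/1 indicators, and the function name `vae_elbo` are all hypothetical. The returned latent means play the role of the per-individual coordinates in the latent space; the abstract does not state which method was then used to derive the ten subgroups from them, so that step is not shown.

```python
import numpy as np


def vae_elbo(x, enc_w, enc_b, dec_w, dec_b, rng):
    """Single-sample ELBO of a linear Gaussian-encoder / Bernoulli-decoder VAE.

    Hypothetical illustration only, not the study's architecture.
    x:      (n, d) array of 0/1 registry indicators (assumed encoding)
    enc_w:  (d, 2k) weights producing latent means and log-variances
    dec_w:  (k, d) weights mapping latent codes to per-feature logits
    Returns the per-individual ELBO and the latent means (embeddings).
    """
    h = x @ enc_w + enc_b
    k = h.shape[1] // 2
    mu, logvar = h[:, :k], h[:, k:]

    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps

    # Bernoulli log-likelihood from logits: x*l - log(1 + e^l), computed stably
    logits = z @ dec_w + dec_b
    log_lik = np.sum(x * logits - np.logaddexp(0.0, logits), axis=1)

    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)

    return log_lik - kl, mu


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 8, 20, 3  # toy sizes: individuals, registry features, latent dims
    x = rng.integers(0, 2, size=(n, d)).astype(float)
    enc_w = rng.normal(scale=0.1, size=(d, 2 * k))
    dec_w = rng.normal(scale=0.1, size=(k, d))
    elbo, mu = vae_elbo(x, enc_w, np.zeros(2 * k), dec_w, np.zeros(d), rng)
    print(elbo.shape, mu.shape)  # (8,) (8, 3)
```

Training would maximize the mean ELBO with respect to the encoder and decoder weights; the weights here are random and untrained, so the example only shows how reconstruction and the KL regularizer combine.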

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Communications (4,913 papers in training set; Top 4%; predicted probability 21.9%)
2. Biological Psychiatry (119 papers in training set; Top 0.4%; predicted probability 8.2%)
3. Schizophrenia Bulletin (29 papers in training set; Top 0.2%; predicted probability 6.6%)
4. Nature Neuroscience (216 papers in training set; Top 1%; predicted probability 6.6%)
5. Nature Genetics (240 papers in training set; Top 1%; predicted probability 6.6%)
50% of probability mass above
6. Nature Medicine (117 papers in training set; Top 0.4%; predicted probability 6.2%)
7. Genome Medicine (154 papers in training set; Top 2%; predicted probability 4.1%)
8. JAMA Psychiatry (13 papers in training set; Top 0.1%; predicted probability 3.5%)
9. Molecular Psychiatry (242 papers in training set; Top 1.0%; predicted probability 3.5%)
10. Nature (575 papers in training set; Top 8%; predicted probability 2.7%)
11. Translational Psychiatry (219 papers in training set; Top 2%; predicted probability 1.8%)
12. Schizophrenia (19 papers in training set; Top 0.2%; predicted probability 1.8%)
13. Scientific Reports (3,102 papers in training set; Top 61%; predicted probability 1.6%)
14. npj Digital Medicine (97 papers in training set; Top 2%; predicted probability 1.6%)
15. NeuroImage: Clinical (132 papers in training set; Top 3%; predicted probability 1.3%)
16. Science Advances (1,098 papers in training set; Top 24%; predicted probability 1.2%)
17. Nature Mental Health (18 papers in training set; Top 0.2%; predicted probability 1.1%)
18. eLife (5,422 papers in training set; Top 52%; predicted probability 0.9%)
19. Human Brain Mapping (295 papers in training set; Top 4%; predicted probability 0.9%)
20. PLOS Computational Biology (1,633 papers in training set; Top 22%; predicted probability 0.9%)
21. Proceedings of the National Academy of Sciences (2,130 papers in training set; Top 43%; predicted probability 0.8%)
22. Neuropsychopharmacology (134 papers in training set; Top 3%; predicted probability 0.7%)
23. The American Journal of Human Genetics (206 papers in training set; Top 4%; predicted probability 0.7%)
24. Neuron (282 papers in training set; Top 9%; predicted probability 0.7%)
25. Bioinformatics (1,061 papers in training set; Top 10%; predicted probability 0.7%)
26. Nature Human Behaviour (85 papers in training set; Top 5%; predicted probability 0.7%)
27. iScience (1,063 papers in training set; Top 38%; predicted probability 0.6%)
28. Nature Biomedical Engineering (42 papers in training set; Top 3%; predicted probability 0.6%)
29. Communications Medicine (85 papers in training set; Top 2%; predicted probability 0.6%)
30. G3: Genes, Genomes, Genetics (222 papers in training set; Top 1%; predicted probability 0.6%)
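The statement that the top five journals account for 50% of the predicted probability mass can be checked against the listed values with a running total. The sketch below copies the thirty listed percentages and accumulates them; it only restates the figures on this page and makes no assumption about how the matching model produced them.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-30, copied from the list above
probs = [21.9, 8.2, 6.6, 6.6, 6.6, 6.2, 4.1, 3.5, 3.5, 2.7,
         1.8, 1.8, 1.6, 1.6, 1.3, 1.2, 1.1, 0.9, 0.9, 0.9,
         0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6]

# Running total, rounded to the 0.1% precision of the listed values
cumulative = [round(c, 1) for c in accumulate(probs)]

print(cumulative[4])   # top 5 journals: 49.9
print(cumulative[-1])  # all 30 listed journals: 89.7
```

The top five sum to 49.9%, which matches the stated 50% once rounding of the individual values is taken into account. The thirty listed journals together cover about 89.7%, so the remaining mass presumably falls on journals ranked below 30.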