Deep learning-based stratification of Schizophrenia Spectrum Disorder from real-world data reveals distinct profiles of common and rare variant genetic signal
Cobuccio, L.; Pielies Avelli, M.; Webel, H.; Hernandez Medina, R.; Vaez, M.; Georgii Hellberg, K.-L.; Hsu, Y.-H. H.; Pintacuda, G.; iPSYCH Study Consortium; Rosengren, A.; Werge, T.; Lage, K.; Rasmussen, S.
Schizophrenia spectrum disorder (SSD) is a clinically and genetically heterogeneous condition, yet few studies have integrated real-world clinical data with both common and rare genetic variation to explore this complexity. In this study, we analyzed real-world data from 22,092 individuals in the Danish iPSYCH cohort (11,046 SSD cases and 11,046 matched population controls), leveraging nationwide registry data on diagnoses, hospitalizations, and parental history. Using a variational autoencoder (VAE), we compressed these features into a latent space and identified ten clinically distinct SSD subgroups that varied in comorbidity, parental diagnoses, hospital burden, and early-life adversity. Polygenic scores (PGSs) for five psychiatric disorders showed subgroup-specific enrichment, highlighting potential links between complex clinical profiles and common variant liability. In a subset with exome data (N=5,969), we assessed rare deleterious variant burden across schizophrenia (SCZ)-informed gene sets and protein-protein interaction (PPI) networks, observing suggestive network-specific trends. This framework for integrating stratification based on real-world data with genetic evidence is scalable and transferable across cohorts, offering a path toward biologically informed patient classification.
Matching journals
The top 5 journals account for 50% of the predicted probability mass.