Back

Incorporating phenotype heterogeneity in disease GWAS improves power while maintaining specificity

Hof, J. J. P.; Ning, C.; Quinn, L.; Speed, D.

2026-03-27 genetic and genomic medicine
10.64898/2026.03.26.26349370 medRxiv
Show abstract

Common complex diseases are clinically heterogeneous, yet most genome-wide association studies (GWAS) assume cases are genetically homogeneous. This challenge is compounded in large-scale biobanks, which increasingly combine cases ascertained under different recruitment strategies, raising concerns that heterogeneous case definitions may dilute genetic signal. To address this, we developed StratGWAS, a scalable framework that leverages clinical features of heterogeneity to construct a transformed phenotype that better reflects genetic liability within diseases. StratGWAS stratifies cases using secondary phenotypic information such as age of onset, medication burden, or recruitment definition. StratGWAS then estimates genetic covariance between strata, and derives a transformed phenotype that upweights cases with higher inferred genetic liability. Through simulation studies (N = 100k) and application to the UK Biobank (N = 368k), we show that StratGWAS consistently outperformed standard GWAS methods. Applied to 21 UK Biobank traits, StratGWAS upweighted individuals with earlier disease onset and higher medication burden, yielding respectively 17% and 4% more independent genome-wide significant loci than standard case control GWAS. Applied to depression, StratGWAS upweighted individuals with multiple diagnoses, greater psychiatric comorbidity, or higher self reported depressive symptoms, identifying eight additional independent loci compared to case-control GWAS.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.