A biobank-scale method for learning modulators of gene-environment interaction underlying human complex traits from multiple environmental exposures
Liu, Z.; Ramteke, A.; Anand, A.; Gorla, A.; Jeong, M.; Sankararaman, S.
It is increasingly recognized that genetic effects on complex traits and diseases are shaped by environmental context. Biobanks that measure diverse environmental exposures alongside genotypes and phenotypes at scale enable systematic study of gene-environment (GxE) interactions. Existing approaches, however, are limited in their ability to accurately model polygenic GxE involving many exposures across genome-wide genetic variants. In particular, it remains unclear how to identify which combinations of exposures are relevant for a given trait while distinguishing true interactions from environment-dependent heteroskedastic noise. To address these challenges, we develop Efficient multi-eNvironmental Gene-environment Interaction iNference Estimator (ENGINE), a supervised variance-component framework that learns an embedding combining multiple environmental exposures while jointly estimating additive, GxE, and heteroskedastic noise components. To enable biobank-scale inference, ENGINE makes a single pass over the genotype matrix to cache genotype-dependent summaries, then assembles normal-equation components and gradients from those summaries at each iteration. In simulations, ENGINE controls type I error rates, achieves high power, and accurately recovers the environmental embedding while remaining efficient at biobank scale. Applied to five complex traits paired with lifestyle exposures in N = 291,273 unrelated white British individuals and M = 454,207 common SNPs (MAF > 0.01) from the UK Biobank, ENGINE recovered GxE variance that was on average 1.4-fold larger than that captured by a single exposure and 5.5-fold larger than that captured by the first principal component of the exposures.
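To make the three variance components named in the abstract concrete, the following is a minimal simulation sketch, not the ENGINE implementation. It assumes a linear embedding z = Ew of the exposures, independent standardized genotypes, and a noise variance that grows with z squared; the abstract specifies none of these. The function name `simulate_trait` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_trait(n=2000, m=200, k=4, h2_add=0.3, h2_gxe=0.1,
                   var_nxe=0.1, seed=0):
    """Simulate a trait with additive, GxE, and heteroskedastic noise parts.

    Assumed (hypothetical) model, for illustration only:
        y = G @ beta + (G * z[:, None]) @ gamma + eps
        z = E @ w                      (linear embedding of k exposures)
        eps_i ~ N(0, var_resid + var_nxe * z_i**2)
    """
    var_resid = 1.0 - h2_add - h2_gxe - var_nxe
    if var_resid <= 0:
        raise ValueError("variance components must sum to less than 1")

    rng = np.random.default_rng(seed)

    # Independent SNPs with random allele frequencies, then standardized.
    maf = rng.uniform(0.05, 0.5, size=m)
    G = rng.binomial(2, maf, size=(n, m)).astype(float)
    G = (G - G.mean(axis=0)) / G.std(axis=0)

    # k standardized exposures combined by a unit-norm weight vector w.
    E = rng.standard_normal((n, k))
    w = rng.standard_normal(k)
    w /= np.linalg.norm(w)
    z = E @ w  # the embedding an estimator would need to recover

    # Per-SNP effects scaled so each component has the requested variance
    # in expectation.
    beta = rng.normal(0.0, np.sqrt(h2_add / m), size=m)
    gamma = rng.normal(0.0, np.sqrt(h2_gxe / m), size=m)

    additive = G @ beta
    gxe = (G * z[:, None]) @ gamma

    # Noise whose variance depends on the environment (heteroskedastic).
    noise = rng.standard_normal(n) * np.sqrt(var_resid + var_nxe * z**2)

    y = additive + gxe + noise
    return {"y": y, "G": G, "E": E, "w": w, "z": z,
            "additive": additive, "gxe": gxe, "noise": noise}


if __name__ == "__main__":
    sim = simulate_trait()
    for name in ("additive", "gxe", "noise", "y"):
        print(f"var({name}) = {sim[name].var():.3f}")
```

In this sketch both the GxE term and the noise term scale with the same embedding z. That is why a model ignoring environment-dependent noise could mistake it for interaction signal, the confounding that the abstract says ENGINE is designed to separate.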