
A biobank-scale method for learning modulators of gene-environment interaction underlying human complex traits from multiple environmental exposures

Liu, Z.; Ramteke, A.; Anand, A.; Gorla, A.; Jeong, M.; Sankararaman, S.

2026-03-16 genetics
10.64898/2026.03.13.711725 bioRxiv

It is increasingly recognized that genetic effects on complex traits and diseases are shaped by environmental context. Biobanks that measure diverse environmental exposures alongside genotypes and phenotypes at scale enable systematic study of gene-environment (GxE) interactions. Existing approaches, however, are limited in their ability to accurately model polygenic GxE involving many exposures across genome-wide genetic variants. It also remains unclear how to identify which exposure combinations are relevant for a given trait while distinguishing true interactions from environment-dependent heteroskedastic noise. To address these challenges, we develop the Efficient multi-eNvironmental Gene-environment Interaction iNference Estimator (ENGINE), a supervised variance-component framework that learns an embedding that combines multiple environmental exposures while jointly estimating additive, GxE, and heteroskedastic noise components. To enable biobank-scale inference, ENGINE makes a single pass over the genotype matrix to cache genotype-dependent summaries, then assembles normal-equation components and gradients at each iteration. In simulations, ENGINE controls type I error rates, achieves high power, and accurately recovers the environmental embedding while remaining efficient at biobank scale. Applied to five complex traits paired with lifestyle exposures in N = 291,273 unrelated white British individuals and M = 454,207 common SNPs (MAF > 0.01) from the UK Biobank, ENGINE recovered GxE variance that was on average 1.4-fold larger than that captured by a single exposure and 5.5-fold larger than that captured by the first principal component of the exposures.
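To make the three variance components named in the abstract concrete, the sketch below simulates a phenotype with an additive term, a GxE term acting through a one-dimensional embedding of the exposures, and an environment-dependent noise term. It is an illustration under assumed conventions, not ENGINE itself: the generative form y_i = x_i·beta + z_i (x_i·gamma) + z_i delta_i + eps_i with z_i = e_i·w, the Gaussian stand-ins for standardized genotypes and exposures, and the names `simulate_phenotype` and `variance_split` are all hypothetical and do not come from the abstract.

```python
import random
import statistics


def simulate_phenotype(n, m, w, var_add=0.3, var_gxe=0.2, var_nxe=0.2,
                       var_noise=0.3, seed=0):
    """Simulate n phenotypes from m variants under an assumed toy model.

    w is a fixed weight vector that embeds the exposures into one score z.
    This is an illustrative model, not the one fitted by ENGINE.
    """
    rng = random.Random(seed)
    k = len(w)
    # Per-variant effects, scaled so each component has roughly the
    # requested total variance when genotypes are standardized.
    beta = [rng.gauss(0.0, (var_add / m) ** 0.5) for _ in range(m)]
    gamma = [rng.gauss(0.0, (var_gxe / m) ** 0.5) for _ in range(m)]
    y, z = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]  # stand-in genotypes
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]  # stand-in exposures
        z_i = sum(wj * ej for wj, ej in zip(w, e))   # environmental embedding
        additive = sum(xj * bj for xj, bj in zip(x, beta))
        gxe = z_i * sum(xj * gj for xj, gj in zip(x, gamma))
        nxe = z_i * rng.gauss(0.0, var_nxe ** 0.5)   # heteroskedastic noise
        noise = rng.gauss(0.0, var_noise ** 0.5)     # homoskedastic noise
        y.append(additive + gxe + nxe + noise)
        z.append(z_i)
    return y, z


def variance_split(y, z, low=0.5, high=1.0):
    """Return phenotype variance for |z| < low and for |z| > high."""
    near = [yi for yi, zi in zip(y, z) if abs(zi) < low]
    far = [yi for yi, zi in zip(y, z) if abs(zi) > high]
    return statistics.pvariance(near), statistics.pvariance(far)


if __name__ == "__main__":
    y, z = simulate_phenotype(n=2000, m=20, w=[0.6, 0.8], seed=1)
    near_var, far_var = variance_split(y, z)
    print(f"variance near z=0: {near_var:.3f}, at large |z|: {far_var:.3f}")
```

In this toy model both the GxE term and the noise-by-environment term make phenotypic variance grow with |z|, so a variance trend alone cannot tell them apart. That is one way to see why the abstract stresses estimating the two components jointly.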

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. The American Journal of Human Genetics | 206 papers in training set | Top 0.1% | 26.4%
2. Nature Genetics | 240 papers in training set | Top 0.3% | 15.0%
3. Nature Communications | 4913 papers in training set | Top 21% | 9.3%
50% of probability mass above
4. Nature | 575 papers in training set | Top 7% | 3.7%
5. Science | 429 papers in training set | Top 10% | 2.9%
6. Cell Genomics | 162 papers in training set | Top 2% | 2.8%
7. Bioinformatics | 1061 papers in training set | Top 6% | 2.8%
8. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 24% | 2.8%
9. Genome Biology | 555 papers in training set | Top 3% | 2.7%
10. Genome Medicine | 154 papers in training set | Top 3% | 2.4%
11. Nature Human Behaviour | 85 papers in training set | Top 1% | 2.4%
12. Nature Biotechnology | 147 papers in training set | Top 4% | 1.8%
13. International Journal of Epidemiology | 74 papers in training set | Top 1% | 1.7%
14. Science Translational Medicine | 111 papers in training set | Top 3% | 1.5%
15. Genome Research | 409 papers in training set | Top 3% | 1.4%
16. Nature Neuroscience | 216 papers in training set | Top 5% | 1.4%
17. PLOS Genetics | 756 papers in training set | Top 11% | 1.2%
18. Science Advances | 1098 papers in training set | Top 23% | 1.2%
19. Nature Computational Science | 50 papers in training set | Top 1% | 0.9%
20. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.9%
21. Communications Biology | 886 papers in training set | Top 18% | 0.9%
22. Nature Medicine | 117 papers in training set | Top 4% | 0.8%
23. PLOS Computational Biology | 1633 papers in training set | Top 24% | 0.8%
24. Frontiers in Genetics | 197 papers in training set | Top 9% | 0.8%
25. Nucleic Acids Research | 1128 papers in training set | Top 19% | 0.7%
26. Nature Methods | 336 papers in training set | Top 7% | 0.5%
27. Nature Aging | 51 papers in training set | Top 2% | 0.5%
28. Cell Reports | 1338 papers in training set | Top 36% | 0.5%
29. PLOS ONE | 4510 papers in training set | Top 73% | 0.5%