
Bias in genome-wide association test statistics due to omitted interactions

Yelmen, B.; Güler, M. N.; Estonian Biobank Research Team; Kollo, T.; Möls, M.; Charpiat, G.; Jay, F.

2026-02-22 bioinformatics
10.1101/2025.11.21.689603 bioRxiv

Over the past two decades, genome-wide association studies (GWAS) have enabled the discovery of thousands of variants associated with many complex human traits. However, conventional GWAS are still widely performed with linear models under the assumption that genetic effects are predominantly additive. In this work, we investigate how the test statistic behaves when linear models are used to detect significant genotype-phenotype associations without accounting for epistasis. We first algebraically derive the mean and variance shift in the null statistic due to the omitted interaction term, and define the boundary between the conservative (i.e., deflated statistic tail) and anti-conservative (i.e., inflated statistic tail) regimes for the common GWAS significance threshold. We then perform phenotype simulation analyses using the Estonian Biobank genotypes and validate the mathematical model. We demonstrate that the anti-conservative regime is plausible under realistic parameter settings and that models omitting interaction terms can produce spurious significance. Our findings suggest caution when interpreting statistically significant signals reported in the literature based on linear models, especially for large-scale GWAS.
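The kind of miscalibration described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' derivation or their Estonian Biobank simulation design; it is a minimal example under assumptions introduced here: a single tested variant `g` with no marginal effect, a hypothetical standardized interacting factor `z` that is independent of `g`, and an interaction strength `gamma`. The phenotype is fitted with an additive linear model that omits the `g * z` term, and the mean of the squared t-statistic is compared with its nominal null value of about 1.

```python
import numpy as np


def mean_chi2(gamma, maf=0.1, n=1000, reps=1000, seed=0):
    """Mean squared OLS t-statistic for a variant with no marginal effect.

    Toy model (an assumption of this sketch, not taken from the paper):
        y = gamma * g * z + noise
    where z is a standardized factor independent of g, so E[y | g] = 0.
    The fitted model is y ~ intercept + g, i.e. the interaction is omitted.
    """
    rng = np.random.default_rng(seed)

    # Genotypes coded 0/1/2 under Hardy-Weinberg proportions.
    g = rng.binomial(2, maf, size=(reps, n)).astype(float)
    z = rng.standard_normal((reps, n))
    y = gamma * g * z + rng.standard_normal((reps, n))

    # Simple linear regression with intercept, one fit per replicate (row).
    gc = g - g.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = (gc ** 2).sum(axis=1)
    beta = (gc * yc).sum(axis=1) / sxx

    # Standard (homoscedastic) residual variance and squared t-statistic.
    resid = yc - beta[:, None] * gc
    s2 = (resid ** 2).sum(axis=1) / (n - 2)
    t2 = beta ** 2 / (s2 / sxx)
    return t2.mean()


if __name__ == "__main__":
    print("no interaction:     ", mean_chi2(gamma=0.0))
    print("omitted interaction:", mean_chi2(gamma=1.0))
```

In this toy setting the omitted term makes the residual variance depend on the genotype, so the usual standard error is too small and the statistic is inflated, which corresponds to the anti-conservative regime. Whether a real analysis falls in the conservative or anti-conservative regime depends on the parameters, which is the boundary the paper characterizes.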

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set, Top 3%, 8.5%
2. Genetic Epidemiology: 46 papers in training set, Top 0.1%, 6.9%
3. PLOS Computational Biology: 1633 papers in training set, Top 5%, 6.9%
4. Scientific Reports: 3102 papers in training set, Top 14%, 6.9%
5. The American Journal of Human Genetics: 206 papers in training set, Top 0.7%, 6.4%
6. Genetics: 225 papers in training set, Top 0.8%, 4.9%
7. Biophysical Journal: 545 papers in training set, Top 1%, 4.4%
8. Frontiers in Genetics: 197 papers in training set, Top 1%, 4.4%
9. PLOS ONE: 4510 papers in training set, Top 34%, 4.4%

50% of probability mass above

10. PLOS Genetics: 756 papers in training set, Top 4%, 4.2%
11. BMC Bioinformatics: 383 papers in training set, Top 2%, 4.0%
12. Physical Review E: 95 papers in training set, Top 0.3%, 3.6%
13. Communications Biology: 886 papers in training set, Top 6%, 1.9%
14. Frontiers in Neuroscience: 223 papers in training set, Top 4%, 1.7%
15. Statistics in Medicine: 34 papers in training set, Top 0.2%, 1.7%
16. BioData Mining: 15 papers in training set, Top 0.3%, 1.7%
17. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 5%, 1.5%
18. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.3%
19. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 0.2%, 1.3%
20. Human Genetics and Genomics Advances: 70 papers in training set, Top 0.6%, 0.9%
21. Frontiers in Physics: 20 papers in training set, Top 0.7%, 0.9%
22. The Annals of Applied Statistics: 15 papers in training set, Top 0.1%, 0.9%
23. Biometrics: 22 papers in training set, Top 0.2%, 0.8%
24. Biostatistics: 21 papers in training set, Top 0.1%, 0.8%
25. PeerJ: 261 papers in training set, Top 15%, 0.8%
26. Nature Communications: 4913 papers in training set, Top 64%, 0.7%
27. Journal of Personalized Medicine: 28 papers in training set, Top 1%, 0.7%
28. iScience: 1063 papers in training set, Top 37%, 0.7%
29. European Journal of Human Genetics: 49 papers in training set, Top 2%, 0.7%
30. Human Brain Mapping: 295 papers in training set, Top 5%, 0.7%
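
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below copies the first ten rounded values from the list; the function name `journals_to_reach` is introduced here for illustration, and because the displayed percentages are rounded, the running total is only approximate.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1 to 10, as listed above.
top10_percent = [8.5, 6.9, 6.9, 6.9, 6.4, 4.9, 4.4, 4.4, 4.4, 4.2]


def journals_to_reach(percentages, threshold=50.0):
    """Return the smallest rank whose cumulative percentage meets the threshold."""
    for rank, total in enumerate(accumulate(percentages), start=1):
        if total >= threshold:
            return rank
    return None  # threshold not reached within the given list


if __name__ == "__main__":
    print(journals_to_reach(top10_percent))
```

The cumulative total stays below 50% through rank 8 and crosses it at rank 9, consistent with the cutoff marked in the list.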