
A simple approach for multiple observations improves power to detect genetic effects and genomic prediction accuracy.

Evans, L. M.; Arehart, C. H.; Gibson, R. A.; Bowman, G. I.; Gignoux, C.

2025-09-21 · genetic and genomic medicine
10.1101/2025.09.19.25336197 · medRxiv

Many datasets, including widely used biobanks, have more than one observation of numerous phenotypes for at least a portion of their sample. Most GWAS use only a single observation per individual, even when more are available, and apply a standard model in which the additive allelic effect being estimated is assumed to be constant across the age or time range in the sample. Here, we test a set of simple approaches that use multiple observations per individual under this same assumption. We find that using the mean or median of the available observations, rather than a single observation, improves power to detect associated loci and enriched gene sets and yields higher out-of-sample polygenic score prediction accuracy. Despite growing biobanks, many deeply phenotyped samples are relatively small but have multiple observations. Explicitly modeling age- or time-dependent genetic effects can yield age- or time-specific estimates, but most GWAS apply a standard, additive-only model. Under that model, the simple approach of using the mean or median can improve power by reducing "noise" in the phenotype, works with standard, optimized software, and may be particularly impactful for smaller samples, including samples of diverse genetic ancestry in widely used biobanks.
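
The noise-reduction argument in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline: the function names (`collapse_observations`, `pearson`, `simulate`), the sample size, the number of repeats, and the unit variances are all assumptions chosen for illustration, and each observation is assumed to be a stable per-individual component plus independent noise.

```python
import math
import random
import statistics


def collapse_observations(records, how="mean"):
    """Collapse (individual_id, value) pairs to one phenotype per individual."""
    if how not in ("mean", "median"):
        raise ValueError("how must be 'mean' or 'median'")
    by_id = {}
    for ind_id, value in records:
        by_id.setdefault(ind_id, []).append(value)
    reducer = statistics.mean if how == "mean" else statistics.median
    return {ind_id: reducer(values) for ind_id, values in by_id.items()}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=2000, k=4, seed=1):
    """Compare a single observation with the mean of k observations.

    Hypothetical setup: each individual has a stable component (variance 1,
    standing in for the genetic signal) and every observation adds
    independent noise (variance 1).
    """
    rng = random.Random(seed)
    stable = [rng.gauss(0.0, 1.0) for _ in range(n)]
    records = [(i, stable[i] + rng.gauss(0.0, 1.0))
               for i in range(n) for _ in range(k)]

    # "Single observation" phenotype: the first record seen per individual.
    first_obs = {}
    for ind_id, value in records:
        first_obs.setdefault(ind_id, value)

    mean_obs = collapse_observations(records, "mean")

    r_single = pearson(stable, [first_obs[i] for i in range(n)])
    r_mean = pearson(stable, [mean_obs[i] for i in range(n)])
    return r_single, r_mean


if __name__ == "__main__":
    r_single, r_mean = simulate()
    print(f"single observation: r = {r_single:.3f}")
    print(f"mean of 4 observations: r = {r_mean:.3f}")
```

Under these assumed variances, the expected correlation between the stable component and the mean of k observations is 1/sqrt(1 + 1/k), so it rises as k grows. The collapsed phenotype is still one value per individual, which is why it can be passed to standard GWAS software unchanged.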

Published in Human Genetics and Genomics Advances (predicted rank #3) · training set

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1 · The American Journal of Human Genetics · 206 papers in training set · Top 0.1% · 40.3%
2 · PLOS Genetics · 756 papers in training set · Top 1% · 10.3%
50% of probability mass above
3 · Human Genetics and Genomics Advances (published here) · 70 papers in training set · Top 0.1% · 8.6%
4 · Nature Genetics · 240 papers in training set · Top 2% · 4.4%
5 · Human Molecular Genetics · 130 papers in training set · Top 0.5% · 4.4%
6 · Genetic Epidemiology · 46 papers in training set · Top 0.2% · 4.1%
7 · Cell Genomics · 162 papers in training set · Top 1% · 3.7%
8 · Bioinformatics · 1061 papers in training set · Top 6% · 2.1%
9 · Genome Medicine · 154 papers in training set · Top 4% · 1.7%
10 · Nature Human Behaviour · 85 papers in training set · Top 2% · 1.7%
11 · Nature Communications · 4913 papers in training set · Top 51% · 1.7%
12 · Scientific Reports · 3102 papers in training set · Top 63% · 1.4%
13 · Genome Biology · 555 papers in training set · Top 5% · 1.3%
14 · Cell Systems · 167 papers in training set · Top 9% · 1.2%
15 · BMC Genomics · 328 papers in training set · Top 5% · 0.8%
16 · PLOS Computational Biology · 1633 papers in training set · Top 24% · 0.8%
17 · Frontiers in Genetics · 197 papers in training set · Top 9% · 0.8%
18 · European Journal of Human Genetics · 49 papers in training set · Top 1% · 0.7%
19 · GENETICS · 189 papers in training set · Top 2% · 0.7%
20 · Proceedings of the National Academy of Sciences · 2130 papers in training set · Top 47% · 0.7%
21 · eLife · 5422 papers in training set · Top 61% · 0.7%
22 · Science · 429 papers in training set · Top 21% · 0.7%
23 · PLOS ONE · 4510 papers in training set · Top 73% · 0.5%
24 · BMC Medical Genomics · 36 papers in training set · Top 2% · 0.5%