Optimizing phenotype scale improves genetic analyses in large-scale biobanks

Huang, Z.; Costantino, M.; Dahl, A.

bioRxiv, genetics, 2026-05-07
DOI: 10.64898/2026.05.04.722531

Large-scale biobanks have enabled increasingly complicated genetic analyses across thousands of phenotypes. However, studies rarely consider the appropriate phenotype measurement scale, a problem that can drastically affect inferences on genetic architecture. Here, we introduce SIQReg, a practical solution to this classical problem, which learns a data-driven phenotype scale by minimizing heterogeneity across phenotype quantiles. Applied to complex traits in UK Biobank, SIQReg rejects the default scale for 24/25 traits. Generally, SIQReg scales lie between default and logarithmic, indicating that default-scale traits are neither purely additive nor purely multiplicative. We show that SIQReg improves both non-additive and additive genetic analyses. SIQReg eliminates most non-additive genetic signals (such as 97% of vQTL and 76% of quantile-dependent TWAS genes), indicating they may be statistical artifacts, while preserving biologically plausible non-additive signals. Simultaneously, SIQReg improves power to detect additive signals, increasing GWAS loci, TWAS genes, and PGS prediction accuracy by 11%, 13%, and 10%, respectively, and identifies 50% more high-risk individuals. These gains replicate across ancestry groups. Our results establish SIQReg as a principled approach to phenotype scale transformation that improves genetic analyses of complex traits.

Matching journals

The top 2 journals together account for more than 50% of the predicted probability mass (28.7% each).

1. Nature Genetics | 240 papers in training set | Top 0.1% | 28.7%
2. The American Journal of Human Genetics | 206 papers in training set | Top 0.1% | 28.7%
(50% of probability mass above)
3. Nature Communications | 4913 papers in training set | Top 35% | 4.5%
4. Genome Biology | 555 papers in training set | Top 2% | 3.7%
5. Cell Genomics | 162 papers in training set | Top 1% | 3.7%
6. Bioinformatics | 1061 papers in training set | Top 6% | 3.0%
7. Nature Human Behaviour | 85 papers in training set | Top 2% | 2.0%
8. Science | 429 papers in training set | Top 13% | 2.0%
9. Nature | 575 papers in training set | Top 11% | 1.8%
10. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 31% | 1.8%
11. PLOS Genetics | 756 papers in training set | Top 9% | 1.5%
12. Genome Medicine | 154 papers in training set | Top 6% | 1.3%
13. Genome Research | 409 papers in training set | Top 3% | 1.3%
14. Nucleic Acids Research | 1128 papers in training set | Top 15% | 0.9%
15. Human Genetics and Genomics Advances | 70 papers in training set | Top 0.6% | 0.9%
16. PLOS Computational Biology | 1633 papers in training set | Top 23% | 0.8%
17. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.8%
18. European Journal of Human Genetics | 49 papers in training set | Top 1% | 0.8%
19. Science Translational Medicine | 111 papers in training set | Top 6% | 0.7%
20. International Journal of Epidemiology | 74 papers in training set | Top 3% | 0.7%
21. PLOS ONE | 4510 papers in training set | Top 70% | 0.7%
22. Nature Biotechnology | 147 papers in training set | Top 8% | 0.7%
23. Nature Computational Science | 50 papers in training set | Top 2% | 0.7%
24. Cell | 370 papers in training set | Top 18% | 0.7%
25. Genetic Epidemiology | 46 papers in training set | Top 1% | 0.5%
26. Frontiers in Genetics | 197 papers in training set | Top 12% | 0.5%
27. BMC Bioinformatics | 383 papers in training set | Top 8% | 0.5%
28. Nature Medicine | 117 papers in training set | Top 6% | 0.5%