Evaluation of polygenic scoring methods in five biobanks reveals greater variability between biobanks than between methods and highlights benefits of ensemble learning

Monti, R.; Eick, L.; Hudjashov, G.; Läll, K.; Kanoni, S.; Wolford, B. N.; Wingfield, B.; Pain, O.; Wharrie, S.; Jermy, B.; McMahon, A.; Hartonen, T.; Heyne, H. O.; Mars, N.; Genes & Health Research Team; Hveem, K.; Inouye, M.; van Heel, D. A.; Mägi, R.; Marttinen, P.; Ripatti, S.; Ganna, A.; Lippert, C.

Posted 2023-11-20 on medRxiv (genetic and genomic medicine)
doi:10.1101/2023.11.20.23298215
Abstract

Methods to estimate polygenic scores (PGS) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived using seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling and target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well-tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β-coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best performing single methods when tuned with cross-validation). Our interactively browsable online results (https://methodscomparison.intervenegeneticscores.org/) and open-source workflow prspipe (https://github.com/intervene-EU-H2020/prspipe) provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
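The ensemble step described in the abstract can be sketched as follows. This is an illustrative reconstruction on simulated data, not the authors' prspipe code: the variable names, the simulated PGS matrix, and the hyperparameter grid are all hypothetical stand-ins. The idea is simply that each method's PGS becomes one feature, and a cross-validated elastic net learns how to weight the methods against the observed phenotype in a tuning cohort.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Simulated data: 1,000 tuning individuals, 7 per-method PGS columns
# (hypothetical stand-ins for scores from e.g. LDpred2, MegaPRS, ...).
# Each method captures the true genetic signal with a different fidelity.
n, n_methods = 1000, 7
true_signal = rng.normal(size=n)
method_weights = rng.uniform(0.2, 0.8, n_methods)
pgs_matrix = true_signal[:, None] * method_weights + rng.normal(size=(n, n_methods))
phenotype = true_signal + rng.normal(size=n)

# Standardize each method's PGS, then fit an elastic net whose
# cross-validation picks the regularization and L1/L2 mixing.
X = StandardScaler().fit_transform(pgs_matrix)
ensemble = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, random_state=0)
ensemble.fit(X, phenotype)

# The fitted linear combination of per-method scores is the ensemble PGS.
ensemble_pgs = ensemble.predict(X)
```

In practice the ensemble would be tuned in one biobank (the paper uses the UK Biobank) and the learned weights applied to standardized per-method scores in the target biobanks.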

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                 Papers in training set  Percentile  Probability
1     Cell Genomics                           162                     Top 0.1%    26.6%
2     Nature Communications                   4913                    Top 13%     13.0%
3     The American Journal of Human Genetics  206                     Top 0.4%    10.4%
4     Briefings in Bioinformatics             326                     Top 0.7%    7.0%
----- 50% of probability mass above this line -----
5     Genome Medicine                         154                     Top 0.9%    6.5%
6     Frontiers in Genetics                   197                     Top 2%      3.7%
7     Human Genetics and Genomics Advances    70                      Top 0.1%    3.7%
8     Nature Genetics                         240                     Top 3%      3.0%
9     Bioinformatics                          1061                    Top 6%      2.7%
10    Scientific Reports                      3102                    Top 56%     1.7%
11    Nucleic Acids Research                  1128                    Top 12%     1.5%
12    Communications Biology                  886                     Top 10%     1.5%
13    BMC Genomics                            328                     Top 4%      1.0%
14    NAR Genomics and Bioinformatics         214                     Top 3%      0.8%
15    PLOS Computational Biology              1633                    Top 23%     0.8%
16    Human Genomics                          21                      Top 0.3%    0.8%
17    Genome Biology                          555                     Top 7%      0.8%
18    BMC Medical Genomics                    36                      Top 1%      0.8%
19    eBioMedicine                            130                     Top 4%      0.7%
20    Human Genetics                          25                      Top 0.4%    0.7%
21    International Journal of Epidemiology   74                      Top 3%      0.5%
22    eLife                                   5422                    Top 63%     0.5%
23    Bioinformatics Advances                 184                     Top 5%      0.5%
24    The Lancet Rheumatology                 11                      Top 0.3%    0.5%