
A harmonized benchmarking framework for implementation-aware evaluation of 46 polygenic risk score tools across binary and continuous phenotypes

Muneeb, M.; Ascher, D.

2026-03-23 bioinformatics
10.64898/2026.03.22.713457 bioRxiv

Polygenic risk score (PRS) tools differ substantially in statistical assumptions, input requirements, and implementation complexity, making direct comparison difficult. We developed a harmonized, implementation-aware benchmarking framework to evaluate 46 PRS tools across seven binary UK Biobank phenotypes and one continuous trait under three model configurations: null, PRS-only, and PRS plus covariates. The framework integrates standardized preprocessing, tool-specific execution, hyperparameter exploration, and unified downstream evaluation using five-fold cross-validation on high-performance computing infrastructure. In addition to predictive performance, we assessed runtime, memory use, input dependencies, and failure modes. A Friedman test across 40 phenotype-fold combinations confirmed significant differences in tool rankings (χ² = 102.29, p = 2.57 × 10⁻¹¹), with no single method universally optimal. These findings provide a reproducible framework for comparative PRS evaluation and demonstrate that tool performance is shaped not only by statistical methodology but also by phenotype architecture, preprocessing choices, covariate structure, computational demands, software robustness, and practical implementation constraints.
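The Friedman test mentioned in the abstract ranks the tools within each block (here, a phenotype-fold combination) and asks whether the rank sums differ more than chance would allow. The following is a minimal sketch of the statistic only, not the authors' code: the function names, the five hypothetical tools, and the synthetic scores are all assumptions made for illustration, and no tie correction is applied. Under the null hypothesis the statistic is approximately chi-square distributed with k − 1 degrees of freedom, where k is the number of tools; the p-value lookup is left out here.

```python
import random


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction).

    scores[b][t] is the metric for tool t in block b,
    where a block is one phenotype-fold combination.
    """
    n = len(scores)      # number of blocks
    k = len(scores[0])   # number of tools
    rank_sums = [0.0] * k
    for row in scores:
        for t, r in enumerate(average_ranks(row)):
            rank_sums[t] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Synthetic illustration: 40 blocks, 5 hypothetical tools with slightly
# different mean scores. These numbers are made up, not taken from the study.
random.seed(0)
synthetic = [[random.gauss(0.60 + 0.01 * t, 0.02) for t in range(5)] for _ in range(40)]
print("Friedman statistic:", friedman_statistic(synthetic))
```

When every block ranks the tools identically, the statistic reaches its maximum of n(k − 1); when the rank sums are all equal, it is zero.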

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Bioinformatics | 1061 | Top 2% | 12.5% |
| 2 | Briefings in Bioinformatics | 326 | Top 0.6% | 7.2% |
| 3 | BMC Bioinformatics | 383 | Top 1% | 7.2% |
| 4 | GigaScience | 172 | Top 0.1% | 6.8% |
| 5 | Nature Communications | 4913 | Top 28% | 6.4% |
| 6 | Bioinformatics Advances | 184 | Top 0.4% | 6.4% |
| 7 | NAR Genomics and Bioinformatics | 214 | Top 0.3% | 4.9% |
| | 50% of probability mass above | | | |
| 8 | PLOS Computational Biology | 1633 | Top 8% | 4.3% |
| 9 | Genome Medicine | 154 | Top 2% | 3.6% |
| 10 | Nucleic Acids Research | 1128 | Top 6% | 3.6% |
| 11 | PLOS ONE | 4510 | Top 46% | 2.4% |
| 12 | Scientific Reports | 3102 | Top 50% | 2.1% |
| 13 | Genome Biology | 555 | Top 4% | 1.9% |
| 14 | Database | 51 | Top 0.4% | 1.7% |
| 15 | BMC Genomics | 328 | Top 2% | 1.7% |
| 16 | Frontiers in Genetics | 197 | Top 5% | 1.7% |
| 17 | The American Journal of Human Genetics | 206 | Top 2% | 1.7% |
| 18 | Computational and Structural Biotechnology Journal | 216 | Top 5% | 1.7% |
| 19 | Genomics, Proteomics & Bioinformatics | 171 | Top 4% | 1.5% |
| 20 | BioData Mining | 15 | Top 0.6% | 1.1% |
| 21 | JCO Clinical Cancer Informatics | 18 | Top 0.6% | 1.1% |
| 22 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.9% |
| 23 | Communications Biology | 886 | Top 19% | 0.9% |
| 24 | International Journal of Molecular Sciences | 453 | Top 16% | 0.7% |
| 25 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.8% | 0.6% |
| 26 | Cell Genomics | 162 | Top 7% | 0.6% |
| 27 | Patterns | 70 | Top 3% | 0.6% |
| 28 | Genome Research | 409 | Top 5% | 0.6% |
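
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does this for the first ten entries; the function name is a hypothetical one chosen for illustration, and the values are copied from the list above.

```python
# Predicted probabilities (percent) for ranks 1 to 10, in rank order.
probabilities = [12.5, 7.2, 7.2, 6.8, 6.4, 6.4, 4.9, 4.3, 3.6, 3.6]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative) for the shortest rank-ordered prefix
    whose summed probability reaches the threshold, or (None, total)
    if the threshold is never reached."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, cumulative = journals_to_reach(probabilities, 50.0)
print(count, round(cumulative, 1))  # prints: 7 51.4
```

The first six journals sum to 46.5%, so the seventh is the one that carries the cumulative total past the 50% mark.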