
Locally adaptive conformal prediction intervals for polygenic score-based phenotype prediction via residual normalization and data-driven stratification

Yun, Y.; Hao, X.; Zhang, Y. D.

2026-05-30 · Genetic and genomic medicine · medRxiv
DOI: 10.64898/2026.05.28.26354326

Quantifying uncertainty in polygenic score (PGS)-based phenotype prediction is crucial for integrating genomic data into precision medicine. While the PGS provides the basis for point estimation, clinical decision-making requires well-calibrated prediction intervals that reliably contain the true phenotypic values. However, phenotypic residuals frequently exhibit complex heteroscedasticity, with variance structures that differ across demographic groups. Existing approaches often rely on global calibration, which ignores such localized variance structures and leads to systematic miscalibration within specific subpopulations. To bridge this gap, we propose Clustering-based Split Conformal Prediction with Normalized Residuals (C-SCNR), a versatile framework built on split conformal prediction. By normalizing residuals and repeatedly applying a "split-and-cluster" procedure, C-SCNR identifies latent error strata in a data-driven manner and applies fine-grained adjustments to the resulting intervals. The framework makes no distributional assumptions about the phenotype, is compatible with any PGS method, and flexibly accommodates biologically informed groupings. Simulation studies show that C-SCNR consistently outperforms existing methods across diverse error distributions. In real-data applications to body mass index (BMI), low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol in the UK Biobank, C-SCNR resolves the coverage deficiencies of existing methods in specific subgroups and consistently achieves better localized calibration. Overall, C-SCNR is a flexible and powerful framework for constructing high-resolution, context-specific prediction intervals, enabling more reliable clinical interpretation of polygenic risk.
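The abstract does not give C-SCNR's exact estimators, so the following is only a minimal Python sketch, under stated assumptions, of the two ingredients it names: a normalized nonconformity score |y - μ(x)| / σ(x) from split conformal prediction, and group-wise (Mondrian-style) calibration in which each cluster gets its own conformal quantile. The linear point and scale models, the farthest-point-initialized k-means on user-chosen columns, all function names (`clustered_normalized_scp` and its helpers), and the simulated data are hypothetical illustrations, not the authors' implementation; the repeated split-and-cluster step and how its results are aggregated are not reproduced.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def nearest_center(Z, centers):
    """Index of the closest centroid (squared Euclidean distance) for each row of Z."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(Z, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        # Next seed: the point farthest from all centroids chosen so far.
        d2 = ((Z[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(Z[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = nearest_center(Z, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return centers


def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score, or +inf if that rank exceeds n."""
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if rank > n else np.sort(scores)[rank - 1]


def clustered_normalized_scp(X_tr, y_tr, X_cal, y_cal, X_te, cluster_cols,
                             alpha=0.1, k=3, seed=0):
    """Hypothetical sketch: normalized-residual split conformal, one quantile per cluster."""
    mu = fit_linear(X_tr, y_tr)                        # point predictor (e.g. PGS + covariates)
    abs_res = np.abs(y_tr - mu(X_tr))
    sigma_lin = fit_linear(X_tr, abs_res)              # crude linear model of residual scale
    floor = 1e-3 * abs_res.mean()

    def sigma(Z):
        return np.maximum(sigma_lin(Z), floor)         # keep scales strictly positive

    # Clusters are fitted on the training split only, so the grouping does not
    # depend on the calibration or test points.
    centers = kmeans(X_tr[:, cluster_cols], k, seed=seed)
    g_cal = nearest_center(X_cal[:, cluster_cols], centers)
    g_te = nearest_center(X_te[:, cluster_cols], centers)

    scores = np.abs(y_cal - mu(X_cal)) / sigma(X_cal)  # normalized nonconformity scores
    q = np.array([conformal_quantile(scores[g_cal == j], alpha) for j in range(k)])
    half = q[g_te] * sigma(X_te)
    pred = mu(X_te)
    return pred - half, pred + half, g_te


# Toy demo on simulated data (not UK Biobank): the noise SD depends on a binary stratum z.
rng = np.random.default_rng(1)
n = 9000
pgs = rng.normal(size=n)
z = rng.integers(0, 2, size=n).astype(float)           # hypothetical stratum indicator
noise = rng.normal(size=n) * (1.0 + 1.5 * z) * (1.0 + 0.3 * np.abs(pgs))
y = 0.5 * pgs + 0.3 * z + noise
X = np.column_stack([pgs, z])
tr, cal, te = np.split(rng.permutation(n), [3000, 6000])

lo, hi, g_te = clustered_normalized_scp(X[tr], y[tr], X[cal], y[cal], X[te],
                                        cluster_cols=[1], alpha=0.1, k=2)
covered = (y[te] >= lo) & (y[te] <= hi)
coverage_by_group = {j: covered[g_te == j].mean() for j in range(2)}
print(coverage_by_group)
```

Fitting the clusters on the training split keeps the grouping independent of the calibration and test points, which is what lets each cluster's interval retain the usual split conformal guarantee of at least 1 - α coverage within that cluster under exchangeability. Replacing the k-means assignment with known group labels (for example, a biologically informed grouping) would plug into the same per-group quantile step.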

Matching journals

The top 5 journals together account for just over 50% of the predicted probability mass.

1. The American Journal of Human Genetics: 18.1% (206 papers in training set; Top 0.2%)
2. Genome Medicine: 12.0% (154 papers in training set; Top 0.4%)
3. Nature Communications: 9.8% (4913 papers in training set; Top 20%)
4. Bioinformatics: 6.6% (1061 papers in training set; Top 4%)
5. Nature Genetics: 6.1% (240 papers in training set; Top 1%)
(50% of probability mass above)
6. Briefings in Bioinformatics: 3.5% (326 papers in training set; Top 2%)
7. Nucleic Acids Research: 2.5% (1128 papers in training set; Top 8%)
8. Cell Genomics: 2.0% (162 papers in training set; Top 3%)
9. Science Advances: 2.0% (1098 papers in training set; Top 14%)
10. PLOS Computational Biology: 2.0% (1633 papers in training set; Top 14%)
11. Genetic Epidemiology: 1.8% (46 papers in training set; Top 0.4%)
12. Cell Systems: 1.7% (167 papers in training set; Top 7%)
13. Frontiers in Genetics: 1.7% (197 papers in training set; Top 4%)
14. Communications Biology: 1.6% (886 papers in training set; Top 10%)
15. Proceedings of the National Academy of Sciences: 1.4% (2130 papers in training set; Top 36%)
16. Scientific Reports: 1.4% (3102 papers in training set; Top 62%)
17. Genome Biology: 1.4% (555 papers in training set; Top 5%)
18. Nature Medicine: 1.3% (117 papers in training set; Top 3%)
19. PLOS Genetics: 1.2% (756 papers in training set; Top 11%)
20. GENETICS: 1.2% (189 papers in training set; Top 0.9%)
21. npj Digital Medicine: 1.2% (97 papers in training set; Top 3%)
22. Journal of Biomedical Informatics: 0.9% (45 papers in training set; Top 1%)
23. Nature Biomedical Engineering: 0.9% (42 papers in training set; Top 2%)
24. Journal of the American Medical Informatics Association: 0.8% (61 papers in training set; Top 2%)
25. European Journal of Human Genetics: 0.8% (49 papers in training set; Top 1%)
26. International Journal of Epidemiology: 0.8% (74 papers in training set; Top 3%)
27. PLOS ONE: 0.7% (4510 papers in training set; Top 69%)
28. iScience: 0.7% (1063 papers in training set; Top 34%)
29. Nature Human Behaviour: 0.7% (85 papers in training set; Top 5%)
30. Human Genetics and Genomics Advances: 0.7% (70 papers in training set; Top 0.9%)