
Private Information Leakage from Polygenic Risk Scores

Nikitin, K.; Gursoy, G.

2026-02-18 bioinformatics
10.64898/2026.02.16.706191 bioRxiv

Polygenic Risk Scores (PRSs) estimate the likelihood that an individual will develop a complex disease based on their genetic variations. While their use in clinical practice and direct-to-consumer genetic testing is growing, the privacy implications of publicly sharing PRS values are often underestimated. In this work, we demonstrate that PRSs can be exploited to recover genotypes and to de-anonymize individuals. We describe how to reconstruct a portion of an individual's genome from a single PRS value by using dynamic programming and population-based likelihood estimation, which we experimentally demonstrate on PRS panels of up to 50 variants. We highlight the risks of combining multiple, even larger-panel PRSs to improve genotype-recovery accuracy, which can lead to the re-identification of individuals or their relatives in genomic databases or to the prediction of additional health risks not originally associated with the disclosed PRSs. We then develop an analytical framework to assess the privacy risk of releasing individual PRS values and provide a potential solution for sharing PRS models without decreasing their utility. Our tool and instructions to reproduce our calculations can be found at https://github.com/G2Lab/prs-privacy.
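
To illustrate the kind of attack the abstract describes, the following is a minimal sketch of a subset-sum-style dynamic program that searches for the most likely genotype vector consistent with a single PRS value. It is not the authors' implementation (see the linked repository for that). It assumes the PRS is a plain weighted sum of effect-allele counts (0, 1, or 2 per variant), that weights and the score have been scaled to integers, and that genotype likelihoods follow Hardy-Weinberg proportions from reference allele frequencies. The function names `hwe_log_probs` and `most_likely_genotypes` are hypothetical.

```python
import math


def hwe_log_probs(p):
    """Log-probabilities of 0, 1, or 2 effect alleles under Hardy-Weinberg (assumed model)."""
    q = 1.0 - p
    return [math.log(q * q), math.log(2 * p * q), math.log(p * p)]


def most_likely_genotypes(weights, freqs, target):
    """Return the most likely genotype list whose weighted sum equals target, or None.

    weights: integer-scaled effect sizes (assumption: scores have fixed precision)
    freqs:   effect-allele frequencies in a reference population, each in (0, 1)
    target:  the integer-scaled PRS value
    """
    # best[s] = (log-likelihood, genotypes) of the best partial assignment summing to s
    best = {0: (0.0, ())}
    for w, p in zip(weights, freqs):
        logp = hwe_log_probs(p)
        nxt = {}
        for s, (ll, gs) in best.items():
            for g in (0, 1, 2):
                s2 = s + g * w
                cand = (ll + logp[g], gs + (g,))
                # Keep only the highest-likelihood assignment per partial sum
                if s2 not in nxt or cand[0] > nxt[s2][0]:
                    nxt[s2] = cand
        best = nxt
    hit = best.get(target)
    return None if hit is None else list(hit[1])


# Toy panel: three variants, true genotypes (1, 2, 0), score 3*1 + 5*2 + 11*0 = 13
print(most_likely_genotypes([3, 5, 11], [0.2, 0.5, 0.1], 13))
```

In this toy panel only one genotype vector produces the score 13, so the score alone reveals all three genotypes. When several vectors share a score, the sketch returns the one with the highest population likelihood. The number of distinct partial sums tracked can grow quickly with panel size and weight precision.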

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Cell Systems | 167 papers in training set | Top 0.1% | 33.0%
2. Nature Communications | 4913 papers in training set | Top 14% | 12.3%
3. Bioinformatics | 1061 papers in training set | Top 4% | 4.9%
50% of probability mass above
4. Genome Research | 409 papers in training set | Top 0.7% | 4.3%
5. PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.6%
6. The American Journal of Human Genetics | 206 papers in training set | Top 1% | 3.1%
7. Scientific Reports | 3102 papers in training set | Top 44% | 2.7%
8. iScience | 1063 papers in training set | Top 8% | 2.4%
9. Nature Computational Science | 50 papers in training set | Top 0.3% | 2.4%
10. Nature Biotechnology | 147 papers in training set | Top 4% | 1.9%
11. PLOS ONE | 4510 papers in training set | Top 50% | 1.9%
12. Nature Methods | 336 papers in training set | Top 4% | 1.8%
13. Nature Genetics | 240 papers in training set | Top 4% | 1.7%
14. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 1.2%
15. Genetics | 225 papers in training set | Top 3% | 1.2%
16. Patterns | 70 papers in training set | Top 2% | 0.9%
17. IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.4% | 0.9%
18. Genome Medicine | 154 papers in training set | Top 6% | 0.9%
19. Science Advances | 1098 papers in training set | Top 26% | 0.9%
20. Nature Machine Intelligence | 61 papers in training set | Top 3% | 0.8%
21. Nucleic Acids Research | 1128 papers in training set | Top 17% | 0.7%
22. European Journal of Human Genetics | 49 papers in training set | Top 1% | 0.7%
23. BMC Bioinformatics | 383 papers in training set | Top 7% | 0.7%
24. Frontiers in Genetics | 197 papers in training set | Top 10% | 0.7%
25. Genome Biology | 555 papers in training set | Top 8% | 0.7%
26. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 47% | 0.6%
27. Cell | 370 papers in training set | Top 18% | 0.6%
28. BioData Mining | 15 papers in training set | Top 1% | 0.6%
29. Nature Medicine | 117 papers in training set | Top 6% | 0.6%