Private Information Leakage from Polygenic Risk Scores
Nikitin, K.; Gursoy, G.
Polygenic Risk Scores (PRSs) estimate the likelihood that individuals will develop complex diseases based on their genetic variations. While their use in clinical practice and direct-to-consumer genetic testing is growing, the privacy implications of publicly sharing PRS values are often underestimated. In this work, we demonstrate that PRSs can be exploited to recover genotypes and to de-anonymize individuals. We describe how to reconstruct a portion of an individual's genome from a single PRS value by using dynamic programming and population-based likelihood estimation, which we experimentally demonstrate on PRS panels of up to 50 variants. We highlight the risks of combining multiple, even larger-panel PRSs to improve genotype-recovery accuracy, which can lead to the re-identification of individuals or their relatives in genomic databases or to the prediction of additional health risks not originally associated with the disclosed PRSs. We then develop an analytical framework to assess the privacy risk of releasing individual PRS values and provide a potential solution for sharing PRS models without decreasing their utility. Our tool and instructions to reproduce our calculations can be found at https://github.com/G2Lab/prs-privacy.
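To make the reconstruction idea concrete, the following is a minimal sketch of how dynamic programming and population-based likelihoods can be combined. It is not the authors' implementation (see the linked repository for that). It assumes that effect weights have been pre-scaled to integers, that loci are independent, and that genotype priors follow Hardy-Weinberg proportions derived from effect-allele frequencies. The function name `recover_genotypes` and its parameters are hypothetical.

```python
from math import log


def recover_genotypes(weights, freqs, target):
    """Most likely genotype vector whose weighted dosage sum equals `target`.

    weights: integer effect weights (assumed pre-scaled from real-valued betas)
    freqs:   effect-allele frequencies strictly between 0 and 1
    target:  integer PRS value on the same scale as `weights`
    Returns (log_likelihood, genotypes) or None if no genotype vector matches.
    """
    # best[s] holds the highest-likelihood prefix of genotypes with partial sum s.
    best = {0: (0.0, ())}
    for w, p in zip(weights, freqs):
        # Hardy-Weinberg prior for dosages 0, 1, 2 of the effect allele (assumption).
        hwe = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
        nxt = {}
        for s, (ll, genos) in best.items():
            for dosage in (0, 1, 2):
                s2 = s + dosage * w
                cand = (ll + log(hwe[dosage]), genos + (dosage,))
                # Keep only the most likely way to reach each partial sum.
                if s2 not in nxt or cand[0] > nxt[s2][0]:
                    nxt[s2] = cand
        best = nxt
    return best.get(target)
```

Calling `recover_genotypes([1, 1], [0.1, 0.9], 2)` considers the three genotype vectors that sum to 2 and returns the one favored by the allele frequencies. Real PRS weights are real-valued, so the integer scaling step and its rounding behavior matter in practice and are left out of this sketch.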
Matching journals
The top 3 journals account for 50% of the predicted probability mass.