
Privacy-Preserving Pangenome Graphs

Blindenbach, J.; Soni, S.; Gursoy, G.

2026-02-18 bioinformatics
10.64898/2026.02.16.706152 bioRxiv

The human pangenome reference, often represented as a graph, promises to capture genetic diversity across populations, but open release of individual haplotypes raises significant privacy concerns, including risks of re-identification and inference of sensitive traits. To address these challenges, we introduce PanMixer, a framework for privacy-preserving pangenome graph releases that selectively obfuscates an individual's haplotypes while retaining the utility of the reference graph. PanMixer formulates the privacy-utility trade-off as a knapsack problem, where privacy risk is quantified using information theory and utility is measured using graph properties. Using the recently released draft human pangenome containing 47 individuals, we show that PanMixer robustly reduces re-identification risk under linkage attacks and genome reconstruction attempts. We also show that PanMixer preserves the accuracy of key downstream applications, including allele frequency estimation, linkage disequilibrium analysis, and read mapping. By addressing privacy concerns, PanMixer enables the inclusion of individuals, particularly those from underrepresented populations, who might otherwise be reluctant to contribute but seek representation in future genomic studies. Our results provide both a practical tool and a generalizable framework for balancing privacy and utility in future large-scale pangenome references.
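
To make the knapsack framing in the abstract concrete, here is a minimal sketch of a 0/1 knapsack selection. It is not PanMixer's implementation, and the abstract does not specify the actual risk or utility measures. The function names, the use of allele surprisal (-log2 of allele frequency) as the privacy gain, and the integer per-site utility costs are all assumptions made for illustration only.

```python
import math


def self_information_bits(allele_frequency):
    """Surprisal (in bits) of an allele with the given population frequency.

    Assumption: rarer alleles are treated as more identifying.
    """
    return -math.log2(allele_frequency)


def select_sites_to_obfuscate(privacy_gain, utility_cost, budget):
    """Classic 0/1 knapsack by dynamic programming.

    Maximizes total privacy gain subject to total utility cost <= budget.
    privacy_gain: list of floats, utility_cost: list of non-negative ints,
    budget: non-negative int. Returns (best_gain, sorted chosen indices).
    """
    n = len(privacy_gain)
    # best[i][b] = max gain using only the first i sites with cost budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        gain, cost = privacy_gain[i - 1], utility_cost[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: leave site i-1 untouched
            if cost <= b:
                candidate = best[i - 1][b - cost] + gain  # option 2: obfuscate it
                if candidate > best[i][b]:
                    best[i][b] = candidate
    # Walk the table backwards to recover which sites were selected.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= utility_cost[i - 1]
    return best[n][budget], sorted(chosen)


# Hypothetical toy input: four sites with made-up allele frequencies and costs.
frequencies = [0.5, 0.25, 0.125, 0.5]
gains = [self_information_bits(f) for f in frequencies]
costs = [2, 1, 3, 2]
total_gain, sites = select_sites_to_obfuscate(gains, costs, budget=4)
print(total_gain, sites)
```

In this toy input, the selection favors the two rarest alleles because together they fit within the utility budget. A real system would need utility costs derived from graph properties, which this sketch does not attempt to model.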

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cell Systems; 167 papers in training set; Top 0.5%; 14.9%
2. Genome Research; 409 papers in training set; Top 0.1%; 12.8%
3. Nature Communications; 4913 papers in training set; Top 17%; 10.2%
4. Nature Biotechnology; 147 papers in training set; Top 1%; 7.3%
5. The American Journal of Human Genetics; 206 papers in training set; Top 0.6%; 7.3%
50% of probability mass above
6. Bioinformatics; 1061 papers in training set; Top 4%; 4.9%
7. Nature Methods; 336 papers in training set; Top 2%; 4.9%
8. Genome Biology; 555 papers in training set; Top 2%; 4.2%
9. Nature Genetics; 240 papers in training set; Top 2%; 3.6%
10. Nucleic Acids Research; 1128 papers in training set; Top 7%; 2.9%
11. Nature Computational Science; 50 papers in training set; Top 0.3%; 2.4%
12. Nature; 575 papers in training set; Top 11%; 1.7%
13. Briefings in Bioinformatics; 326 papers in training set; Top 4%; 1.5%
14. PLOS Computational Biology; 1633 papers in training set; Top 18%; 1.3%
15. Advanced Science; 249 papers in training set; Top 13%; 1.3%
16. PLOS ONE; 4510 papers in training set; Top 58%; 1.3%
17. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 39%; 1.1%
18. Bioinformatics Advances; 184 papers in training set; Top 4%; 1.0%
19. Genome Medicine; 154 papers in training set; Top 6%; 1.0%
20. Science; 429 papers in training set; Top 18%; 0.9%
21. Scientific Reports; 3102 papers in training set; Top 70%; 0.9%
22. IEEE Transactions on Computational Biology and Bioinformatics; 17 papers in training set; Top 0.5%; 0.8%
23. iScience; 1063 papers in training set; Top 34%; 0.7%
24. BMC Bioinformatics; 383 papers in training set; Top 7%; 0.7%
25. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 6%; 0.7%
26. Cell Genomics; 162 papers in training set; Top 7%; 0.7%
27. Science Advances; 1098 papers in training set; Top 32%; 0.7%
28. Frontiers in Genetics; 197 papers in training set; Top 11%; 0.7%
29. Cell; 370 papers in training set; Top 19%; 0.5%