
Privacy-Preserving Multivariate Bayesian Regression Models for Overcoming Data Sharing Barriers in Health and Genomics

Sorensen, I. F.; Sorensen, P.

2025-07-30 health informatics
10.1101/2025.07.30.25332448 medRxiv

We present multivariate Bayesian regression models specifically designed to overcome data-sharing barriers in health and genomics. These multi-response models are well suited for scenarios where data must remain decentralized due to privacy, intellectual property, or regulatory constraints. In extensive simulation studies, our approach consistently outperformed traditional single-response models trained on individual datasets, particularly under real-world conditions such as low signal, unbalanced cohorts, and high-dimensional feature spaces. For the first time, we demonstrate that multivariate Bayesian regression can be implemented using orthogonal transformations of sufficient statistics, enabling fully privacy-preserving analysis without sharing individual-level data. The models are scalable, interpretable, and applicable to predictive tasks across diverse collaborators, supporting secure data-driven research in domains such as clinical trials, biomarker discovery, and precision health.
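The idea of fitting a regression from sufficient statistics rather than individual-level rows can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it uses a single response, a Gaussian prior with unit residual variance (equivalent to ridge regression), and simulated data, and the names `site_statistics`, `posterior_mean`, and `prior_precision` are hypothetical. It shows two facts that hold in general: summing each site's X'X and X'y reproduces the pooled-data estimate, and an orthogonal transformation of a site's rows leaves those statistics unchanged.

```python
import numpy as np

def site_statistics(X, y):
    """Summaries a site could share instead of its individual-level rows."""
    return X.T @ X, X.T @ y

def posterior_mean(xtx, xty, prior_precision=1.0):
    """Posterior mean of the coefficients under a N(0, I / prior_precision)
    prior with unit residual variance (an assumption of this sketch)."""
    p = xtx.shape[0]
    return np.linalg.solve(xtx + prior_precision * np.eye(p), xty)

# Simulated data for three hypothetical sites with unbalanced sample sizes.
rng = np.random.default_rng(0)
p = 5
beta = rng.normal(size=p)
sites = []
for n in (40, 60, 25):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=0.5, size=n)
    sites.append((X, y))

# Each site computes its own summaries; only these are combined.
stats = [site_statistics(X, y) for X, y in sites]
xtx_total = sum(s[0] for s in stats)
xty_total = sum(s[1] for s in stats)
beta_federated = posterior_mean(xtx_total, xty_total)

# Reference fit on the stacked individual-level data.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = posterior_mean(*site_statistics(X_all, y_all))

# An orthogonal matrix Q applied to a site's rows preserves X'X and X'y,
# because (QX)'(QX) = X'Q'QX = X'X.
X0, y0 = sites[0]
Q, _ = np.linalg.qr(rng.normal(size=(X0.shape[0], X0.shape[0])))
xtx_rot, xty_rot = site_statistics(Q @ X0, Q @ y0)

print(np.allclose(beta_federated, beta_pooled))
print(np.allclose(xtx_rot, X0.T @ X0), np.allclose(xty_rot, X0.T @ y0))
```

The sketch only demonstrates the algebraic equivalence; whether sharing such summaries meets a given privacy requirement depends on the setting and is not established by this example.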

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Nature Communications | 4913 papers in training set | Top 3% | 22.7%
2 | Nature Computational Science | 50 papers in training set | Top 0.1% | 10.2%
3 | Nature Biomedical Engineering | 42 papers in training set | Top 0.1% | 6.4%
4 | npj Digital Medicine | 97 papers in training set | Top 1% | 4.0%
5 | Genome Research | 409 papers in training set | Top 0.9% | 3.6%
6 | Patterns | 70 papers in training set | Top 0.2% | 3.6%
50% of probability mass above
7 | Cell Systems | 167 papers in training set | Top 4% | 3.6%
8 | Science Advances | 1098 papers in training set | Top 5% | 3.6%
9 | Scientific Reports | 3102 papers in training set | Top 36% | 3.6%
10 | Science Translational Medicine | 111 papers in training set | Top 1% | 3.3%
11 | Nature Methods | 336 papers in training set | Top 4% | 1.9%
12 | Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.9%
13 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 1% | 1.8%
14 | PLOS ONE | 4510 papers in training set | Top 52% | 1.8%
15 | Nature Biotechnology | 147 papers in training set | Top 4% | 1.8%
16 | Advanced Science | 249 papers in training set | Top 11% | 1.7%
17 | Med | 38 papers in training set | Top 0.3% | 1.7%
18 | Bioinformatics | 1061 papers in training set | Top 7% | 1.7%
19 | Communications Biology | 886 papers in training set | Top 11% | 1.5%
20 | Nature Medicine | 117 papers in training set | Top 3% | 1.1%
21 | eLife | 5422 papers in training set | Top 53% | 0.9%
22 | Nature Genetics | 240 papers in training set | Top 6% | 0.9%
23 | PLOS Computational Biology | 1633 papers in training set | Top 24% | 0.8%
24 | iScience | 1063 papers in training set | Top 34% | 0.7%
25 | Nature | 575 papers in training set | Top 16% | 0.7%
26 | Genome Medicine | 154 papers in training set | Top 9% | 0.6%
27 | Genome Biology | 555 papers in training set | Top 8% | 0.6%
28 | JMIR Medical Informatics | 17 papers in training set | Top 2% | 0.5%
29 | Nucleic Acids Research | 1128 papers in training set | Top 21% | 0.5%
30 | Biology Methods and Protocols | 53 papers in training set | Top 4% | 0.5%