Privacy-Preserving Multivariate Bayesian Regression Models for Overcoming Data Sharing Barriers in Health and Genomics
Sorensen, I. F.; Sorensen, P.
We present multivariate Bayesian regression models specifically designed to overcome data-sharing barriers in health and genomics. These multi-response models are well suited for scenarios where data must remain decentralized due to privacy, intellectual property, or regulatory constraints. In extensive simulation studies, our approach consistently outperformed traditional single-response models trained on individual datasets, particularly under real-world conditions such as low signal, unbalanced cohorts, and high-dimensional feature spaces. For the first time, we demonstrate that multivariate Bayesian regression can be implemented using orthogonal transformations of sufficient statistics, enabling fully privacy-preserving analysis without sharing individual-level data. The models are scalable, interpretable, and applicable to predictive tasks across diverse collaborators, supporting secure data-driven research in domains such as clinical trials, biomarker discovery, and precision health.
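The idea that Bayesian regression can run on orthogonally transformed sufficient statistics instead of individual-level rows can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simplified conjugate multi-response model (independent Gaussian priors with precision `prior_precision` on every coefficient and a common, known noise variance of 1) and uses a reduced QR decomposition as the orthogonal transformation. The function names `local_summary` and `posterior_mean` are hypothetical. Each site shares only the p-row matrices R and QᵀY, and the aggregated posterior mean matches the one computed from pooled data.

```python
import numpy as np


def local_summary(X, Y):
    """Run at each site: return (R, Q^T Y) from a reduced QR decomposition of X.

    Q has orthonormal columns, so R^T R = X^T X and R^T (Q^T Y) = X^T Y:
    these p-row matrices carry the sufficient statistics for the posterior
    mean without the n individual-level rows. Assumes n >= p at each site.
    """
    Q, R = np.linalg.qr(X)  # default 'reduced' mode: Q is n x p, R is p x p
    return R, Q.T @ Y


def posterior_mean(summaries, prior_precision):
    """Posterior mean of B in Y = X B + E under a simplified conjugate model.

    Hypothetical assumptions: independent N(0, 1/prior_precision) priors on all
    coefficients and unit noise variance, so the posterior mean is
    (X^T X + prior_precision * I)^{-1} X^T Y.
    """
    XtX = sum(R.T @ R for R, _ in summaries)           # sum of site X_i^T X_i
    XtY = sum(R.T @ QtY for R, QtY in summaries)       # sum of site X_i^T Y_i
    p = XtX.shape[0]
    return np.linalg.solve(XtX + prior_precision * np.eye(p), XtY)


# Usage: three unbalanced simulated cohorts, two responses, five features.
rng = np.random.default_rng(0)
p, k = 5, 2
B_true = rng.normal(size=(p, k))
sites = []
for n in (200, 50, 120):
    X = rng.normal(size=(n, p))
    Y = X @ B_true + rng.normal(size=(n, k))
    sites.append((X, Y))

B_federated = posterior_mean([local_summary(X, Y) for X, Y in sites], prior_precision=1.0)

# Reference: the same posterior mean computed from pooled individual-level data.
X_all = np.vstack([X for X, _ in sites])
Y_all = np.vstack([Y for _, Y in sites])
B_pooled = np.linalg.solve(X_all.T @ X_all + np.eye(p), X_all.T @ Y_all)
print(np.allclose(B_federated, B_pooled))  # True
```

R and QᵀY still determine XᵀX and XᵀY exactly, so this sketch only avoids sharing individual rows; estimating the noise variance as well would additionally require each site's YᵀY.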