Image-Conditioned Diffusion for Privacy-Preserving Synthetic Medical Images
Yaya-Stupp, D.; Lutsker, G.; Spiegel-Yerushalmi, O.; Segal, E.
Medical imaging models depend on large, shareable datasets, yet privacy constraints limit data dissemination. Current text-conditioned diffusion models fail to preserve subtle, distributed clinical signals, such as continuous physiological biomarkers, rendering synthetic data insufficient for robust downstream physiological modeling. Here, we evaluate image-to-image (I2I) diffusion as a tunable, privacy-preserving transformation that produces a synthetic counterpart of each real image while preserving downstream-relevant information. We fine-tune Stable Diffusion with low-rank adapters on retinal fundus photographs and chest radiographs, assessing fidelity, clinical signal preservation, cross-site transfer, and empirical re-identification risk. I2I consistently outperforms text-to-image generation in image fidelity and in preserving biomarker information. In cross-cohort transfer to an external retinal dataset from the UK Biobank, pretraining on I2I synthetic data performs comparably to real-image pretraining and surpasses it on the smallest fine-tuning sets. Varying I2I strength reveals that the privacy-utility tradeoff is highly modality-dependent: while retinal images achieve practical de-identification, chest X-rays exhibit structural combinatorics that leave them substantially re-identifiable even at high noise strengths, exposing critical boundaries for diffusion-based anonymization. These results position image-conditioned diffusion as a practical approach for generating shareable medical images with tunable de-identification.
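The role of I2I strength as a privacy-utility dial can be illustrated with a minimal sketch. It assumes the common SDEdit-style convention in which strength in [0, 1] selects how far along the forward noising schedule the real image is pushed before denoising begins; the abstract does not spell out this mapping. The linear beta schedule and the function names (`alpha_bar_schedule`, `signal_fraction`, `noise_image`) are hypothetical and chosen for illustration. The sketch covers only the forward noising step on a flat list of values. In Stable Diffusion the noise would be applied to latents and followed by a learned denoiser.

```python
import math
import random


def alpha_bar_schedule(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) under an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def signal_fraction(strength, alpha_bars):
    """Scale factor applied to the original image at the chosen strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps = int(strength * len(alpha_bars))
    # strength == 0 means no noising steps, so the image passes through intact.
    return 1.0 if steps == 0 else math.sqrt(alpha_bars[steps - 1])


def noise_image(pixels, strength, alpha_bars, rng):
    """Forward-noise a flat list of values: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    signal = signal_fraction(strength, alpha_bars)
    noise = math.sqrt(max(0.0, 1.0 - signal * signal))
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in pixels]


if __name__ == "__main__":
    schedule = alpha_bar_schedule()
    rng = random.Random(0)
    image = [0.5, -0.25, 0.75, 0.0]  # toy stand-in for image or latent values
    for s in (0.0, 0.3, 0.6, 0.9):
        noised = noise_image(image, s, schedule, rng)
        print(f"strength={s:.1f} signal={signal_fraction(s, schedule):.3f} first={noised[0]:.3f}")
```

Under these assumptions, a higher strength leaves less of the original image in the starting point of the reverse process, so the output depends more on the model and less on the source image. That is the intuition behind treating strength as a de-identification knob, though how much identity survives at a given strength is an empirical question, as the chest X-ray result shows.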
Matching journals
The top 7 journals account for 50% of the predicted probability mass.