Image-Conditioned Diffusion for Privacy-Preserving Synthetic Medical Images

Yaya-Stupp, D.; Lutsker, G.; Spiegel-Yerushalmi, O.; Segal, E.

2026-05-07 bioinformatics
10.64898/2026.05.04.722524 bioRxiv
Medical imaging models depend on large, shareable datasets, yet privacy constraints limit data dissemination. Current text-conditioned diffusion models fail to preserve subtle, distributed clinical signals, such as continuous physiological biomarkers, rendering synthetic data insufficient for robust downstream physiological modeling. Here, we evaluate image-to-image (I2I) diffusion as a tunable, privacy-preserving transformation that produces a synthetic counterpart of real images while preserving downstream-relevant information. We fine-tune Stable Diffusion with low-rank adapters on retinal fundus photographs and chest radiographs, assessing fidelity, clinical signal preservation, cross-site transfer, and empirical re-identification risk. I2I consistently outperforms text-to-image generation in image fidelity and in preserving biomarker information. In cross-cohort transfer to an external retinal dataset from the UK Biobank, pretraining on I2I synthetic data performs comparably to real-image pretraining and surpasses it in the smallest fine-tuning sets. Varying I2I strength reveals that the privacy-utility tradeoff is highly modality-dependent: while retinal images achieve practical de-identification, chest X-rays exhibit structural combinatorics that leave them substantially re-identifiable even at high noise strengths, exposing critical boundaries for diffusion-based anonymization. These results position image-conditioned diffusion as a practical approach for generating shareable medical images with tunable de-identification.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Nature Communications | 4913 | Top 13% | 12.6% |
| 2 | npj Digital Medicine | 97 | Top 0.6% | 8.5% |
| 3 | Nature Methods | 336 | Top 1% | 7.2% |
| 4 | Nature Biomedical Engineering | 42 | Top 0.1% | 6.9% |
| 5 | Nature Medicine | 117 | Top 0.3% | 6.4% |
| 6 | Cell Systems | 167 | Top 3% | 4.9% |
| 7 | Advanced Science | 249 | Top 5% | 3.6% |
| | (50% of probability mass above) | | | |
| 8 | Nature Biotechnology | 147 | Top 3% | 3.6% |
| 9 | Proceedings of the National Academy of Sciences | 2130 | Top 20% | 3.6% |
| 10 | Science Advances | 1098 | Top 8% | 3.1% |
| 11 | Nature Machine Intelligence | 61 | Top 1% | 3.1% |
| 12 | PLOS ONE | 4510 | Top 44% | 2.6% |
| 13 | PLOS Computational Biology | 1633 | Top 12% | 2.6% |
| 14 | Scientific Reports | 3102 | Top 49% | 2.1% |
| 15 | Nature | 575 | Top 11% | 1.7% |
| 16 | Bioinformatics | 1061 | Top 8% | 1.3% |
| 17 | Patterns | 70 | Top 1% | 1.2% |
| 18 | Science | 429 | Top 17% | 1.2% |
| 19 | The American Journal of Human Genetics | 206 | Top 3% | 1.0% |
| 20 | eLife | 5422 | Top 53% | 0.9% |
| 21 | Medical Image Analysis | 33 | Top 0.9% | 0.9% |
| 22 | Nature Computational Science | 50 | Top 1% | 0.9% |
| 23 | Cancer Research | 116 | Top 3% | 0.9% |
| 24 | Nature Genetics | 240 | Top 7% | 0.8% |
| 25 | IEEE Transactions on Medical Imaging | 18 | Top 0.5% | 0.8% |
| 26 | Communications Biology | 886 | Top 21% | 0.8% |
| 27 | NeuroImage | 813 | Top 6% | 0.7% |
| 28 | Nucleic Acids Research | 1128 | Top 21% | 0.5% |
| 29 | Cell Reports Medicine | 140 | Top 10% | 0.5% |
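
The 50% cutoff stated for the journal list can be checked with a running total of the listed probabilities. The sketch below is only an illustration: the helper name `journals_to_reach` is hypothetical, and the values are the first ten predicted probabilities copied from the list, not output from the site's model.

```python
# Predicted probabilities (%) for ranks 1-10, copied from the journal list.
probs = [12.6, 8.5, 7.2, 6.9, 6.4, 4.9, 3.6, 3.6, 3.6, 3.1]


def journals_to_reach(probs, threshold):
    """Count how many top-ranked entries are needed for the running
    total of probabilities to reach `threshold` (hypothetical helper)."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, round(total, 1)
    # Threshold never reached with the entries provided.
    return None, round(total, 1)


count, mass = journals_to_reach(probs, 50.0)
print(count, mass)
```

With these values the running total first reaches 50% at rank 7, matching the marker placed after Advanced Science.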