
Replicability of unsupervised deep learning derived image phenotypes

Xia, T.; Islam, S. M. S.; Xie, Z.; Zhao, X.; Zhi, D.

2026-05-19 bioinformatics
10.64898/2026.05.19.726257 bioRxiv

Unsupervised deep-learning derived image phenotypes (UDIPs) from brain MRI are advancing imaging genetics by linking brain structure to genetic variation. However, their replicability across datasets has not been sufficiently evaluated, raising the question of whether they capture robust biological structure or reflect training-specific artifacts. Here, we assess the replicability of UDIPs under variation in model initialization, data partitioning, and cohort, directly evaluating their stability across experimental conditions. We trained multiple models under (i) different training batch random seeds, (ii) different cross-validation splits, and (iii) independent datasets (UKB and ADNI), across CNN and ViT architectures. We then derived representations from a separate UKB discovery cohort (N = 22,985) using both the trained models and randomly initialized, untrained models. Representation stability was assessed using centered kernel alignment (CKA; mean ViT 0.74 vs random 0.27) and kernel canonical correlation analysis (KCCA; mean ViT 0.84 vs random 0.60), and genetic discovery stability was assessed using the loci overlap ratio (mean ViT 0.45 vs random 0.08). We further applied weighted MAXVAR generalized CCA to 12 embeddings to extract a shared 30-dimensional subspace. Our results show that UDIPs exhibit statistically significant stability across training perturbations (CKA and KCCA, t-test p < 0.001) and preserve biologically meaningful structure across cohorts (loci overlap ratio, t-test p < 0.001), supporting their use in imaging genetics.
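To make the CKA comparison in the abstract concrete, the following is a minimal sketch of linear-kernel CKA between two embedding matrices computed on the same subjects. It is an illustration only: the abstract does not state which kernel the authors used, and the function name `linear_cka` and the synthetic arrays `emb_a`, `emb_b`, and `emb_c` are hypothetical.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear-kernel CKA between two embeddings with the same subjects as rows."""
    if x.shape[0] != y.shape[0]:
        raise ValueError("both embeddings must have the same number of rows")
    # Center every feature column so constant offsets do not affect the score.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Cross-covariance energy, normalized by each embedding's own energy.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for model embeddings (assumed shapes, not the paper's data).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(500, 16))

# A rotated and rescaled copy of emb_a: same structure, different coordinates.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
emb_b = 2.0 * emb_a @ rotation

# An unrelated embedding drawn independently.
emb_c = rng.normal(size=(500, 16))

cka_same = linear_cka(emb_a, emb_b)
cka_diff = linear_cka(emb_a, emb_c)
print(f"rotated copy: {cka_same:.3f}, unrelated: {cka_diff:.3f}")
```

Linear CKA is invariant to rotations and uniform rescaling of the embedding axes, so the rotated copy scores essentially 1 while the unrelated embedding scores near 0. This property is why CKA suits comparing models trained from different seeds, whose latent axes need not line up.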

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. NeuroImage | 813 papers in training set | Top 0.8% | 14.5%
2. Medical Image Analysis | 33 papers in training set | Top 0.1% | 14.5%
3. PLOS Computational Biology | 1633 papers in training set | Top 3% | 10.5%
4. Human Brain Mapping | 295 papers in training set | Top 0.7% | 8.5%
5. Communications Biology | 886 papers in training set | Top 0.3% | 6.4%
50% of probability mass above
6. Scientific Reports | 3102 papers in training set | Top 17% | 6.4%
7. Nature Communications | 4913 papers in training set | Top 33% | 4.9%
8. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 2.8%
9. Frontiers in Genetics | 197 papers in training set | Top 3% | 2.6%
10. GigaScience | 172 papers in training set | Top 0.9% | 2.1%
11. PLOS ONE | 4510 papers in training set | Top 51% | 1.8%
12. Bioinformatics | 1061 papers in training set | Top 7% | 1.8%
13. Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.7%
14. Frontiers in Neuroscience | 223 papers in training set | Top 5% | 1.3%
15. eLife | 5422 papers in training set | Top 47% | 1.3%
16. Science Advances | 1098 papers in training set | Top 25% | 1.0%
17. Cell Genomics | 162 papers in training set | Top 5% | 1.0%
18. Advanced Science | 249 papers in training set | Top 18% | 0.8%
19. Nature Methods | 336 papers in training set | Top 6% | 0.7%
20. Patterns | 70 papers in training set | Top 3% | 0.6%
21. Biological Psychiatry | 119 papers in training set | Top 3% | 0.6%
22. Alzheimer's & Dementia | 143 papers in training set | Top 3% | 0.5%
23. Genome Medicine | 154 papers in training set | Top 10% | 0.5%
24. Aperture Neuro | 18 papers in training set | Top 0.5% | 0.5%
25. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 49% | 0.5%