
Cross-Scanner Reliability of Brain MRI Foundation Model Embeddings: A Travelling-Heads Study

Navarro-Gonzalez, R.; Aja-Fernandez, S.; Planchuelo-Gomez, A.; de Luis-Garcia, R.

2026-03-25 · medRxiv (radiology and imaging) · DOI: 10.64898/2026.03.23.26348808

Foundation models (FMs) for brain magnetic resonance imaging (MRI) are increasingly adopted as pretrained backbones for clinical tasks such as brain age prediction, disease classification, and anomaly detection. However, if FM embeddings (internal representations) shift systematically across MRI scanners, downstream analyses built on them may reflect acquisition hardware rather than biology. No study has yet quantified this cross-scanner reproducibility. Here, we assess the cross-scanner reliability of brain MRI FM embeddings and investigate which design factors (pretraining strategy, network architecture, embedding dimensionality, and pretraining dataset scale) best explain the observed differences. Using the ON-Harmony travelling-heads dataset (20 participants, eight scanners, three vendors), we evaluate the embeddings of five architecturally diverse FMs and a FreeSurfer morphometric baseline via within- and between-scanner intraclass correlation coefficient (ICC), variance decomposition, and scanner fingerprinting. Reliability spanned the full spectrum: biology-guided models achieved good-to-excellent cross-scanner ICC (AnatCL: 0.970 [95% confidence interval (CI): 0.94, 0.98]; y-Aware: 0.809 [0.63, 0.88]), matching or surpassing FreeSurfer (0.926 [0.83, 0.96]), whereas purely self-supervised models fell below the poor threshold (BrainIAC: 0.453, BrainSegFounder: 0.307, 3D-Neuro-SimCLR: 0.247), with 23–58% of embedding variance attributable to scanner identity. The strongest correlate of cross-scanner reliability among the models evaluated was pretraining strategy: incorporating biological metadata (cortical morphometrics, age) into the contrastive objective produced scanner-robust embeddings, whereas architecture, dimensionality, and dataset scale did not predict reliability.
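The reliability metric at the core of the abstract, the intraclass correlation coefficient, can be sketched as follows. This is a minimal illustration on synthetic embeddings using the one-way random-effects ICC(1); the paper's exact ICC variant, data, and dimensionality are not reproduced here, and all names and numbers below are illustrative assumptions.

```python
import numpy as np

def icc_oneway(x):
    """One-way random-effects ICC(1) for a (subjects x measurements) matrix.

    Rows are subjects; columns are repeated measurements of the same
    quantity (e.g. one embedding dimension extracted on different scanners).
    Systematic scanner offsets count as within-subject error, which is
    exactly why ICC penalizes scanner-dependent embeddings.
    """
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    # Between-subject and within-subject mean squares (one-way ANOVA).
    msb = k * np.sum((row_means - grand) ** 2) / (n - 1)
    msw = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Synthetic travelling-heads setup: 20 subjects scanned on 8 scanners,
# with a 16-dimensional embedding (all sizes illustrative).
rng = np.random.default_rng(0)
subjects, scanners, dims = 20, 8, 16
biology = rng.normal(size=(subjects, 1, dims))            # stable per-subject signal
scanner_fx = 0.2 * rng.normal(size=(1, scanners, dims))   # small per-scanner offsets
noise = 0.2 * rng.normal(size=(subjects, scanners, dims))
emb = biology + scanner_fx + noise

# Mean ICC across embedding dimensions: values near 1 mean the embedding
# ranks subjects consistently regardless of which scanner produced the scan.
mean_icc = float(np.mean([icc_oneway(emb[:, :, d]) for d in range(dims)]))
print(round(mean_icc, 3))
```

With the small scanner and noise variances chosen here, the biological signal dominates and the ICC lands in the "good-to-excellent" range; inflating `scanner_fx` pushes it toward the "poor" regime the paper reports for the purely self-supervised models.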

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Human Brain Mapping | 295 | Top 0.1% | 23.0%
2 | NeuroImage | 813 | Top 0.5% | 19.0%
3 | Imaging Neuroscience | 242 | Top 0.5% | 6.5%
4 | Scientific Reports | 3102 | Top 22% | 4.9%
(50% of probability mass above)
5 | Nature Communications | 4913 | Top 34% | 4.4%
6 | NeuroImage: Clinical | 132 | Top 1% | 3.7%
7 | Journal of Medical Imaging | 11 | Top 0.1% | 3.7%
8 | Aperture Neuro | 18 | Top 0.1% | 2.5%
9 | Medical Image Analysis | 33 | Top 0.5% | 2.1%
10 | Magnetic Resonance in Medicine | 72 | Top 0.4% | 1.9%
11 | eLife | 5422 | Top 41% | 1.7%
12 | eBioMedicine | 130 | Top 2% | 1.5%
13 | GigaScience | 172 | Top 2% | 1.4%
14 | Nature Computational Science | 50 | Top 1% | 1.1%
15 | Magnetic Resonance Imaging | 21 | Top 0.4% | 1.1%
16 | Nature Medicine | 117 | Top 3% | 1.0%
17 | PLOS ONE | 4510 | Top 62% | 1.0%
18 | Scientific Data | 174 | Top 2% | 0.9%
19 | Science Translational Medicine | 111 | Top 5% | 0.9%
20 | Medical Physics | 14 | Top 0.5% | 0.9%
21 | European Radiology | 14 | Top 0.6% | 0.8%
22 | Nature Machine Intelligence | 61 | Top 3% | 0.8%
23 | Science Advances | 1098 | Top 30% | 0.7%
24 | Nature Methods | 336 | Top 6% | 0.7%
25 | Brain Communications | 147 | Top 4% | 0.7%
26 | Proceedings of the National Academy of Sciences | 2130 | Top 47% | 0.7%
27 | NMR in Biomedicine | 24 | Top 0.4% | 0.7%
28 | Frontiers in Neuroimaging | 11 | Top 0.5% | 0.7%
29 | Patterns | 70 | Top 3% | 0.5%
30 | Frontiers in Neuroscience | 223 | Top 9% | 0.5%