
Foundation Model Robustness to Technical Acquisition Parameters in Chest X-Ray AI: A Multi-Architecture Comparative Study with External Validation

Farquhar, H.

2026-01-27 · medRxiv (radiology and imaging)
DOI: 10.64898/2026.01.25.26344809
Abstract

Background: Foundation models have emerged as a promising paradigm for medical imaging AI [7], with claims of improved generalization and reduced bias. However, their robustness to technical acquisition parameters remains unexplored. We evaluated whether foundation models exhibit greater robustness to chest radiograph view type (anteroposterior [AP] versus posteroanterior [PA]) than traditional convolutional neural networks.

Methods: We compared four model architectures on the RSNA Pneumonia Detection Challenge dataset (n=26,684 images) and externally validated on the NIH ChestX-ray14 dataset (n=112,120 images): DenseNet-121 (supervised CNN), BiomedCLIP (vision-language model trained on 15 million biomedical image-text pairs), RAD-DINO (self-supervised model trained on 5+ million radiographs), and CheXzero (vision-language model trained on MIMIC-CXR chest radiographs). The primary outcome was the sensitivity gap between AP and PA views, with bootstrap confidence intervals and permutation testing.

Results: On RSNA, CheXzero showed the smallest gap (14.3%, 95% CI: 11.2-17.5%), followed by RAD-DINO (25.2%, 95% CI: 22.6-27.9%), DenseNet-121 (35.7%, 95% CI: 32.9-38.7%), and BiomedCLIP (36.1%, 95% CI: 33.5-39.0%). On external validation (NIH), however, the model rankings reversed completely: RAD-DINO demonstrated the smallest gap (22.3%, 95% CI: 21.0-23.6%), while CheXzero's gap increased dramatically to 48.9% (95% CI: 47.7-50.1%). Domain-specific training provided robustness within the training domain but failed to generalize. Among PA-view pneumonia cases in NIH, 31% were missed by all four models, representing a systematic blind spot. View type explained 61-100% of performance variance across models on both datasets, compared with 0-38% for age and less than 4% for sex.

Conclusions: Foundation models do not eliminate technical acquisition parameter biases in chest X-ray AI. While domain-specific training (CheXzero) provided superior robustness on internal validation, this advantage collapsed on external data. Self-supervised learning (RAD-DINO) demonstrated the most generalizable robustness, with consistent view-type gap stability across datasets with different labeling schemes (25.2% → 22.3%, despite substantial AUC differences). These findings challenge assumptions about foundation model generalization and highlight the need for acquisition parameter auditing in AI regulatory frameworks and for multi-site external validation of robustness claims.
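The abstract's primary outcome lends itself to a simple statistical recipe. Below is a minimal, illustrative Python sketch of how a view-type sensitivity gap with a percentile-bootstrap confidence interval and a permutation test could be computed. The paper's exact procedure is not given here, so the gap definition (PA minus AP sensitivity), the resample counts, and all names and the synthetic demo data are assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sensitivity(y_true, y_pred):
    """Fraction of true-positive cases the model flags (recall)."""
    pos = y_true == 1
    return y_pred[pos].mean() if pos.any() else np.nan

def view_gap(y_true, y_pred, is_pa):
    """Assumed gap definition: PA sensitivity minus AP sensitivity."""
    return (sensitivity(y_true[is_pa], y_pred[is_pa])
            - sensitivity(y_true[~is_pa], y_pred[~is_pa]))

def bootstrap_ci(y_true, y_pred, is_pa, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI: resample cases with replacement."""
    n = len(y_true)
    gaps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        gaps.append(view_gap(y_true[idx], y_pred[idx], is_pa[idx]))
    lo, hi = np.nanpercentile(gaps, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

def permutation_p(y_true, y_pred, is_pa, n_perm=5000):
    """Two-sided p-value: shuffle view labels under H0 of no view effect."""
    observed = abs(view_gap(y_true, y_pred, is_pa))
    hits = sum(abs(view_gap(y_true, y_pred, rng.permutation(is_pa))) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy demo with synthetic labels (illustrative only): a model that is
# more sensitive on PA than AP views, mimicking the reported bias.
n = 1000
y_true = rng.integers(0, 2, n)
is_pa = rng.random(n) < 0.5
p_detect = np.where(is_pa, 0.85, 0.60)
y_pred = np.where(y_true == 1, rng.random(n) < p_detect,
                  rng.random(n) < 0.1).astype(int)

gap = view_gap(y_true, y_pred, is_pa)
lo, hi = bootstrap_ci(y_true, y_pred, is_pa)
p = permutation_p(y_true, y_pred, is_pa)
print(f"gap={gap:.3f}, 95% CI=({lo:.3f}, {hi:.3f}), p={p:.4f}")
```

Note that the case-level bootstrap resamples whole radiographs, so the AP/PA split varies across replicates; stratified resampling within each view would be an equally defensible variant.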

Matching journals

The top 6 journals account for 50% of the predicted probability mass (a quick check of this sum follows the table).

| Rank | Journal | Papers in training set | Percentile | Probability |
|-----:|---------|-----------------------:|------------|------------:|
| 1 | The Lancet Digital Health | 25 | Top 0.1% | 14.3% |
| 2 | PLOS Digital Health | 91 | Top 0.1% | 12.3% |
| 3 | Scientific Reports | 3102 | Top 10% | 8.4% |
| 4 | PLOS ONE | 4510 | Top 25% | 6.8% |
| 5 | PLOS Computational Biology | 1633 | Top 7% | 4.8% |
| 6 | npj Digital Medicine | 97 | Top 1% | 4.3% |
| 7 | European Radiology | 14 | Top 0.2% | 4.3% |
| 8 | Diagnostics | 48 | Top 0.4% | 4.0% |
| 9 | JCO Clinical Cancer Informatics | 18 | Top 0.2% | 3.6% |
| 10 | JAMA Network Open | 127 | Top 1% | 2.6% |
| 11 | GigaScience | 172 | Top 1% | 2.1% |
| 12 | Nature Machine Intelligence | 61 | Top 1% | 2.1% |
| 13 | eBioMedicine | 130 | Top 0.8% | 2.1% |
| 14 | Nature Communications | 4913 | Top 49% | 1.9% |
| 15 | Journal of Medical Imaging | 11 | Top 0.1% | 1.8% |
| 16 | Medical Physics | 14 | Top 0.4% | 1.7% |
| 17 | Computers in Biology and Medicine | 120 | Top 2% | 1.7% |
| 18 | BMC Medicine | 163 | Top 4% | 1.5% |
| 19 | Frontiers in Artificial Intelligence | 18 | Top 0.4% | 1.3% |
| 20 | BMJ Open | 554 | Top 10% | 1.3% |
| 21 | Patterns | 70 | Top 1% | 1.2% |
| 22 | Frontiers in Medicine | 113 | Top 5% | 0.9% |
| 23 | Computer Methods and Programs in Biomedicine | 27 | Top 0.9% | 0.8% |
| 24 | IEEE Access | 31 | Top 0.9% | 0.8% |
| 25 | Heliyon | 146 | Top 7% | 0.7% |
| 26 | eLife | 5422 | Top 59% | 0.7% |
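The caption's 50% claim is a cumulative-sum cutoff over the ranked probabilities. A minimal sketch, using only the percentages from the table above:

```python
# Probabilities for ranks 1-6, taken from the table above.
probs = [14.3, 12.3, 8.4, 6.8, 4.8, 4.3]

cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50.0:
        print(f"Top {rank} journals cover {cumulative:.1f}% of the probability mass")
        break
# Prints: Top 6 journals cover 50.9% of the probability mass
```

So the first five journals reach only 46.6%, and rank 6 (npj Digital Medicine) pushes the cumulative mass past the 50% threshold, which is why the cutoff sits between ranks 6 and 7.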