Vision-Language Foundation Models Do Not Transfer to Medical Imaging Classification: A Negative Result on Chest X-ray Diagnosis

Fisher, G. R.

2025-12-08 · radiology and imaging
medRxiv · DOI: 10.64898/2025.12.06.25341759
Abstract

Vision-language models (VLMs) pretrained on web-scale data have achieved remarkable performance across diverse tasks, leading to widespread adoption in industry. A natural question is whether these powerful representations transfer to specialized medical imaging domains, and whether domain-specific medical pretraining improves transfer. We tested these hypotheses using two VLMs on the NIH ChestX-ray14 benchmark: Qwen2.5-VL (pretrained on web data) and BiomedCLIP (pretrained on 15 million PubMed biomedical image-text pairs). Both models dramatically underperformed convolutional neural networks (CNNs) with ImageNet pretraining. Across 5 random seeds, the best VLM achieved F1 = 0.196 ± 0.004 versus a CNN baseline of F1 = 0.811. Domain-specific pretraining provided marginal improvement: BiomedCLIP's frozen encoder achieved F1 = 0.161 ± 0.001 versus Qwen's F1 = 0.124 (+30%), but this remains clinically inadequate. Fine-tuning both models led to catastrophic overfitting, with sensitivity collapsing from >65% to <36% as the models learned to predict "no disease" for all inputs. These results demonstrate that neither general-purpose nor medical-specific vision-language pretraining produces features suitable for dense multi-label medical imaging classification. For chest X-ray diagnosis, traditional CNNs with ImageNet pretraining remain substantially more effective than VLM-based approaches.
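
To make the protocol concrete, here is a minimal sketch of the frozen-encoder linear-probe setup the abstract describes: a frozen VLM image encoder feeds a trainable linear head optimized with binary cross-entropy over the 14 ChestX-ray14 labels, and macro-F1 is aggregated over 5 random seeds. The 512-d feature size, the 0.5 decision threshold, and the synthetic stand-in data are illustrative assumptions, not the paper's exact configuration.

    # Frozen-encoder linear probe for multi-label classification (sketch).
    # Feature dimension, threshold, and synthetic data are assumptions.
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.metrics import f1_score

    NUM_LABELS = 14  # ChestX-ray14 disease labels
    FEAT_DIM = 512   # assumed output size of the frozen image encoder

    def macro_f1_for_seed(seed: int) -> float:
        torch.manual_seed(seed)
        rng = np.random.default_rng(seed)

        # Stand-ins for embeddings from a frozen encoder (an image tower
        # with requires_grad_(False)) and for multi-label ground truth.
        feats = torch.randn(2000, FEAT_DIM)
        labels = torch.from_numpy(rng.integers(0, 2, (2000, NUM_LABELS))).float()

        head = nn.Linear(FEAT_DIM, NUM_LABELS)  # the only trainable part
        opt = torch.optim.Adam(head.parameters(), lr=1e-3)
        loss_fn = nn.BCEWithLogitsLoss()        # one sigmoid per label

        for _ in range(200):                    # linear-probe training loop
            opt.zero_grad()
            loss_fn(head(feats), labels).backward()
            opt.step()

        with torch.no_grad():
            preds = (torch.sigmoid(head(feats)) > 0.5).int().numpy()
        return f1_score(labels.int().numpy(), preds, average="macro")

    scores = [macro_f1_for_seed(s) for s in range(5)]
    print(f"macro-F1 = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")

One plausible reading of the reported fine-tuning collapse: once the encoder is unfrozen in a setup like this, the heavy imbalance toward "No Finding" in ChestX-ray14 lets BCE be driven down by predicting the negative class everywhere, consistent with the sensitivity drop the abstract describes.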

Matching journals

The top 3 journals account for just over 50% of the predicted probability mass (a short cumulative-sum sketch follows the table).

Rank  Journal                                           Papers  Percentile  Probability
1     Nature Machine Intelligence                       61      Top 0.1%    37.4%
2     Nature Medicine                                   117     Top 0.2%    8.3%
3     Scientific Reports                                3102    Top 15%     6.7%
      -- 50% of predicted probability mass above this line --
4     Nature Communications                             4913    Top 29%     6.3%
5     The Lancet Digital Health                         25      Top 0.1%    6.3%
6     Patterns                                          70      Top 0.2%    3.6%
7     npj Digital Medicine                              97      Top 1%      3.2%
8     Proceedings of the National Academy of Sciences   2130    Top 23%     3.0%
9     Nature Computational Science                      50      Top 0.4%    2.1%
10    JCO Clinical Cancer Informatics                   18      Top 0.5%    1.7%
11    PLOS ONE                                          4510    Top 55%     1.6%
12    eBioMedicine                                      130     Top 2%      1.3%
13    Science Translational Medicine                    111     Top 3%      1.3%
14    Journal of Medical Imaging                        11      Top 0.2%    1.3%
15    Nature Methods                                    336     Top 5%      0.9%
16    Science Advances                                  1098    Top 26%     0.9%
17    Nature Biomedical Engineering                     42      Top 2%      0.7%
18    Nature                                            575     Top 16%     0.7%
19    Communications Medicine                           85      Top 1%      0.7%
20    Communications Biology                            886     Top 27%     0.7%
21    Frontiers in Bioinformatics                       45      Top 1%      0.6%
22    JAMIA Open                                        37      Top 2%      0.6%
23    eLife                                             5422    Top 62%     0.6%
24    Science                                           429     Top 21%     0.6%
25    Modern Pathology                                  21      Top 0.6%    0.6%

"Papers" = number of papers from that journal in the training set.
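
The 50% cutoff marked in the table is just a running sum over the ranked probabilities. A minimal sketch, with the table's top rows hard-coded:

    # How many top-ranked journals cover 50% of the predicted probability
    # mass? Percentages are copied from the table above.
    probs = [37.4, 8.3, 6.7, 6.3, 6.3, 3.6, 3.2, 3.0, 2.1, 1.7]

    cumulative = 0.0
    for rank, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= 50.0:
            print(f"Top {rank} journals cover {cumulative:.1f}% of the mass")
            break
    # -> Top 3 journals cover 52.4% of the mass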