
Performance of Vision-Language Models for Zero-Shot Lung Nodule Detection on Chest Radiographs

Nishio, M.; Matsuo, H.; Matsunaga, T.; Fujimoto, K.; Deperrois, N.; Nooralahzadeh, F.; Frauenfelder, T.; Krauthammer, M.; Murakami, T.

2026-06-03 radiology and imaging
10.64898/2026.05.31.26354565 medRxiv

Background and Objectives: The ability of vision-language models (VLMs) to detect lung nodules on chest radiographs remains uncertain. This retrospective study aimed to compare the zero-shot performance of six VLMs for lung nodule detection using data from the Japanese Society of Radiological Technology (JSRT) chest radiograph database.
Methods: A total of 247 chest radiographs from the JSRT database (154 with nodules and 93 without) were preprocessed and evaluated using six VLMs: RadVLM, gpt-4o-mini, Qwen3-VL-8B-Instruct, MedGemma-4b-it, LLaVA-Rad, and CheXpert Plus Model. Each model was tested in a zero-shot setting. The text outputs were binarized into nodule-present or nodule-absent labels by consensus between two radiologists. Sensitivity, specificity, accuracy, precision, and F1 scores were calculated. Pairwise differences in sensitivity, specificity, and accuracy were assessed using the McNemar test with Holm correction.
Results: Overall performance was limited across all models. RadVLM achieved the highest accuracy (44.5%, 110/247) with perfect specificity (100.0%, 93/93) and precision (100.0%); however, its sensitivity was low (11.0%, 17/154). LLaVA-Rad showed the highest sensitivity (27.3%, 42/154) and F1 score (37.7%), but lower specificity (71.0%, 66/93). MedGemma-4b-it achieved 100.0% specificity, with a sensitivity of only 5.2% (8/154). Grade-specific analysis showed that detection rates were highest for obvious nodules and remained limited for subtle nodules. Pairwise analyses revealed significant differences in sensitivity and specificity for selected model pairs, particularly between RadVLM and LLaVA-Rad.
Conclusion: Current VLMs show limited zero-shot generalizability for lung nodule detection in the JSRT database, with marked trade-offs between sensitivity and specificity. Their near-term value may lie more in radiologist-assisted workflows than in stand-alone detection.
Clinical Impact: Current VLMs should not be used as stand-alone tools for lung nodule detection on chest radiographs because of their limited sensitivity and substantial model-dependent trade-offs. However, their high-specificity outputs in some models and higher-sensitivity behavior in others suggest potential roles in radiologist-assisted workflows, such as report drafting and second-reader support.
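
The metrics reported in the abstract follow directly from the confusion-matrix counts it gives. The Python sketch below recomputes them, assuming 154 nodule-positive and 93 nodule-negative radiographs as stated. The names `Counts`, `from_reported`, `metrics`, and `mcnemar_exact` are illustrative and do not come from the study's code. The McNemar example uses the exact binomial form, which is an assumption, since the abstract does not say which variant was used. It is applied only to the specificity comparison between RadVLM and LLaVA-Rad, the one pair whose discordant counts can be derived from the abstract: RadVLM was correct on all 93 nodule-free images, so the 27 LLaVA-Rad false positives are the only discordant pairs. No Holm adjustment is applied because the number of comparisons is not stated.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one model (illustrative container)."""
    tp: int
    fn: int
    tn: int
    fp: int


def from_reported(tp: int, tn: int, n_pos: int = 154, n_neg: int = 93) -> Counts:
    """Build counts from reported true positives and true negatives."""
    return Counts(tp=tp, fn=n_pos - tp, tn=tn, fp=n_neg - tn)


def metrics(c: Counts) -> dict:
    """Return sensitivity, specificity, accuracy, precision, and F1 as fractions."""
    predicted_pos = c.tp + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp),
        # Precision is undefined when a model never predicts "nodule present".
        "precision": c.tp / predicted_pos if predicted_pos else float("nan"),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts taken from the abstract (true positives, true negatives).
radvlm = from_reported(tp=17, tn=93)
llava_rad = from_reported(tp=42, tn=66)
medgemma = from_reported(tp=8, tn=93)

for name, counts in [("RadVLM", radvlm), ("LLaVA-Rad", llava_rad), ("MedGemma-4b-it", medgemma)]:
    row = {k: f"{100 * v:.1f}%" for k, v in metrics(counts).items()}
    print(name, row)

# Specificity, RadVLM vs LLaVA-Rad: 27 images where only RadVLM was correct,
# 0 images where only LLaVA-Rad was correct (derived from the abstract).
p_specificity = mcnemar_exact(b=27, c=0)
print(f"unadjusted exact McNemar p = {p_specificity:.2e}")
```

Recomputing from raw counts makes the sensitivity and specificity trade-off explicit: a model that almost never reports a nodule reaches perfect specificity and precision while missing most nodules.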

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | European Radiology | 14 | Top 0.1% | 23.4%
2 | PLOS ONE | 4510 | Top 17% | 10.5%
3 | Scientific Reports | 3102 | Top 8% | 8.7%
4 | Diagnostics | 48 | Top 0.2% | 5.0%
5 | Medical Physics | 14 | Top 0.1% | 5.0%
(50% of probability mass above)
6 | PLOS Digital Health | 91 | Top 0.7% | 3.7%
7 | Annals of Translational Medicine | 17 | Top 0.4% | 3.0%
8 | PLOS Computational Biology | 1633 | Top 11% | 2.8%
9 | JCO Clinical Cancer Informatics | 18 | Top 0.3% | 2.7%
10 | Frontiers in Oncology | 95 | Top 2% | 2.5%
11 | Computers in Biology and Medicine | 120 | Top 1% | 2.5%
12 | The Lancet Digital Health | 25 | Top 0.2% | 2.2%
13 | IEEE Access | 31 | Top 0.3% | 2.0%
14 | JAMA Network Open | 127 | Top 2% | 1.8%
15 | Frontiers in Medicine | 113 | Top 4% | 1.4%
16 | GigaScience | 172 | Top 2% | 0.9%
17 | Heliyon | 146 | Top 4% | 0.9%
18 | Journal of Medical Imaging | 11 | Top 0.2% | 0.9%
19 | PeerJ | 261 | Top 12% | 0.9%
20 | npj Precision Oncology | 48 | Top 1% | 0.9%
21 | npj Digital Medicine | 97 | Top 3% | 0.8%
22 | Archives of Clinical and Biomedical Research | 28 | Top 2% | 0.8%
23 | Physics in Medicine & Biology | 17 | Top 0.5% | 0.7%
24 | Computer Methods and Programs in Biomedicine | 27 | Top 1.0% | 0.7%
25 | Medicine | 30 | Top 2% | 0.7%
26 | eBioMedicine | 130 | Top 4% | 0.7%
27 | iScience | 1063 | Top 39% | 0.5%
28 | European Journal of Nuclear Medicine and Molecular Imaging | 19 | Top 0.4% | 0.5%
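
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is a minimal Python illustration; the function name `journals_to_reach` is hypothetical, and the values are the 28 percentages listed above, copied in rank order.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1 through 28, as listed above.
probabilities = [
    23.4, 10.5, 8.7, 5.0, 5.0, 3.7, 3.0, 2.8, 2.7, 2.5,
    2.5, 2.2, 2.0, 1.8, 1.4, 0.9, 0.9, 0.9, 0.9, 0.9,
    0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.5, 0.5,
]


def journals_to_reach(probs, threshold):
    """Return the smallest rank whose cumulative probability meets the threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None  # Threshold is not reached by the listed journals.


rank_for_half = journals_to_reach(probabilities, 50.0)
print(f"Journals needed to reach 50%: {rank_for_half}")
print(f"Probability mass covered by all listed journals: {sum(probabilities):.1f}%")
```

The listed journals do not sum to 100%, so the remaining probability mass presumably belongs to journals that are not shown.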