The NLP-to-Expert Gap in Chest X-ray AI
2026-03-02
radiology and imaging
Title + abstract only
View on medRxiv
In previous work, we achieved state-of-the-art performance on ChestX-ray14 (ROC-AUC 0.940, F1 0.821) using pretraining diversity and clinical metric optimization. Applying the same methodology to CheXpert, we obtained similar results when evaluating against the NLP-generated validation and test labels, but when evaluated against expert radiologist labels, performance dropped to 0.75-0.87 ROC-AUC. The models had learned to match the automated NLP labeling system, not to diagnose disease. This paper documents our investigation ...
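The failure mode the abstract describes, a model that scores well against noisy NLP-derived labels while scoring worse against expert labels, can be illustrated with a toy example. The scores and labels below are hypothetical, and `roc_auc` is a minimal pairwise implementation for illustration, not the paper's code:

```python
def roc_auc(labels, scores):
    """ROC-AUC via its rank interpretation: the probability that a randomly
    chosen positive case outranks a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for eight chest X-rays.
scores        = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
# Labels assigned by the NLP report parser (what the model was tuned on).
nlp_labels    = [1,   1,   1,   1,   0,   0,   0,   0]
# Expert radiologist labels: the NLP labeler was wrong on two of eight cases.
expert_labels = [1,   1,   0,   1,   1,   0,   0,   0]

print(f"AUC vs NLP labels:    {roc_auc(nlp_labels, scores):.3f}")    # 1.000
print(f"AUC vs expert labels: {roc_auc(expert_labels, scores):.3f}") # 0.875
```

Because the model's ranking exactly reproduces the NLP labeler's output, its AUC against NLP labels is perfect even though it disagrees with the experts on a quarter of the cases; this is why the same checkpoint can report 0.94 on automated labels and substantially less on expert ones.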
Predicted journal destinations

 #   Journal                            Probability   Percentile   Training papers
 1   Scientific Reports                 23.0%         Top 2%       701
 2   Nature Communications              12.2%         Top 6%       483
 3   npj Digital Medicine                8.1%         Top 2%        85
 4   PLOS ONE                            8.1%         Top 65%     1737
 5   PLOS Digital Health                 6.7%         Top 3%        88
 6   Computers in Biology and Medicine   4.3%         Top 1%        39
 7   Human Brain Mapping                 2.9%         Top 3%        53
 8   NeuroImage                          2.9%         Top 2%        36
 9   eBioMedicine                        2.0%         Top 1%        82
10   Diagnostics                         2.0%         Top 1%        36
11   Scientific Data                     2.0%         Top 0.8%      30
12   eLife                               1.6%         Top 28%      262
13   BMJ Open                            1.6%         Top 53%      553
14   Journal of Biomedical Informatics   1.4%         Top 5%        37
15   Imaging Neuroscience                1.4%         Top 1%        18
16   PLOS Computational Biology          1.4%         Top 9%       141
17   Medicine                            1.2%         Top 3%        29
18   NeuroImage: Clinical                1.2%         Top 10%       77
19   Nature Medicine                     0.7%         Top 16%       88
20   Communications Medicine             0.5%         Top 10%       63