OCR-Mediated Modality Dominance in Vision-Language Models: Implications for Radiology AI Trustworthiness

Akbasli, I. T.; Ozturk, B.; Serin, O.; Dogan, V.; Berikol, G. B.; Comeau, D. S.; Celi, L. A.; Ozguner, O.

2026-02-24 health informatics
10.64898/2026.02.22.26346828 medRxiv
Background: Vision-language models (VLMs) are increasingly proposed for radiologic decision support, yet the security implications of deploying general-domain, OCR-capable models in diagnostic workflows remain poorly characterized. When image-embedded text is not treated as untrusted input, the visual channel becomes vulnerable to adversarial manipulation through OCR-readable overlays.

Methods: Nine commercial VLMs, none intended or validated for clinical diagnosis, were evaluated on 600 brain MRI studies (300 tumor-positive, 300 tumor-negative) for binary tumor detection across four conditions: clean input, visible radiology-report injection, human-imperceptible stealth OCR injection, and a multi-stage immune-prompt defense combining both attack types with enforced visual-priority reasoning. Approximately 27,000 inference calls were analyzed. Primary outcomes included accuracy, attack success rate (ASR), false positive rate (FPR), and masking rate.

Results: At baseline, performance was heterogeneous (median accuracy 0.69, sensitivity 0.79, specificity 0.59). Visible injection caused universal specificity collapse (0.00 across all models; FPR 1.00), with a median ASR of 0.97; every model unconditionally privileged the injected text over its own visual analysis. Stealth injection, despite being imperceptible to human reviewers, still drove substantial degradation (median accuracy 0.43; ASR 0.57; FPR 0.84). Immune prompting achieved only partial and inconsistent mitigation: under stealth injection, median ASR decreased to 0.44 and accuracy improved to 0.56, yet residual overcalling persisted (median FPR 0.67), and three models maintained an FPR of 1.00.

Conclusions: Commercial VLMs exhibit a deployment-critical failure mode in radiology-like scenarios: OCR-readable text embedded in images can dominate the decision pathway and override pixel-level evidence, even under stealth conditions that evade human inspection. Prompt-level defenses provide insufficient protection. These findings underscore that any clinical integration of VLMs must be gated by system-level safeguards, including OCR-aware input handling, provenance controls, and enforced human verification, before such tools can be considered for safety-sensitive environments.
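
The standard confusion-matrix quantities named in the abstract (accuracy, sensitivity, specificity, FPR) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the names `Confusion`, `tally`, and `metrics` are hypothetical, and the input is a made-up degenerate classifier, not study data. ASR and masking rate are left out because the abstract does not define them.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # tumor-positive studies called positive
    fn: int  # tumor-positive studies called negative
    tn: int  # tumor-negative studies called negative
    fp: int  # tumor-negative studies called positive


def tally(labels, predictions):
    """Count confusion-matrix cells; True means 'tumor present'."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(labels, predictions):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and not pred:
            tn += 1
        else:
            fp += 1
    return Confusion(tp, fn, tn, fp)


def metrics(c):
    """Return accuracy, sensitivity, specificity, and FPR for one run."""
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative study")
    return {
        "accuracy": (c.tp + c.tn) / (positives + negatives),
        "sensitivity": c.tp / positives,
        "specificity": c.tn / negatives,
        "fpr": c.fp / negatives,  # equals 1 - specificity
    }


# Hypothetical input: a balanced 300/300 set and a classifier that
# calls every study tumor-positive (illustrative only, not study data).
labels = [True] * 300 + [False] * 300
predictions = [True] * 600
print(metrics(tally(labels, predictions)))
```

Because FPR is the complement of specificity, a specificity of 0.00 and an FPR of 1.00 describe the same outcome: every tumor-negative study is called positive. On a balanced set such a classifier still reaches an accuracy of 0.5, which is why accuracy alone can understate this kind of failure.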

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine; 97 papers in training set; Top 0.1%; 41.1%
2. JCO Clinical Cancer Informatics; 18 papers in training set; Top 0.2%; 4.5%
3. Scientific Reports; 3102 papers in training set; Top 29%; 4.1%
4. BMJ Health & Care Informatics; 13 papers in training set; Top 0.1%; 3.7%
(50% of probability mass above)
5. Annals of Internal Medicine; 27 papers in training set; Top 0.1%; 3.7%
6. PLOS ONE; 4510 papers in training set; Top 46%; 2.5%
7. PLOS Digital Health; 91 papers in training set; Top 1%; 2.0%
8. The Lancet Digital Health; 25 papers in training set; Top 0.3%; 2.0%
9. BMJ Open; 554 papers in training set; Top 9%; 1.7%
10. Journal of the American Medical Informatics Association; 61 papers in training set; Top 1%; 1.4%
11. Nature Communications; 4913 papers in training set; Top 54%; 1.4%
12. Nature Medicine; 117 papers in training set; Top 3%; 1.3%
13. Patterns; 70 papers in training set; Top 1%; 1.3%
14. NeuroImage: Clinical; 132 papers in training set; Top 3%; 1.3%
15. Frontiers in Medicine; 113 papers in training set; Top 4%; 1.3%
16. PLOS Computational Biology; 1633 papers in training set; Top 21%; 1.0%
17. Artificial Intelligence in Medicine; 15 papers in training set; Top 0.5%; 0.9%
18. Biology Methods and Protocols; 53 papers in training set; Top 2%; 0.8%
19. Frontiers in Artificial Intelligence; 18 papers in training set; Top 0.6%; 0.8%
20. Nature Machine Intelligence; 61 papers in training set; Top 3%; 0.8%
21. JAMA Network Open; 127 papers in training set; Top 4%; 0.8%
22. Computers in Biology and Medicine; 120 papers in training set; Top 5%; 0.8%
23. European Radiology; 14 papers in training set; Top 0.7%; 0.8%
24. eBioMedicine; 130 papers in training set; Top 4%; 0.8%
25. GigaScience; 172 papers in training set; Top 3%; 0.7%
26. IEEE Access; 31 papers in training set; Top 1%; 0.7%
27. JAMIA Open; 37 papers in training set; Top 2%; 0.7%
28. Med; 38 papers in training set; Top 1%; 0.5%
29. JAMA; 17 papers in training set; Top 0.5%; 0.5%
30. Computer Methods and Programs in Biomedicine; 27 papers in training set; Top 1%; 0.5%