OCR-Mediated Modality Dominance in Vision-Language Models: Implications for Radiology AI Trustworthiness

Akbasli, I. T.; Ozturk, B.; Serin, O.; Dogan, V.; Berikol, G. B.; Comeau, D. S.; Celi, L. A.; Ozguner, O.

2026-02-24 health informatics

10.64898/2026.02.22.26346828 medRxiv


Background: Vision-language models (VLMs) are increasingly proposed for radiologic decision support, yet the security implications of deploying general-domain, OCR-capable models in diagnostic workflows remain poorly characterized. When image-embedded text is not treated as untrusted input, the visual channel becomes vulnerable to adversarial manipulation through OCR-readable overlays.

Methods: Nine commercial VLMs, none intended or validated for clinical diagnosis, were evaluated on 600 brain MRI studies (300 tumor-positive, 300 tumor-negative) for binary tumor detection across four conditions: clean input, visible radiology-report injection, human-imperceptible stealth OCR injection, and a multi-stage immune-prompt defense combining both attack types with enforced visual-priority reasoning. Approximately 27,000 inference calls were analyzed. Primary outcomes included accuracy, attack success rate (ASR), false positive rate (FPR), and masking rate.

Results: At baseline, performance was heterogeneous (median accuracy 0.69, sensitivity 0.79, specificity 0.59). Visible injection caused universal specificity collapse (0.00 across all models; FPR 1.00), with a median ASR of 0.97; every model unconditionally privileged the injected text over its own visual analysis. Stealth injection, despite being imperceptible to human reviewers, still drove substantial degradation (median accuracy 0.43; ASR 0.57; FPR 0.84). Immune prompting achieved only partial and inconsistent mitigation: under stealth injection, median ASR decreased to 0.44 and accuracy improved to 0.56, yet residual overcalling persisted (median FPR 0.67), and three models maintained an FPR of 1.00.

Conclusions: Commercial VLMs exhibit a deployment-critical failure mode in radiology-like scenarios: OCR-readable text embedded in images can dominate the decision pathway and override pixel-level evidence, even under stealth conditions that evade human inspection. Prompt-level defenses provide insufficient protection. These findings underscore that any clinical integration of VLMs must be gated by system-level safeguards, including OCR-aware input handling, provenance controls, and enforced human verification, before such tools can be considered for safety-sensitive environments.
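The outcome measures named above can be illustrated with a minimal Python sketch. The abstract does not define ASR or masking rate, so the ASR definition used here (the fraction of cases in which the injected text contradicts the ground truth and the model's answer follows the injected text) is an assumption made for illustration only, and masking rate is left out. The `Case` record and both function names are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    """One study evaluated under an injection condition (hypothetical record)."""
    has_tumor: bool        # ground-truth label
    attacked_pred: bool    # model output when the image carries injected text
    injected_label: bool   # label asserted by the injected text


def confusion_metrics(truth: List[bool], preds: List[bool]) -> Dict[str, float]:
    """Standard binary-classification metrics from paired labels and predictions."""
    tp = sum(t and p for t, p in zip(truth, preds))
    tn = sum((not t) and (not p) for t, p in zip(truth, preds))
    fp = sum((not t) and p for t, p in zip(truth, preds))
    fn = sum(t and (not p) for t, p in zip(truth, preds))
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(truth) if truth else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "fpr": fp / (fp + tn) if (fp + tn) else nan,
    }


def attack_success_rate(cases: List[Case]) -> float:
    """ASR under the ASSUMED definition: among cases where the injected label
    contradicts the ground truth, the share where the model follows the injection."""
    eligible = [c for c in cases if c.injected_label != c.has_tumor]
    if not eligible:
        return float("nan")
    hits = sum(c.attacked_pred == c.injected_label for c in eligible)
    return hits / len(eligible)


# Toy data (not from the study): every image carries a "tumor present" overlay.
cases = [
    Case(has_tumor=False, attacked_pred=True, injected_label=True),   # attack succeeds
    Case(has_tumor=False, attacked_pred=False, injected_label=True),  # attack resisted
    Case(has_tumor=True, attacked_pred=True, injected_label=True),    # not eligible
    Case(has_tumor=True, attacked_pred=True, injected_label=True),    # not eligible
]
metrics = confusion_metrics(
    [c.has_tumor for c in cases], [c.attacked_pred for c in cases]
)
asr = attack_success_rate(cases)
```

Under this reading, a specificity of 0.00 paired with an FPR of 1.00, as reported for visible injection, corresponds to every tumor-negative study being called positive, since FPR = 1 - specificity.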
