Why Large Language Models' Clinical Reasoning Fails: Insights from Explainable Deep Learning

2026-01-27 · health informatics · title + abstract only
View on medRxiv
Medical large language models (LLMs) that achieve high benchmark accuracy exhibit unexplained variability on clinical tasks, producing errors that clinicians cannot safeguard against. We evaluated clinical reasoning stability in GPT-5, MedGemma-27B-Text-IT, and OpenBioLLM-Llama3-70B using 355 systematic perturbations of physician-validated oncology cases, and trained sparse autoencoders on 1 billion tokens from 50,000 MIMIC-IV clinical notes to decompose their internal representations. We find models...