Pathology's Last Exam: Stress-Testing Diagnostic Reasoning and Safety in Large Language Models

2025-12-15 pathology Title + abstract only

Large language models (LLMs) are evolving into diagnostic co-pilots, yet current benchmarks fail to test the integrated, stepwise reasoning required in diagnostic pathology. Here, we present Pathology's Last Exam (PLE), a curated, highly detailed, text-based benchmark of 100 complex cases spanning organ systems and enriched for rare or challenging entities, plus 20 adversarial cases designed to stress-test model safety. Each case provides structured blocks (Primary, Clinical, Histopathology, IHC/Specia...
