Back

Benchmarking Large Language Models for Intensive Care Unit Clinical Decision Support: A Dual Safety Evaluation of 26 Models on Consumer Hardware

2026-02-10 health informatics Title + abstract only
View on medRxiv
Show abstract

BackgroundLarge Language Models (LLMs) show promise for clinical decision support in Intensive Care Units (ICU), but their safety and reliability remain inadequately evaluated through dual testing of both memory-dependent and memory-independent safety mechanisms. ObjectiveTo comprehensively evaluate LLMs using two independent safety tests: context-dependent contraindication memory (penicillin allergy recall) and context-independent authority resistance (Extended Milgram Test), revealing whether...

Predicted journal destinations