Clinician-Centered Evaluation of Large Language Model-Generated Discharge Summaries for Longer Hospitalizations: Insights from Hospitalists and Primary Care Physicians
Osborne, T.; Mahmud, T.; Zheng, X.; Jampala, S.; Abbasi, S.; Hong, S.; Kranz, K.; Lee, S.; Ng, P.; Odekon, K.; Schachter, L.; Sexton, R.; Spinnato, T.; Tharakan, M.; Wu, Z.; Wang, F.; Wong, R.
Show abstract
Although large language models (LLMs) have shown promise for discharge summary generation, their value may be greater in longer hospitalizations, where increasing documentation volume and complexity increase both clinician burden and the risk of communication failures during transitions of care. Prior evaluations of LLM-generated discharge summaries have largely involved shorter stays and have rarely examined receiving-clinician priorities or incidental finding reporting. We compared LLM-generated and human-authored discharge summaries for 60 Internal Medicine hospitalizations lasting 7 to 21 days, with paired assessment by hospitalists and primary care physicians (PCPs). Clinician reviewers preferred LLM-generated summaries for 95% of encounters and rated them higher for quality, readability, factuality and completeness. PCPs, the primary recipients responsible for post-discharge care, found that LLM-generated summaries were better for understanding and communicating hospital care to patients, and providing follow-up care. LLM-generated summaries had fewer annotated errors, primarily due to fewer omissions, without increased estimated harm potential or likelihood compared with human-authored summaries. Benefits of LLM-generated summaries were especially salient for PCPs, who identified more omissions with greater downstream likelihood of harm than hospitalists. This underscores the importance of designing transition documents around the needs of clinicians assuming care post-discharge. LLM identification of radiology incidental findings was generally accurate and appropriate, suggesting potential to improve follow-up of clinically relevant findings. These findings extend prior work by demonstrating clinical value of LLMs in summarizing longer, complex hospitalizations and highlighting the value of stakeholder-centered design in clinical AI systems. Together, they support supervised LLM-assisted discharge summarization as a tool to reduce cognitive burden, improve documentation quality, and enhance transition-of-care communication.
Matching journals
The top 6 journals account for 50% of the predicted probability mass.