
Development and Evaluation of a Digital Scribe: Conversation Summarization Pipeline for Emergency Department Counseling Sessions towards Reducing Documentation Burden

Sezgin, E.; Sirrianni, J.; Kranz, K.

Posted: 2023-12-07 · Subject: emergency medicine
DOI: 10.1101/2023.12.06.23299573 (medRxiv preprint)

Objective: We present a proof-of-concept digital scribe system as an ED clinical conversation summarization pipeline and report its performance.

Materials and Methods: We use four pre-trained large language models to establish the digital scribe system: T5-small, T5-base, PEGASUS-PubMed, and BART-Large-CNN, via zero-shot and fine-tuning approaches. Our dataset includes 100 referral conversations among ED clinicians and medical records. We report ROUGE-1, ROUGE-2, and ROUGE-L to compare model performance. In addition, we annotated transcriptions to assess the quality of generated summaries.

Results: The fine-tuned BART-Large-CNN model demonstrates the strongest summarization performance, with the highest ROUGE scores (F1 ROUGE-1 = 0.49, F1 ROUGE-2 = 0.23, F1 ROUGE-L = 0.35). In contrast, PEGASUS-PubMed lags notably (F1 ROUGE-1 = 0.28, F1 ROUGE-2 = 0.11, F1 ROUGE-L = 0.22). BART-Large-CNN's performance decreases by more than 50% with the zero-shot approach. Annotations show that BART-Large-CNN achieves 71.4% recall in identifying key information and a 67.7% accuracy rate.

Discussion: The BART-Large-CNN model demonstrates a strong grasp of clinical dialogue structure, indicated by its performance with and without fine-tuning. Despite some instances of high recall, the model's performance is variable, particularly in achieving consistent correctness, suggesting room for refinement. The model's recall also varies across information categories.

Conclusion: The study provides evidence for the potential of AI-assisted tools to reduce clinical documentation burden. Future work should expand the research scope to larger language models and add comparative analyses that measure documentation effort and time.
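The evaluation above rests on ROUGE overlap metrics. As a rough illustration of what those F1 scores measure, here is a minimal from-scratch sketch of ROUGE-N and ROUGE-L; the paper's actual tooling is not stated, and the function names and example sentences here are purely illustrative:

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1: harmonic mean of n-gram precision and recall."""
    c, r = candidate.lower().split(), reference.lower().split()
    cg, rg = ngrams(c, n), ngrams(r, n)
    overlap = sum((cg & rg).values())  # clipped n-gram overlap count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cg.values())
    recall = overlap / sum(rg.values())
    return 2 * precision * recall / (precision + recall)

def lcs_len(a, b):
    """Length of the longest common subsequence (standard DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: based on longest common subsequence of tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# Toy example (not from the paper's data):
cand = "patient reports chest pain"
ref = "patient has chest pain"
print(round(rouge_n_f1(cand, ref, 1), 2))  # 0.75 (3 of 4 unigrams shared)
print(round(rouge_l_f1(cand, ref), 2))     # 0.75 (LCS "patient chest pain")
```

ROUGE-N rewards exact n-gram overlap, while ROUGE-L credits in-order wording via the longest common subsequence without requiring contiguity, which is why a model can score well on ROUGE-1 yet lower on ROUGE-2 and ROUGE-L, as seen in the reported results.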

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank  Journal                                                   Papers in training set  Percentile  Probability
1     Journal of Medical Internet Research                      85                      Top 0.1%    27.1%
2     Artificial Intelligence in Medicine                       15                      Top 0.1%    19.5%
3     PLOS ONE                                                  4510                    Top 30%     5.1%
      (50% of predicted probability mass above this line)
4     International Journal of Medical Informatics              25                      Top 0.3%    4.1%
5     JAMIA Open                                                37                      Top 0.3%    4.1%
6     npj Digital Medicine                                      97                      Top 1%      4.1%
7     Scientific Reports                                        3102                    Top 33%     3.8%
8     Journal of Biomedical Informatics                         45                      Top 0.5%    3.2%
9     PLOS Digital Health                                       91                      Top 1.0%    2.6%
10    Frontiers in Digital Health                               20                      Top 0.6%    1.8%
11    Journal of the American Medical Informatics Association   61                      Top 1%      1.6%
12    Healthcare                                                16                      Top 0.7%    1.6%
13    Annals of Translational Medicine                          17                      Top 0.8%    1.4%
14    Emergency Medicine Journal                                20                      Top 0.4%    1.3%
15    IEEE Journal of Biomedical and Health Informatics         34                      Top 2%      1.0%
16    BMC Medical Informatics and Decision Making               39                      Top 2%      0.9%
17    Frontiers in Public Health                                140                     Top 7%      0.9%
18    BMC Medical Research Methodology                          43                      Top 1%      0.8%
19    JMIR Formative Research                                   32                      Top 2%      0.8%
20    Computer Methods and Programs in Biomedicine              27                      Top 0.9%    0.8%
21    iScience                                                  1063                    Top 30%     0.8%
22    BJPsych Open                                              25                      Top 0.7%    0.8%
23    Heliyon                                                   146                     Top 6%      0.8%
24    Frontiers in Medicine                                     113                     Top 8%      0.7%
25    Cureus                                                    67                      Top 5%      0.7%
26    Medicine                                                  30                      Top 3%      0.5%
27    BioMed Research International                             25                      Top 4%      0.5%
28    Journal of General Internal Medicine                      20                      Top 1%      0.5%