
SPELL-LLMs: A Scalable and Privacy-Compliant NLP Pipeline Using Locally Hosted Large Language Models for Clinical Information Extraction

Kleinlein, R.; Gray, K. J.; Bates, D.; Kovacheva, V. P.

medRxiv preprint · 2025-07-25 · health informatics · DOI: 10.1101/2025.07.25.25332130

Objective: Electronic health records (EHRs) contain valuable information for clinical research and decision-making. However, leveraging these data remains challenging due to data heterogeneity, inconsistent documentation, missing information, and evolving terminology, especially within unstructured clinical notes. We developed SPELL (Snippet-Primed rEgex LLM Pipeline), a scalable natural language processing (NLP) workflow to systematically extract structured clinical insights from large volumes of clinical narratives.

Materials and Methods: Our platform employs a hybrid approach, combining regular expressions (regex) to rapidly identify relevant textual snippets with locally hosted large language models (LLMs) for accurate clinical interpretation. All data processing occurs securely within institutional computational environments. The modular Python-based workflow facilitates adaptation across institutions and is optimized for computational efficiency, supporting high-throughput processing even in resource-limited settings. We quantified computational scalability (elapsed time, out-of-memory events, GPU temperature, and energy consumed) and audited retrieval recall using clinician-annotated regex-negative notes enriched with relevant structured metadata.

Results: The pipeline efficiently processed 31 million clinical reports spanning 1976-2024 from eight affiliated hospitals. By analyzing targeted snippets rather than entire documents, our approach reduced processing time by 68% compared to traditional full-document LLM inference, and by >95% compared to manual physician annotation. Accuracy was rigorously validated across three obstetric tasks: extraction of numerical values (blood loss volumes), dates (estimated due dates), and diagnoses (hemolysis, elevated liver enzymes, and low platelets [HELLP] syndrome). Task-level performance included 94-98% exact-match accuracy for the three concepts on curated snippets. Generalizability was investigated using the publicly available MT Samples corpus (5,013 notes, 40 specialties), yielding 84% accuracy for ventricular tachycardia detection with markedly fewer false positives.

Discussion and Conclusions: A hybrid regex → snippet → LLM approach delivers accurate, privacy-preserving, and computationally efficient extraction for unstructured EHR data. By focusing inference on snippets and deploying local, open-weights models, SPELL aligns with institutional data governance requirements while enabling scalable clinical informatics studies across diverse extraction tasks.

Summary Statement: We developed SPELL, a scalable NLP pipeline combining regex and locally hosted LLMs for efficient information extraction from clinical narratives.
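The core efficiency idea, regex screening first and LLM inference only on short snippets, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the pattern, window size, and example note are hypothetical, and the LLM step is shown only as a comment since SPELL uses locally hosted models not reproduced here.

```python
import re

def extract_snippets(note: str, pattern: str, window: int = 80) -> list[str]:
    """Return short character windows around each regex hit in a note.

    Instead of feeding the entire document to an LLM, only these
    snippets are sent for interpretation, which is where the reported
    68% reduction in processing time comes from.
    """
    snippets = []
    for m in re.finditer(pattern, note, flags=re.IGNORECASE):
        start = max(0, m.start() - window)
        end = min(len(note), m.end() + window)
        snippets.append(note[start:end])
    return snippets

# Hypothetical obstetric note fragment (not from the study data).
note = ("... uncomplicated delivery. Estimated blood loss 450 mL. "
        "Patient stable postpartum ...")
snips = extract_snippets(note, r"blood loss", window=30)
# Each snippet (not the whole note) would then be passed to a locally
# hosted LLM, e.g. with a prompt like:
#   f"Extract the blood loss volume in mL from: {snips[0]}"
```

Notes that match no pattern never reach the LLM at all, which is why the authors separately audit retrieval recall on regex-negative notes.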

Matching journals

The top 2 journals together account for over 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|------|---------|------------------------|------------|-------------|
| 1 | Journal of the American Medical Informatics Association | 61 | Top 0.1% | 43.5% |
| 2 | Journal of Biomedical Informatics | 45 | Top 0.1% | 15.0% |
| 3 | npj Digital Medicine | 97 | Top 0.6% | 8.8% |
| 4 | JAMIA Open | 37 | Top 0.5% | 3.2% |
| 5 | Journal of Medical Internet Research | 85 | Top 2% | 2.2% |
| 6 | JCO Clinical Cancer Informatics | 18 | Top 0.4% | 1.9% |
| 7 | International Journal of Medical Informatics | 25 | Top 0.7% | 1.9% |
| 8 | Scientific Reports | 3102 | Top 56% | 1.8% |
| 9 | BMC Medical Informatics and Decision Making | 39 | Top 1% | 1.8% |
| 10 | PLOS Digital Health | 91 | Top 1% | 1.7% |
| 11 | European Heart Journal - Digital Health | 15 | Top 0.4% | 1.6% |
| 12 | Frontiers in Digital Health | 20 | Top 0.7% | 1.6% |
| 13 | BMJ Health & Care Informatics | 13 | Top 0.5% | 1.4% |
| 14 | JMIR Medical Informatics | 17 | Top 1% | 1.0% |
| 15 | Med | 38 | Top 0.7% | 0.8% |
| 16 | iScience | 1063 | Top 30% | 0.8% |
| 17 | BMC Medical Research Methodology | 43 | Top 1% | 0.8% |
| 18 | Cureus | 67 | Top 5% | 0.7% |
| 19 | Bioinformatics | 1061 | Top 10% | 0.7% |
| 20 | The Lancet Digital Health | 25 | Top 1% | 0.5% |
| 21 | Inflammatory Bowel Diseases | 15 | Top 0.3% | 0.5% |
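The 50%-mass statement above follows from a simple cumulative sum over the listed probabilities; a quick check (values copied from the list, top five shown):

```python
# Predicted probabilities (percent) for the top-ranked journals, as listed.
probs = [43.5, 15.0, 8.8, 3.2, 2.2]

# Find the rank at which cumulative probability mass first reaches 50%.
cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50.0:
        break

print(rank, round(cumulative, 1))  # prints: 2 58.5
```

The threshold is crossed at rank 2 (43.5% + 15.0% = 58.5%), matching the statement that the top two journals carry over half of the predicted probability mass.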