Identification of Risk Factors for Glaucoma Progression in Free-Text Clinical Notes using a Local Small Language Model
Bhatnagar, A.; Scherer, R.; Samico, G. A.; Muralidhar, R.; Gutkind, N. E.; Palazoni, V.; Medeiros, F. A.; Swaminathan, S. S.
Purpose: To evaluate the performance of a large language model (LLM) in identifying medication non-adherence, visit non-adherence, and family history of glaucoma (FHoG) in clinical notes from the electronic health record (EHR).
Methods: We extracted clinical notes of 1,250 glaucoma-related encounters between 2014 and 2024 and structured EHR family history field data from the Bascom Palmer Ophthalmic Repository, with 125 randomly selected notes (10%) used for prompt development and excluded from analysis. Two fellowship-trained glaucoma specialists labeled notes for evidence of non-adherence and FHoG. We used MedGemma-27B-text-it, a specialized medical LLM, to identify medication non-adherence, visit non-adherence, and FHoG. We calculated accuracy, sensitivity, and specificity of LLM performance for each task, the Jaccard index for FHoG, and the mean squared error (MSE) of the number of family members with glaucoma.
Results: The prevalences of medication non-adherence, visit non-adherence, and FHoG were 7.3%, 4.7%, and 29.2%, respectively. LLM accuracy was 0.91 (sensitivity: 0.96; specificity: 0.91) for medication non-adherence and 0.96 (sensitivity: 0.97; specificity: 0.94) for visit non-adherence. For FHoG, LLM accuracy was 0.98 (sensitivity: 0.99; specificity: 0.99) with a Jaccard index of 0.99, while EHR family history field accuracy and Jaccard index were 0.49 and 0.75, respectively. LLM and EHR MSE in quantifying the number of relatives with glaucoma were 0.05 ± 0.56 and 0.85 ± 1.80, respectively (p < 0.001).
Conclusions: LLMs identified non-adherence to medication and visit schedules as well as degree of FHoG in clinical notes with high accuracy.
Translational Relevance: Local LLM pipelines can enable large-scale research into glaucoma risk factors that are unavailable in discrete EHR fields.
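The evaluation metrics named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice to compare sets of relatives for the Jaccard index, the convention that two empty sets score 1.0, and the toy labels are all assumptions made for illustration, and none of the values come from the study.

```python
from typing import Dict, Sequence, Set


def binary_metrics(truth: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity of predicted vs. reference labels."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must be non-empty and of equal length")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as full agreement (assumed)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mse(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error between reference and predicted counts."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


# Toy, invented labels: specialist reference vs. model output for five notes.
reference = [True, False, False, True, False]
model_out = [True, False, True, True, False]
metrics = binary_metrics(reference, model_out)

# Hypothetical relatives with glaucoma in one note: reference vs. model output.
relatives_score = jaccard({"mother", "brother"}, {"mother"})

# Hypothetical counts of affected relatives across three notes.
count_error = mse([2, 0, 1], [1, 0, 1])
```

In this toy example the model makes one false-positive call, so sensitivity stays at 1.0 while specificity drops to 2/3. The set-based Jaccard index penalizes the missed relative even though the note would still be counted as positive for family history in a binary comparison.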