Automated epilepsy and seizure type phenotyping with pre-trained language models
Chang, E.; Xie, K.; Zhou, D.; Korzun, J.; Conrad, E.; Roth, D.; Ellis, C.; Litt, B.
Background: Epilepsy is a common neurologic disorder characterized by recurrent, unprovoked seizures. Epilepsy manifests as different seizure types and epilepsy types, which have important implications for treatment and prognosis. Electronic health record systems containing longitudinal data on large epilepsy cohorts can be valuable resources for clinical research. However, detailed epilepsy phenotypes are poorly captured by structured data such as diagnostic codes and are instead buried in unstructured clinical notes.
Methods: We evaluated two transformer-based language models for automated epilepsy and seizure type phenotyping from clinical notes: a fine-tuned BERT model and a large language model, DeepSeek-R1. A subset of notes was annotated by epileptologists, and model performance was benchmarked against expert agreement. The best-performing model was then deployed across all epilepsy progress notes at a large academic medical center to generate patient-level longitudinal epilepsy and seizure phenotypes.
Results: Both models achieved performance comparable to expert agreement for classifying epilepsy type as focal, generalized, or unspecified (Matthews correlation coefficient [95% CI]: DeepSeek = 0.85 [0.80-0.90], BERT = 0.73 [0.67-0.80], human = 0.77 [0.70-0.83]) and classifying seizure type as convulsive or non-convulsive (DeepSeek = 0.74 [0.66-0.81], BERT = 0.60 [0.49-0.69], human = 0.49 [0.39-0.59]). For more granular classification tasks, DeepSeek maintained performance comparable to expert agreement, whereas BERT performance declined. Deploying DeepSeek-R1 on 77,049 clinical notes from 18,566 patients revealed system-level clinical patterns, including diagnostic stabilization over time, frequent co-occurrence of seizure types, and variation in seizure outcomes by epilepsy type.
Conclusions: By extracting expert-level epilepsy phenotypes from routine clinical text at scale, this approach transforms unstructured EHR data into a resource for longitudinal, population-informed epilepsy care. Automated phenotyping enables analyses of epilepsy trajectories and treatment outcomes that are not feasible with structured data alone, supporting future clinical and translational research applications.
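The abstract reports agreement as a Matthews correlation coefficient (MCC) with a 95% confidence interval. As a rough illustration of that metric, the following sketch computes a multiclass MCC and a percentile-bootstrap interval over notes. The abstract does not say how the intervals were obtained, so the bootstrap procedure is an assumption, and the function names `mcc` and `bootstrap_ci` and the example labels are hypothetical rather than taken from the study.

```python
import math
import random
from collections import Counter


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K form)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly labelled samples
    t_counts = Counter(y_true)                        # true occurrences per class
    p_counts = Counter(y_pred)                        # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)
    cov = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom = math.sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # By convention, return 0 when one side uses a single class only.
    return cov / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for MCC (assumed procedure, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample notes with replacement, keeping each label pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(mcc([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical labels for the three-way epilepsy type task.
expert = ["focal", "generalized", "unspecified", "focal", "focal", "generalized"]
model = ["focal", "generalized", "focal", "focal", "unspecified", "generalized"]
score = mcc(expert, model)
lower, upper = bootstrap_ci(expert, model)
```

MCC is 1 for perfect agreement, near 0 for chance-level agreement, and accounts for class imbalance, which is one plausible reason to prefer it over raw accuracy when unspecified labels are common.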