Extraction of Crohn's Disease Clinical Phenotypes from Clinical Text Using Natural Language Processing
Schmidt, L.; Ibing, S.; Borchert, F.; Hugo, J.; Marshall, A.; Peraza, J.; Cho, J. H.; Bottinger, E. P.; Ungaro, R. C.
Real-world studies based on electronic health records often require manual chart review to derive patients' clinical phenotypes, a labor-intensive task with limited scalability. Here, we developed and compared two approaches to computable phenotyping, one based on rules using the spaCy framework and one based on a Large Language Model (LLM), GPT-4, for disease behavior and age at diagnosis of Crohn's disease patients. We are the first to describe computable phenotyping algorithms using clinical texts for these complex tasks, for which previously described inter-annotator agreements lie between 0.54 and 0.98. The data comprised clinical notes and radiology reports from 584 Mount Sinai Health System patients. Overall, we observed similar or better performance using GPT-4 compared to the rules. At the note level, the F1 score was at least 0.90 for disease behavior and 0.82 for age at diagnosis. We could not find statistical evidence for a difference from the performance of human experts on this task. Our findings underline the potential of LLMs for computable phenotyping. Graphical Abstract: Figure 1.
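As a rough illustration of what rule-based extraction and note-level scoring can look like, the sketch below pulls an age at diagnosis out of a note and computes a binary F1 score. It is not the authors' method: it swaps the spaCy rules for a plain regular expression from the Python standard library, and the pattern `AGE_AT_DX`, the functions `extract_age_at_diagnosis` and `note_level_f1`, and the example sentences are all hypothetical.

```python
import re
from typing import Optional, Sequence

# Hypothetical rule (not taken from the paper): matches phrasings such as
# "diagnosed with Crohn's disease at age 24" or "diagnosed at the age of 17".
AGE_AT_DX = re.compile(
    r"diagnosed\b[^.]{0,60}?\bat\s+(?:the\s+)?age\s+(?:of\s+)?(\d{1,2})\b",
    re.IGNORECASE,
)


def extract_age_at_diagnosis(note: str) -> Optional[int]:
    """Return the first age at diagnosis found in the note, or None."""
    match = AGE_AT_DX.search(note)
    return int(match.group(1)) if match else None


def note_level_f1(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Binary F1 over notes: 2*TP / (2*TP + FP + FN); 0.0 if undefined."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if p and not g)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up example sentences, not real clinical text.
    notes = [
        "Patient was diagnosed with Crohn's disease at age 24.",
        "She was diagnosed at the age of 17 after a colonoscopy.",
        "No history of inflammatory bowel disease.",
    ]
    for note in notes:
        print(extract_age_at_diagnosis(note))
```

A single regular expression like this would miss most real-world phrasings (for example, a diagnosis year combined with a birth date), which is one reason a fuller rule set or an LLM is used for such tasks.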
Matching journals
The top 9 journals account for 50% of the predicted probability mass.