Post-ED Trajectory Prediction in Abdominal Pain with a Generative Medical Event Model
McCann, K. A.; Wright, D. S.; Iscoe, M. S.; Melnick, E. R.; Ohno-Machado, L.; Meeker, D.; Venkatesh, A. K.; Sangal, R. B.; Loza, A. J.
Importance: Abdominal pain causes roughly 10 million US emergency department (ED) visits annually, most resulting in discharge. Post-discharge courses vary, yet existing risk models predict only whether an ED revisit occurs, not what that revisit will entail.
Objective: To evaluate whether Curiosity, a generative medical event foundation model, can predict post-ED-discharge trajectories for adults with abdominal pain, differentiating the timing and severity of expected outcomes.
Design: Retrospective cohort study; encounters January 1 to December 31, 2022; 30-day follow-up; analysis conducted in 2026.
Setting: Epic Cosmos research network (multicenter, population-based, de-identified electronic health record).
Participants: Adults (≥18 years) discharged from the ED with abdominal pain, excluding training-set patients. A random sample of 3,000 was drawn from 150,030 eligible patients (65.3% female; median age 47 years [IQR 36-60]).
Exposure: ED discharge after evaluation for abdominal pain.
Main Outcomes and Measures: Primary: area under the receiver operating characteristic curve (AUROC) for Curiosity vs separately estimated, per-task XGBoost models, for ED revisit ending in admission (admit-revisit), ED revisit ending in discharge (DC-revisit), and any ED revisit, each at 72 hours, 7 days, and 30 days. Secondary: trajectory-level accuracy across 36 trajectory classes and edit distance vs XGBoost; calibration of simulated vs observed conditional path probabilities across 45 transitions.
Results: Curiosity identified patients at high risk of a revisit requiring admission more accurately than XGBoost and differentiated them from patients likely to revisit without admission. Among 3,000 patients, Curiosity's 30-day admit-revisit AUROC was 0.83 (95% CI 0.79-0.87) vs 0.70 (95% CI 0.65-0.75) for XGBoost (DeLong P<.001). Its admit-revisit AUC-PR was 0.37 (95% CI 0.29-0.46) against a 4.1% cohort base rate, vs 0.13 (95% CI 0.09-0.19) for XGBoost. Curiosity identified the most likely trajectory out of 36 possibilities for 45.9% of patients (XGBoost 41.0%; McNemar P<.001), with a median edit distance of 1.28 vs 1.40 (Wilcoxon P<.001). Median absolute calibration error across 45 transitions was 1.30 percentage points (95% CI 0.32-2.49).
Conclusions and Relevance: A generative medical event foundation model produced calibrated trajectory-level predictions and discriminated admit-revisits more effectively than task-specific XGBoost baselines, separating patients who revisited and were admitted from those who revisited and were discharged.
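To put the reported AUC-PR values in context: a classifier with no skill is expected to score an AUC-PR close to the outcome prevalence, so the 4.1% base rate is the natural reference point. The short sketch below simply divides each reported AUC-PR by that base rate. The function name `lift_over_base_rate` and the "lift" framing are illustrative assumptions, not a metric reported by the authors; only the three input numbers come from the abstract.

```python
def lift_over_base_rate(auc_pr: float, base_rate: float) -> float:
    """Ratio of an AUC-PR to the prevalence a no-skill classifier would approach.

    Hypothetical helper for illustration; not part of the study's analysis.
    """
    if not 0.0 < base_rate <= 1.0:
        raise ValueError("base_rate must be in (0, 1]")
    return auc_pr / base_rate


# Values taken from the abstract (30-day admit-revisit outcome).
BASE_RATE = 0.041
reported_auc_pr = {"Curiosity": 0.37, "XGBoost": 0.13}

lifts = {
    name: lift_over_base_rate(value, BASE_RATE)
    for name, value in reported_auc_pr.items()
}

for name, lift in lifts.items():
    print(f"{name}: {lift:.1f}x the base rate")
# Curiosity: 9.0x the base rate
# XGBoost: 3.2x the base rate
```

This ratio uses point estimates only and ignores the reported confidence intervals, so it should be read as a rough sense of scale rather than a formal comparison.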