
A Comparison of Manual and Automated Approaches to Developing Computable Algorithms for Identifying Acute Pancreatitis

Bann, M. A.; Carrell, D. S.; Gruber, S.; Heagerty, P. J.; Williamson, B. D.; Nelson, J. C.; Hazlehurst, B.; Felcher, A.; Nyongesa, D. B.; Slaughter, M. T.; Sapp, D. S.; Cronkite, D. J.; Ball, R.; Floyd, J. S.

2026-06-08 health informatics
10.64898/2026.06.05.26354934 medRxiv

Objective: Clinical phenotyping methods that rely on clinical and informatics expertise can be time-intensive and costly. We tested both manual and highly automated approaches using electronic health record (EHR) data to identify an FDA Sentinel Initiative health outcome of interest, acute pancreatitis.

Materials and Methods: We trained and evaluated machine learning algorithms using EHR data with two approaches: a custom approach that used manually curated features and was trained on outcomes data validated by medical record review, and a highly automated approach that greatly simplifies and automates feature engineering and relies on low-cost silver-standard outcomes for model training.

Results: Custom algorithms using manually curated structured claims data discriminated cases from non-cases with a high degree of accuracy (cv-AUC 0.89 [95% CI 0.84-0.94]); the inclusion of natural language processing (NLP)-derived covariates from clinical notes increased performance slightly (cv-AUC 0.91 [95% CI 0.86-0.97]). The automated algorithm trained with diagnosis code count as the outcome performed less well (AUC 0.80 [95% CI 0.75-0.85]) but improved when maximum lipase value was used as the outcome (AUC 0.88 [95% CI 0.84-0.92]). At a positive predictive value (PPV) of 90%, the custom algorithm had a sensitivity of 92%, the automated algorithm trained on diagnosis code count had a sensitivity of 45%, and the automated algorithm trained on maximum lipase value had a sensitivity of 84%. However, a prediction rule derived by clinicians during chart review was nearly as accurate (maximum lipase value ≥ 3 times the upper limit of normal; AUC 0.86, PPV 85%, sensitivity 92%).

Discussion: Machine learning algorithms with manually curated structured data and NLP features trained on validated outcomes data successfully identified validated events. Using an outcome based on specific phenotype knowledge (maximum lipase value) in the automated model yielded performance similar to that of the custom model with considerably fewer resources.
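The clinician-derived rule in the Results (maximum lipase value at least 3 times the upper limit of normal) is simple enough to sketch in a few lines, along with how sensitivity and PPV would be computed against chart-validated labels. This is only an illustrative sketch, not the authors' implementation: the `Encounter` fields, the function names, and the toy records are assumptions made for the example, and the numbers are invented rather than study data.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    # Hypothetical record layout, assumed for illustration only.
    max_lipase: float          # highest lipase value observed
    upper_limit_normal: float  # lab-specific upper limit of normal
    is_case: bool              # chart-validated acute pancreatitis status


def rule_positive(e: Encounter, multiple: float = 3.0) -> bool:
    """Flag the encounter if max lipase >= multiple x upper limit of normal."""
    return e.max_lipase >= multiple * e.upper_limit_normal


def sensitivity_and_ppv(encounters):
    """Return (sensitivity, PPV) of the rule against validated labels."""
    tp = sum(1 for e in encounters if rule_positive(e) and e.is_case)
    fp = sum(1 for e in encounters if rule_positive(e) and not e.is_case)
    fn = sum(1 for e in encounters if not rule_positive(e) and e.is_case)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Invented toy records (not study data), all with an upper limit of 60.
toy = [
    Encounter(600.0, 60.0, True),   # rule positive, true case
    Encounter(180.0, 60.0, True),   # exactly 3x the limit: rule positive
    Encounter(100.0, 60.0, True),   # rule negative, missed case
    Encounter(250.0, 60.0, False),  # rule positive, not a case
    Encounter(40.0, 60.0, False),   # rule negative, not a case
]

sens, ppv = sensitivity_and_ppv(toy)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Sensitivity is the share of validated cases the rule flags, and PPV is the share of flagged encounters that are validated cases, which is why the abstract reports the two together when comparing the rule with the trained algorithms.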

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. International Journal of Medical Informatics: 25 papers in training set; Top 0.1%; 10.1%
2. Scientific Reports: 3102 papers in training set; Top 9%; 8.4%
3. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.4%; 8.4%
4. JAMIA Open: 37 papers in training set; Top 0.2%; 6.4%
5. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 0.7%; 3.7%
6. Journal of Medical Internet Research: 85 papers in training set; Top 1%; 3.6%
7. JMIR Medical Informatics: 17 papers in training set; Top 0.3%; 3.6%
8. PLOS Digital Health: 91 papers in training set; Top 0.7%; 3.6%
9. npj Digital Medicine: 97 papers in training set; Top 1%; 3.6%

50% of probability mass above

10. BMC Medical Research Methodology: 43 papers in training set; Top 0.3%; 3.3%
11. PLOS ONE: 4510 papers in training set; Top 43%; 2.7%
12. The Lancet Digital Health: 25 papers in training set; Top 0.2%; 2.7%
13. Annals of Neurology: 57 papers in training set; Top 0.8%; 2.6%
14. BMJ Health & Care Informatics: 13 papers in training set; Top 0.4%; 1.8%
15. JCO Clinical Cancer Informatics: 18 papers in training set; Top 0.5%; 1.7%
16. Cancer Medicine: 24 papers in training set; Top 0.8%; 1.5%
17. Frontiers in Digital Health: 20 papers in training set; Top 0.8%; 1.5%
18. Clinical and Translational Science: 21 papers in training set; Top 0.6%; 1.3%
19. Diagnostics: 48 papers in training set; Top 1%; 1.3%
20. Frontiers in Artificial Intelligence: 18 papers in training set; Top 0.5%; 1.0%
21. Annals of Internal Medicine: 27 papers in training set; Top 0.8%; 0.9%
22. Genetics in Medicine: 69 papers in training set; Top 0.9%; 0.8%
23. Med: 38 papers in training set; Top 0.7%; 0.8%
24. Frontiers in Neurology: 91 papers in training set; Top 5%; 0.8%
25. Diabetologia: 36 papers in training set; Top 0.9%; 0.7%
26. Clinical Pharmacology & Therapeutics: 25 papers in training set; Top 0.8%; 0.7%
27. Open Forum Infectious Diseases: 134 papers in training set; Top 3%; 0.7%
28. Hepatology Communications: 21 papers in training set; Top 0.3%; 0.7%
29. BMC Medicine: 163 papers in training set; Top 7%; 0.7%
30. Journal of Pathology Informatics: 13 papers in training set; Top 0.4%; 0.7%
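
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the per-journal percentages in rank order. The sketch below assumes the final percentage in each entry is that journal's predicted probability; the function name `journals_to_reach` is hypothetical, and the listed values are rounded, so the running total is approximate.

```python
def journals_to_reach(probabilities, target=50.0):
    """Return (count, cumulative %) at the first rank where the running
    total reaches the target, or None if the target is never reached."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return count, total
    return None


# Predicted probabilities (%) of the ten highest-ranked journals, as listed.
top_ten = [10.1, 8.4, 8.4, 6.4, 3.7, 3.6, 3.6, 3.6, 3.6, 3.3]

result = journals_to_reach(top_ten)
print(result)
```

With these rounded values the running total first reaches 50% at the ninth journal, at roughly 51.4%, which agrees with the position of the "50% of probability mass above" marker.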