
Application of concise machine learning to construct accurate and interpretable EHR computable phenotypes

La Cava, W.; Lee, P. C.; Ajmal, I.; Ding, X.; Cohen, J. B.; Solanki, P.; Moore, J. H.; Herman, D. S.

2020-12-14 health informatics
10.1101/2020.12.12.20248005 medRxiv

Objective: Electronic health records (EHRs) can improve patient care by enabling systematic identification of patients for targeted decision support. However, this requires scalable learning of computable phenotypes. To this end, we developed the feature engineering automation tool (FEAT) and assessed it in targeting screening for the underdiagnosed, under-treated disease primary aldosteronism.
Materials and Methods: We selected 1,199 subjects receiving longitudinal care in a large health system and classified them for hypertension (N=608), hypertension with unexplained hypokalemia (N=172), and apparent treatment-resistant hypertension (N=176) by chart review. We derived 331 features from EHR encounters, diagnoses, laboratories, medications, vitals, and notes. We modified FEAT to encourage model parsimony and compared its models' performance and interpretability to those of expert-curated heuristics and conventional machine learning.
Results: FEAT models trained to replicate expert-curated heuristics had higher area under the precision-recall curve (AUPRC) than all other models (p < 0.001) except random forests and were smaller than all other models (p < 1e-6) except decision trees. FEAT models trained to predict chart review phenotypes exhibited similar AUPRC to penalized logistic regression while being simpler than all other models (p < 1e-6). For treatment-resistant hypertension, FEAT learned a six-feature, clinically intuitive model that demonstrated a positive predictive value of 0.70 and sensitivity of 0.62 in held-out testing data.
Discussion: FEAT learns computable phenotypes that approach the performance of expert-curated heuristics and conventional machine learning without sacrificing interpretability.
Conclusion: By constructing accurate and interpretable computable phenotypes at scale, FEAT has the potential to facilitate systematic clinical decision support.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.1% | 47.4%
2. npj Digital Medicine | 97 papers in training set | Top 0.4% | 12.2%
50% of probability mass above
3. JAMIA Open | 37 papers in training set | Top 0.2% | 6.3%
4. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.9% | 3.0%
5. Journal of Biomedical Informatics | 45 papers in training set | Top 0.6% | 2.7%
6. European Heart Journal - Digital Health | 15 papers in training set | Top 0.2% | 2.3%
7. JMIR Medical Informatics | 17 papers in training set | Top 0.6% | 1.9%
8. Circulation | 66 papers in training set | Top 2% | 1.7%
9. The Lancet Digital Health | 25 papers in training set | Top 0.5% | 1.5%
10. PLOS Digital Health | 91 papers in training set | Top 2% | 1.5%
11. Scientific Reports | 3102 papers in training set | Top 64% | 1.3%
12. Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.5% | 1.2%
13. Frontiers in Digital Health | 20 papers in training set | Top 1.0% | 1.1%
14. JMIR Public Health and Surveillance | 45 papers in training set | Top 3% | 1.1%
15. Critical Care Explorations | 15 papers in training set | Top 0.4% | 0.9%
16. Journal of Medical Internet Research | 85 papers in training set | Top 4% | 0.8%
17. JAMA Network Open | 127 papers in training set | Top 4% | 0.8%
18. BMC Cardiovascular Disorders | 14 papers in training set | Top 1% | 0.8%
19. Circulation: Genomic and Precision Medicine | 42 papers in training set | Top 1% | 0.7%
20. Journal of Clinical Medicine | 91 papers in training set | Top 7% | 0.7%
21. eBioMedicine | 130 papers in training set | Top 5% | 0.7%
22. International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.7%
23. Open Heart | 19 papers in training set | Top 1% | 0.7%
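
The 50% marker in the list above can be read as a cumulative-sum cutoff. The page does not state how the marker is computed, so the following is only a minimal sketch under the assumption that it sits after the smallest number of top-ranked journals whose summed probabilities reach 50%. The function name `journals_to_reach` is hypothetical, and only the first five listed probabilities are used.

```python
from itertools import accumulate

# Probabilities (in percent) of the five top-ranked journals, copied from the list.
probabilities = [47.4, 12.2, 6.3, 3.0, 2.7]


def journals_to_reach(threshold, probs):
    """Count how many top-ranked entries are needed for the running total
    of probabilities to reach the threshold (hypothetical helper)."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None  # threshold is not reached by the entries supplied


# Assumed reading of the marker: first rank at which the running total is >= 50.
top_n = journals_to_reach(50.0, probabilities)
print(top_n)
```

Under this reading the first journal alone (47.4%) falls short of 50%, and adding the second (12.2%) crosses it, which matches the marker's position after rank 2.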