Application of concise machine learning to construct accurate and interpretable EHR computable phenotypes
La Cava, W.; Lee, P. C.; Ajmal, I.; Ding, X.; Cohen, J. B.; Solanki, P.; Moore, J. H.; Herman, D. S.
Objective: Electronic health records (EHRs) can improve patient care by enabling systematic identification of patients for targeted decision support. However, this requires scalable learning of computable phenotypes. To this end, we developed the feature engineering automation tool (FEAT) and assessed it in targeting screening for the underdiagnosed, under-treated disease primary aldosteronism.
Materials and Methods: We selected 1,199 subjects receiving longitudinal care in a large health system and classified them for hypertension (N=608), hypertension with unexplained hypokalemia (N=172), and apparent treatment-resistant hypertension (N=176) by chart review. We derived 331 features from EHR encounters, diagnoses, laboratories, medications, vitals, and notes. We modified FEAT to encourage model parsimony and compared its models' performance and interpretability to those of expert-curated heuristics and conventional machine learning.
Results: FEAT models trained to replicate expert-curated heuristics had higher area under the precision-recall curve (AUPRC) than all other models (p < 0.001) except random forests and were smaller than all other models (p < 1e-6) except decision trees. FEAT models trained to predict chart review phenotypes exhibited similar AUPRC to penalized logistic regression while being simpler than all other models (p < 1e-6). For treatment-resistant hypertension, FEAT learned a six-feature, clinically intuitive model that demonstrated a positive predictive value of 0.70 and sensitivity of 0.62 in held-out testing data.
Discussion: FEAT learns computable phenotypes that approach the performance of expert-curated heuristics and conventional machine learning without sacrificing interpretability.
Conclusion: By constructing accurate and interpretable computable phenotypes at scale, FEAT has the potential to facilitate systematic clinical decision support.
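The positive predictive value and sensitivity reported in the Results can be read as simple ratios over a confusion matrix: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The following is a minimal sketch of that arithmetic, not the authors' evaluation code. The function name `ppv_and_sensitivity` and the toy labels are assumptions made up for illustration and are not study data.

```python
from typing import Sequence, Tuple


def ppv_and_sensitivity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (PPV, sensitivity) for binary labels coded 0/1.

    Hypothetical helper for illustration; not part of FEAT.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # PPV: fraction of predicted positives that are truly positive.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # Sensitivity (recall): fraction of true positives that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Toy chart-review labels and model calls, invented for this example only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

ppv, sens = ppv_and_sensitivity(y_true, y_pred)
print(f"PPV={ppv:.2f}, sensitivity={sens:.2f}")
```

Both quantities depend on the decision threshold applied to the model's scores, which is why the abstract also reports AUPRC as a threshold-free summary.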
Matching journals
The top 2 journals account for 50% of the predicted probability mass.