Leveraging Expert Knowledge and Causal Structure Learning to Build Parsimonious Models of Acute Brain Dysfunction in the Pediatric Intensive Care Unit
Perez Claudio, E.; Horvat, C.; Au, A. K.; Clark, R. S. B.; Taylor, M. W.; Cooper, G. F.; Li, R.; Nourelahi, M.; Hochheiser, H.
Machine learning adoption in clinical decision support systems remains limited by concerns about transparency and robustness. Causal structure learning (CSL) combined with expert knowledge may address these concerns by identifying potentially causal predictors, enabling more interpretable and clinically aligned models. In this study, we show that, by integrating clinician expertise with CSL algorithms, we can identify plausible causal drivers of acquired acute brain dysfunction (ABD) in the pediatric intensive care unit (PICU), which enables the development of parsimonious predictive models without substantial loss in performance. To do so, we analyzed 18,568 PICU encounters from the University of Pittsburgh Medical Center Children's Hospital (2010-2022) and elicited knowledge from experienced clinicians. Encounters with acquired ABD were defined using the validated ABD computable phenotype. Expert knowledge was elicited from four clinicians through iterative interviews to construct a consensus directed acyclic graph (DAG). Clinician consensus achieved acceptable inter-rater reliability (Fleiss' kappa = 0.62) after two rounds of interviews and identified 16 biomarkers as potential causes of acquired ABD. Two CSL algorithms, GOLEM and PC-MB, were applied to enrich the clinicians' consensus DAG. The PC-MB algorithm showed 78% concordance with expert consensus, while GOLEM showed 46%. Together, the CSL algorithms identified seven biomarkers as potential causes that were not included in the clinicians' DAG: blood urea nitrogen, creatinine, dobutamine, glucose, potassium, PTT, and SpO2. Using multiple variations of the enriched DAGs, XGBoost models were trained using biomarkers identified as potential causes of acquired ABD; these were evaluated primarily by area under the precision-recall curve (AUPRC).
Models trained on the intersection of clinician consensus and PC-MB DAGs achieved an AUPRC of 0.79 (95% CI: 0.75-0.82) using only 14 biomarkers, compared to 0.81 (95% CI: 0.78-0.84) for the control model using all 45 biomarkers. When restricted to vitals and laboratory results alone, the best-performing model achieved an AUPRC of 0.77. Combining clinical expertise with causal structure learning enables the identification of causal hypotheses consistent with the clinical understanding of the participating clinicians and the development of parsimonious predictive models for acquired ABD in the PICU.
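The abstract reports inter-rater reliability among the four clinicians as a Fleiss' kappa. As a rough illustration of how that statistic is computed, the sketch below implements the standard Fleiss' kappa formula in plain Python. The function name `fleiss_kappa`, the two-category "cause / not a cause" rating scheme, and the example counts are assumptions made for illustration; the abstract does not state what exactly the clinicians rated, and these numbers are not the study's data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("at least one rated item is required")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: four raters judge five candidate biomarkers.
# Columns are [rated "potential cause", rated "not a cause"].
hypothetical_ratings = [
    [4, 0],
    [3, 1],
    [0, 4],
    [4, 0],
    [1, 3],
]
kappa = fleiss_kappa(hypothetical_ratings)
```

A value of 1 indicates complete agreement, while values at or below 0 indicate agreement no better than chance, which is the scale on which the reported 0.62 should be read.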
Matching journals
The top 4 journals account for 50% of the predicted probability mass.