Trustworthy personalized treatment selection: causal effect-trees and calibration in perioperative medicine

Mittelberg, Y.; Stiglitz, D. K.; Kowadlo, G.

2026-03-04 health informatics

10.64898/2026.03.03.26347440 medRxiv

Background: Personalized medicine promises to tailor treatments to the individual, but it carries a hidden risk: mistaking statistical noise for actionable clinical insight. Current machine learning approaches often provide predictions, but fail to inform clinicians when those predictions are unreliable.

Objective: Develop a deployment-readiness framework that integrates causal inference, interpretable effect-trees, and calibration assessment to distinguish actionable signal from unreliable variation, and to support treatment selection only when the estimated benefit is both reliable and clinically meaningful.

Methods: Using retrospective observational cohort EHR data from the INSPIRE perioperative dataset (N>130,000 surgical operations, 2011-2020), we estimated treatment effects using causal forests with double machine learning, benchmarked against other causal methods to assess convergence. We used the estimated causal effects to create effect-trees and translated estimates into interpretable rules. We validated the treatment recommendations by assessing subgroup calibration to identify which groups were reliable for treatment selection.

Results: In a prostate procedures case study (neuraxial versus general anesthesia; total N=2,822), neuraxial anesthesia was associated with substantially lower post-operative opioid use (ATE = -1.38 opioid medications, 95% CI [-1.62, -1.15]). The effect-tree produced five clinically interpretable subgroups using BMI, ASA status, and age, with effects ranging from -1.10 to -1.59 opioid medications. Calibration analysis identified four of five subgroups as reliable for deployment (calibration error < 0.08), while one small subgroup (N=250) showed higher calibration error (0.44), illustrating how the framework rates unreliable heterogeneity.

Conclusions: Individual prediction heterogeneity does not automatically justify clinical personalization. By combining effect-trees with calibration, this framework distinguishes actionable heterogeneity from noisy heterogeneity (detectable but unreliable). This approach transforms causal machine learning from a black box into a validated decision support system that enables selective deployment of treatment decision rules.
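The subgroup calibration step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the abstract does not define its calibration error, so the sketch assumes it is the absolute gap between the mean model-predicted effect in a subgroup and a held-out estimate of that subgroup's effect (for example, the mean of doubly robust scores). The names `assess_subgroups`, `SubgroupReport`, `max_error`, and `min_effect` are hypothetical, and the numbers in the usage note are toy values, not results from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SubgroupReport:
    name: str
    n: int
    predicted_effect: float   # mean model-predicted effect in the subgroup
    observed_effect: float    # mean held-out effect score in the subgroup
    calibration_error: float  # |predicted - observed| (assumed definition)
    deployable: bool


def assess_subgroups(rows, max_error, min_effect):
    """Flag effect-tree subgroups that look reliable enough to act on.

    rows: iterable of (subgroup_name, predicted_effect, heldout_effect_score),
          one tuple per patient. All three fields are assumptions of this sketch.
    max_error: hypothetical calibration-error cutoff.
    min_effect: hypothetical smallest clinically meaningful effect size.
    """
    groups = {}
    for name, predicted, heldout in rows:
        groups.setdefault(name, []).append((predicted, heldout))

    reports = []
    for name, pairs in groups.items():
        predicted = mean(p for p, _ in pairs)
        observed = mean(o for _, o in pairs)
        error = abs(predicted - observed)
        # A subgroup qualifies only if it is both well calibrated
        # and large enough in magnitude to matter clinically.
        deployable = error < max_error and abs(predicted) >= min_effect
        reports.append(
            SubgroupReport(name, len(pairs), predicted, observed, error, deployable)
        )
    return reports
```

With toy rows such as `[("A", -1.5, -1.4), ("A", -1.3, -1.5), ("B", -1.2, -0.6), ("B", -1.0, -0.8)]`, `max_error=0.08`, and `min_effect=0.5`, subgroup A passes and subgroup B is withheld, which mirrors the selective-deployment idea: a detectable effect is acted on only where the prediction agrees with held-out evidence.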
