
Trustworthy personalized treatment selection: causal effect-trees and calibration in perioperative medicine

Mittelberg, Y.; Stiglitz, D. K.; Kowadlo, G.

2026-03-04 health informatics
10.64898/2026.03.03.26347440 medRxiv

Background: Personalized medicine promises to tailor treatments to the individual, but it carries a hidden risk: mistaking statistical noise for actionable clinical insight. Current machine learning approaches often provide predictions, but fail to inform clinicians when those predictions are unreliable.

Objective: Develop a deployment-readiness framework that integrates causal inference, interpretable effect-trees, and calibration assessment to distinguish actionable signal from unreliable variation, and to support treatment selection only when the estimated benefit is both reliable and clinically meaningful.

Methods: Using retrospective observational cohort EHR data from the INSPIRE perioperative dataset (N > 130,000 surgical operations, 2011-2020), we estimated treatment effects using causal forests with double machine learning, benchmarked against other causal methods to assess convergence. We used the estimated causal effects to create effect-trees and translated estimates into interpretable rules. We validated the treatment recommendations by assessing subgroup calibration to identify which groups were reliable for treatment selection.

Results: In a prostate procedures case study (neuraxial versus general anesthesia; total N = 2,822), neuraxial anesthesia was associated with substantially lower post-operative opioid use (ATE = -1.38 opioid medications, 95% CI [-1.62, -1.15]). The effect-tree produced five clinically interpretable subgroups using BMI, ASA status, and age, with effects ranging from -1.10 to -1.59 opioid medications. Calibration analysis identified four of five subgroups as reliable for deployment (calibration error < 0.08), while one small subgroup (N = 250) showed higher calibration error (0.44), illustrating how the framework rates unreliable heterogeneity.

Conclusions: Individual prediction heterogeneity does not automatically justify clinical personalization. By combining effect-trees with calibration, this framework distinguishes actionable heterogeneity from noisy heterogeneity (detectable but unreliable). This approach transforms causal machine learning from a black box into a validated decision support system that enables selective deployment of treatment decision rules.
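
The subgroup calibration step can be illustrated with a minimal sketch. The abstract does not define its calibration error, so this sketch assumes one plausible form: the absolute gap between a subgroup's mean predicted effect and its observed treated-minus-control difference in mean outcomes on held-out data. The function name `subgroup_calibration`, the record layout, and the `threshold` value are all hypothetical, and the toy records are synthetic rather than taken from the study. A raw difference in means also ignores confounding, which an observational analysis would need to adjust for.

```python
from statistics import mean


def subgroup_calibration(records, threshold=0.1):
    """Compare predicted and observed effects per subgroup.

    records: iterable of (subgroup, treated, outcome, predicted_effect).
    threshold: hypothetical cut-off below which a subgroup counts as reliable.
    """
    groups = {}
    for subgroup, treated, outcome, predicted in records:
        g = groups.setdefault(subgroup, {"treated": [], "control": [], "pred": []})
        g["treated" if treated else "control"].append(outcome)
        g["pred"].append(predicted)

    report = {}
    for name, g in groups.items():
        if not g["treated"] or not g["control"]:
            continue  # no observed contrast without both arms
        # Assumed definition: unadjusted difference in mean outcomes.
        observed = mean(g["treated"]) - mean(g["control"])
        predicted = mean(g["pred"])
        error = abs(predicted - observed)
        report[name] = {
            "predicted": predicted,
            "observed": observed,
            "error": error,
            "reliable": error < threshold,
        }
    return report


# Synthetic held-out records: (subgroup, treated, outcome, predicted_effect)
toy = [
    ("A", True, 1.0, -1.5), ("A", True, 2.0, -1.5),
    ("A", False, 3.0, -1.5), ("A", False, 3.0, -1.5),
    ("B", True, 2.0, -1.5), ("B", True, 2.0, -1.5),
    ("B", False, 3.0, -1.5), ("B", False, 3.0, -1.5),
]
report = subgroup_calibration(toy)
for name, row in sorted(report.items()):
    print(name, row)
```

In this toy data, subgroup A's observed contrast matches its prediction while subgroup B's does not, so only A would pass the assumed threshold. That mirrors the idea of deploying a decision rule selectively, only where predicted and observed effects agree.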

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association; 61 papers in training set; Top 0.1%; 23.5%
2. npj Digital Medicine; 97 papers in training set; Top 0.3%; 15.4%
3. Nature Communications; 4913 papers in training set; Top 22%; 8.6%
4. The Lancet Digital Health; 25 papers in training set; Top 0.1%; 4.1%
(50% of probability mass above)
5. Scientific Reports; 3102 papers in training set; Top 33%; 3.7%
6. PLOS ONE; 4510 papers in training set; Top 42%; 3.2%
7. JCO Clinical Cancer Informatics; 18 papers in training set; Top 0.3%; 2.5%
8. JMIR Medical Informatics; 17 papers in training set; Top 0.5%; 2.2%
9. Science Advances; 1098 papers in training set; Top 12%; 2.2%
10. JAMIA Open; 37 papers in training set; Top 0.9%; 1.6%
11. Annals of Internal Medicine; 27 papers in training set; Top 0.5%; 1.4%
12. Journal of Biomedical Informatics; 45 papers in training set; Top 0.9%; 1.4%
13. BMC Medical Research Methodology; 43 papers in training set; Top 0.7%; 1.4%
14. BMC Medicine; 163 papers in training set; Top 5%; 1.3%
15. International Journal of Medical Informatics; 25 papers in training set; Top 1%; 1.3%
16. JAMA Network Open; 127 papers in training set; Top 3%; 1.2%
17. Communications Medicine; 85 papers in training set; Top 0.5%; 1.2%
18. British Journal of Anaesthesia; 14 papers in training set; Top 0.6%; 0.9%
19. BMC Medical Informatics and Decision Making; 39 papers in training set; Top 2%; 0.8%
20. Frontiers in Medicine; 113 papers in training set; Top 6%; 0.8%
21. iScience; 1063 papers in training set; Top 30%; 0.8%
22. BMJ Health & Care Informatics; 13 papers in training set; Top 0.8%; 0.8%
23. PLOS Biology; 408 papers in training set; Top 19%; 0.8%
24. Cancer Cell; 38 papers in training set; Top 2%; 0.8%
25. Nature Human Behaviour; 85 papers in training set; Top 4%; 0.8%
26. Inflammatory Bowel Diseases; 15 papers in training set; Top 0.3%; 0.8%
27. European Journal of Cancer; 10 papers in training set; Top 0.5%; 0.8%
28. Frontiers in Digital Health; 20 papers in training set; Top 2%; 0.7%
29. Journal of Medical Internet Research; 85 papers in training set; Top 5%; 0.7%
30. Frontiers in Artificial Intelligence; 18 papers in training set; Top 0.9%; 0.7%