Machine Unlearning for GDPR Right-to-Erasure in Antimicrobial Resistance Prediction Models
Saniya, S.; Khan, A. A.
Objective: Healthcare machine learning models trained on patient data must comply with the General Data Protection Regulation (GDPR) right-to-erasure requirement, which mandates the removal of individual data contributions from deployed models. Full retraining, the current standard, is computationally expensive. This study evaluates Sharded, Isolated, Sliced and Aggregated (SISA) training as an efficient unlearning framework for models that predict antimicrobial resistance (AMR).
Materials and Methods: SISA training (5 shards) was compared with Full Retraining, Label-Flip Retraining, Influence Reweighting, and Selective Tree Pruning on two datasets: the Antibiotic Resistance Microbiology Dataset (ARMD; n = 1,245,767 EHR records) and the BV-BRC/PATRIC genomic surveillance dataset (n = 400,372). Random Forest classifiers used 500 estimators. Metrics included accuracy, AUC-ROC, membership inference attack (MIA) gap, unlearning time, and cumulative 12-month deletion cost.
Results: SISA achieved an 8.9x speedup over full retraining on ARMD (7.5 s vs. 66.7 s) and a 9.8x speedup on PATRIC (1.4 s vs. 13.4 s), with accuracy costs of 0.024% and 0.048%, respectively, both below the 0.5% clinical threshold. Label-Flip Retraining and Influence Reweighting provided no speedup (≤ 1.0x), while Selective Tree Pruning exceeded the threshold on EHR data (+0.648%). Over 12 months at 50 monthly deletions, SISA reduced cumulative overhead from 800 s to 90 s (ARMD) and from 160 s to 16 s (PATRIC).
Discussion: SISA maintains predictive performance while reducing computational cost, supporting machine unlearning for regulatory compliance in clinical ML systems.
Conclusion: SISA provides an efficient framework for maintaining GDPR-compliant AMR prediction models and supports the scalable processing of patient deletion requests.
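The shard-level unlearning idea behind SISA can be illustrated with a minimal sketch. It is not the authors' implementation: the names `SisaEnsemble` and `CentroidClassifier` are hypothetical, the nearest-centroid learner is a toy stand-in for the 500-estimator Random Forest, the records are synthetic, and the slicing (checkpointing) step of SISA is omitted. It only shows that a deletion request triggers retraining of the single shard that held the record, while predictions aggregate votes across all shards.

```python
from collections import Counter


class CentroidClassifier:
    """Toy nearest-centroid learner; a stand-in for the Random Forest in the study."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class SisaEnsemble:
    """Sharded, isolated training with majority-vote aggregation (no slicing)."""

    def __init__(self, n_shards=5):
        self.n_shards = n_shards
        # One dict per shard: record_id -> (features, label)
        self.shards = [dict() for _ in range(n_shards)]
        self.models = [None] * n_shards
        self.retrain_count = 0  # number of shard-level training runs so far

    def _shard_of(self, record_id):
        # Assumption: integer record ids, assigned to shards deterministically.
        return record_id % self.n_shards

    def _train_shard(self, k):
        records = list(self.shards[k].values())
        if records:
            X = [features for features, _ in records]
            y = [label for _, label in records]
            self.models[k] = CentroidClassifier().fit(X, y)
        else:
            self.models[k] = None
        self.retrain_count += 1

    def fit(self, records):
        for record_id, features, label in records:
            self.shards[self._shard_of(record_id)][record_id] = (features, label)
        for k in range(self.n_shards):
            self._train_shard(k)
        return self

    def unlearn(self, record_id):
        """Delete one record and retrain only the shard that contained it."""
        k = self._shard_of(record_id)
        if self.shards[k].pop(record_id, None) is not None:
            self._train_shard(k)
        return k

    def predict_one(self, row):
        votes = Counter(m.predict_one(row) for m in self.models if m is not None)
        return votes.most_common(1)[0][0]


# Synthetic records: (record_id, features, label); label 1 clusters near (5, 5).
records = [
    (i, [(i % 2) * 5.0 + (i % 3) * 0.1, (i % 2) * 5.0], i % 2) for i in range(20)
]
ensemble = SisaEnsemble(n_shards=5).fit(records)
affected_shard = ensemble.unlearn(7)  # erasure request for record 7
```

In this sketch a deletion retrains one of five shard models, so each request touches roughly one-fifth of the training data; the speedups reported in the abstract are measured results and are not derived from this toy code.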
Matching journals
The top 7 journals account for 50% of the predicted probability mass.