
Machine Unlearning for GDPR Right-to-Erasure in Antimicrobial Resistance Prediction Models

Saniya, S.; Khan, A. A.

2026-03-10 health informatics
10.64898/2026.03.09.26347960 medRxiv

Objective: Healthcare machine learning models trained on patient data must comply with the General Data Protection Regulation (GDPR) right-to-erasure requirement, which mandates the removal of individual data contributions from deployed models. Full retraining, the current standard, is computationally expensive. This study evaluates Sharded, Isolated, Sliced and Aggregated (SISA) training as an efficient unlearning framework for antimicrobial resistance (AMR) prediction models.

Materials and Methods: SISA training (5 shards) was compared with Full Retraining, Label-Flip Retraining, Influence Reweighting, and Selective Tree Pruning on two datasets: the Antibiotic Resistance Microbiology Dataset (ARMD; n = 1,245,767 EHR records) and the BV-BRC/PATRIC genomic surveillance dataset (n = 400,372). Random Forest classifiers used 500 estimators. Metrics included accuracy, AUC-ROC, membership inference attack (MIA) gap, unlearning time, and cumulative 12-month deletion cost.

Results: SISA achieved an 8.9x speedup over full retraining on ARMD (7.5 s vs. 66.7 s) and a 9.8x speedup on PATRIC (1.4 s vs. 13.4 s), with accuracy costs of 0.024% and 0.048%, respectively, both below the 0.5% clinical threshold. Label-Flip Retraining and Influence Reweighting provided no speedup (≤ 1.0x), while Tree Pruning exceeded the threshold on EHR data (+0.648%). Over 12 months at 50 monthly deletions, SISA reduced cumulative overhead from 800 s to 90 s (ARMD) and from 160 s to 16 s (PATRIC).

Discussion: SISA maintains predictive performance while reducing computational cost, supporting machine unlearning for regulatory compliance in clinical ML systems.

Conclusion: SISA provides an efficient framework for maintaining GDPR-compliant AMR prediction models and supports the scalable processing of patient deletion requests.
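The sharding idea behind the reported speedup can be illustrated with a minimal sketch. It is not the authors' implementation: the per-shard model is a toy nearest-mean classifier standing in for the Random Forests used in the study, the slicing (checkpointing) part of SISA is omitted, and every name here (`NearestMeanModel`, `SisaEnsemble`, the `(record_id, feature, label)` record layout, the modulo shard assignment) is a hypothetical choice made for illustration. Only the shard count of 5 comes from the abstract.

```python
from collections import Counter, defaultdict

NUM_SHARDS = 5  # shard count reported in the abstract; all else is illustrative


class NearestMeanModel:
    """Toy per-shard classifier (assumption: stands in for a Random Forest)."""

    def fit(self, records):
        sums, counts = defaultdict(float), Counter()
        for _, feature, label in records:
            sums[label] += feature
            counts[label] += 1
        # Mean feature value per label; empty if the shard has no records.
        self.means = {label: sums[label] / counts[label] for label in counts}
        return self

    def predict(self, feature):
        # Return the label whose mean feature value is closest.
        return min(self.means, key=lambda label: abs(feature - self.means[label]))


class SisaEnsemble:
    """Sharded, isolated training with majority-vote aggregation."""

    def __init__(self, records, num_shards=NUM_SHARDS):
        self.shards = [[] for _ in range(num_shards)]
        self.shard_of = {}  # record_id -> shard index, needed to locate deletions
        for record in records:
            record_id = record[0]
            index = record_id % num_shards  # hypothetical assignment rule
            self.shards[index].append(record)
            self.shard_of[record_id] = index
        # Each model sees only its own shard (isolation).
        self.models = [NearestMeanModel().fit(shard) for shard in self.shards]

    def predict(self, feature):
        # Aggregate by majority vote over models trained on non-empty shards.
        votes = Counter(m.predict(feature) for m in self.models if m.means)
        return votes.most_common(1)[0][0]

    def unlearn(self, record_id):
        """Remove one record and retrain only the shard that contained it."""
        index = self.shard_of.pop(record_id)
        self.shards[index] = [r for r in self.shards[index] if r[0] != record_id]
        self.models[index] = NearestMeanModel().fit(self.shards[index])
        return index


# Synthetic records: (record_id, feature, label); not data from the study.
records = [(i, float(i % 10), int(i % 10 >= 5)) for i in range(100)]
ensemble = SisaEnsemble(records)
models_before = list(ensemble.models)
retrained_shard = ensemble.unlearn(7)
```

In this sketch a deletion request touches one of five shard models and leaves the other four untouched, which is the source of the cost reduction relative to retraining on the full dataset.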

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. International Journal of Medical Informatics; 25 papers in training set; Top 0.1%; 9.7%
2. Journal of the American Medical Informatics Association; 61 papers in training set; Top 0.3%; 9.7%
3. Scientific Reports; 3102 papers in training set; Top 11%; 8.0%
4. JCO Clinical Cancer Informatics; 18 papers in training set; Top 0.1%; 6.9%
5. BMC Medical Informatics and Decision Making; 39 papers in training set; Top 0.4%; 6.5%
6. Journal of Medical Internet Research; 85 papers in training set; Top 0.7%; 6.5%
7. PLOS Digital Health; 91 papers in training set; Top 0.4%; 6.1%
50% of probability mass above
8. Frontiers in Digital Health; 20 papers in training set; Top 0.1%; 6.0%
9. npj Digital Medicine; 97 papers in training set; Top 1%; 4.1%
10. JMIR Medical Informatics; 17 papers in training set; Top 0.4%; 3.4%
11. PLOS ONE; 4510 papers in training set; Top 41%; 3.4%
12. BMC Medical Research Methodology; 43 papers in training set; Top 0.3%; 2.9%
13. BMC Bioinformatics; 383 papers in training set; Top 4%; 1.8%
14. Artificial Intelligence in Medicine; 15 papers in training set; Top 0.3%; 1.6%
15. The Lancet Digital Health; 25 papers in training set; Top 0.5%; 1.4%
16. JAMIA Open; 37 papers in training set; Top 1%; 1.3%
17. GigaScience; 172 papers in training set; Top 2%; 1.3%
18. European Respiratory Journal; 54 papers in training set; Top 1%; 1.3%
19. Nature Communications; 4913 papers in training set; Top 56%; 1.3%
20. Patterns; 70 papers in training set; Top 2%; 1.2%
21. Frontiers in Artificial Intelligence; 18 papers in training set; Top 0.5%; 1.2%
22. Bioinformatics; 1061 papers in training set; Top 9%; 0.9%
23. Computers in Biology and Medicine; 120 papers in training set; Top 4%; 0.8%
24. PLOS Computational Biology; 1633 papers in training set; Top 24%; 0.8%
25. IEEE Journal of Biomedical and Health Informatics; 34 papers in training set; Top 2%; 0.8%
26. Biology Methods and Protocols; 53 papers in training set; Top 3%; 0.7%
27. Communications Medicine; 85 papers in training set; Top 1%; 0.7%
28. BMJ Health & Care Informatics; 13 papers in training set; Top 1%; 0.7%
29. Journal of Biomedical Informatics; 45 papers in training set; Top 2%; 0.6%