Privacy-Preserving Large Language Model Deployment for Oncology Registry Abstraction: Structure-Aware Evaluation in a Real-World Clinical Setting

Enikeev, R.; Moldovan, M.; Chu, M.; Amalraj, A.; Koli, P. P.; Abdul, S. S.; Sivaraj, H.; Iqbal, U.; Toh, C. K.

2026-05-21 health informatics
10.64898/2026.05.18.26353541 medRxiv

Background: Structuring oncology clinical notes into registry-grade variables is essential for research and care but remains labour-intensive and error-prone.

Objective: To develop and evaluate a privacy-preserving large language model pipeline for oncology registry abstraction in a real-world clinical setting.

Methods: We deployed an open-source Meta Llama 3.3 70B-based pipeline to extract over 50 variables from 6,700 oncology notes at a cancer centre in Singapore. Data were de-identified locally using a Hide-In-Plain-Sight approach, ensuring no identifiable data left hospital infrastructure. Performance was assessed on 200 randomly sampled notes with adjudicated ground truth. A structure-aware framework classified outputs as correct, missing, spurious, or incorrect.

Results: F1 scores were high across variables, including diagnosis (97.2%), histology (95.8%), stage (92.6%), biomarkers (91.4%), and treatments (88.1%). Transferability testing on 50 external notes showed strong performance for core variables.

Conclusions: Privacy-preserving LLMs can achieve near-human-level accuracy for oncology abstraction, with structure-aware evaluation enabling more clinically meaningful assessment.

Keywords: Oncology Registry Abstraction, Privacy-Preserving Deployment, Clinical Information Extraction, Structure-Aware Evaluation, Large Language Models, Template-Filling Metrics
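
The abstract does not spell out how the four output categories (correct, missing, spurious, incorrect) map to an F1 score. As a rough illustration only, the sketch below assumes the conventional template-filling definitions, where precision is correct / (correct + incorrect + spurious) and recall is correct / (correct + incorrect + missing). The function names, the field names, the example values, and the exact-match comparison are all hypothetical and are not taken from the preprint.

```python
from collections import Counter


def classify_slot(gold, pred):
    """Label one slot as correct, missing, spurious, or incorrect.

    Assumption: None means "no value"; values are compared by exact match.
    """
    if gold is None and pred is None:
        return None  # nothing to score for this slot
    if gold is None:
        return "spurious"  # model produced a value that is not in the reference
    if pred is None:
        return "missing"  # reference has a value the model did not produce
    return "correct" if gold == pred else "incorrect"


def template_f1(gold_record, pred_record):
    """Return (counts, precision, recall, f1) for one pair of records."""
    counts = Counter()
    for field in set(gold_record) | set(pred_record):
        label = classify_slot(gold_record.get(field), pred_record.get(field))
        if label is not None:
            counts[label] += 1

    cor = counts["correct"]
    predicted = cor + counts["incorrect"] + counts["spurious"]
    expected = cor + counts["incorrect"] + counts["missing"]

    precision = cor / predicted if predicted else 0.0
    recall = cor / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return counts, precision, recall, f1


# Hypothetical example records (not data from the study).
gold = {"diagnosis": "lung adenocarcinoma", "stage": "IV",
        "histology": "adenocarcinoma", "biomarker": None}
pred = {"diagnosis": "lung adenocarcinoma", "stage": "III",
        "histology": None, "biomarker": "EGFR positive"}

counts, precision, recall, f1 = template_f1(gold, pred)
```

Separating missing from spurious and incorrect values, rather than collapsing them into a single error count, is what lets an evaluation of this kind distinguish omissions from fabricated or wrong entries.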

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.1% | 18.8%
2. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.2% | 10.5%
3. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.3% | 9.2%
4. Journal of Biomedical Informatics | 45 papers in training set | Top 0.2% | 7.2%
5. npj Digital Medicine | 97 papers in training set | Top 0.7% | 6.9%
50% of probability mass above
6. Journal of Medical Internet Research | 85 papers in training set | Top 0.9% | 4.9%
7. International Journal of Medical Informatics | 25 papers in training set | Top 0.4% | 3.6%
8. JMIR Medical Informatics | 17 papers in training set | Top 0.3% | 3.6%
9. Scientific Reports | 3102 papers in training set | Top 35% | 3.6%
10. JAMIA Open | 37 papers in training set | Top 0.5% | 3.3%
11. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.2% | 2.4%
12. Frontiers in Digital Health | 20 papers in training set | Top 0.5% | 1.9%
13. iScience | 1063 papers in training set | Top 14% | 1.7%
14. BMC Medical Research Methodology | 43 papers in training set | Top 0.6% | 1.7%
15. PLOS ONE | 4510 papers in training set | Top 56% | 1.5%
16. Bioinformatics | 1061 papers in training set | Top 8% | 1.5%
17. Nature Communications | 4913 papers in training set | Top 55% | 1.3%
18. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 1% | 1.2%
19. BMJ Health & Care Informatics | 13 papers in training set | Top 0.6% | 1.2%
20. Patterns | 70 papers in training set | Top 2% | 0.9%
21. BMC Bioinformatics | 383 papers in training set | Top 6% | 0.9%
22. The Lancet Digital Health | 25 papers in training set | Top 1.0% | 0.8%
23. Cancer Medicine | 24 papers in training set | Top 1% | 0.8%
24. Informatics in Medicine Unlocked | 21 papers in training set | Top 1% | 0.6%
25. European Respiratory Journal | 54 papers in training set | Top 2% | 0.6%
26. Cureus | 67 papers in training set | Top 6% | 0.6%
27. Computers in Biology and Medicine | 120 papers in training set | Top 6% | 0.5%