Combining Clinician Expertise with Prompt Engineering enhances Small Language Models Reliability for Cancer Entity Recognition in Electronic Health Records

Corso, F.; Peppoloni, V.; Mazzeo, L.; Leone, G.; Passos, L.; Miskovic, V.; Armanini, J.; Ferrarin, A.; Wiest, I. C.; Wolf, F.; Montelatici, G.; Romano', R.; Ambrosini, P.; Capoccia, T.; Natangelo, S.; Rota, S.; Andena, P.; De Ponti, M.; Russo, A.; Stasi, G.; Provenzano, L.; Spagnoletti, A.; Meazza Prina, M.; Cavalli, C.; Giani, C.; Serino, R.; Borraccino, M.; Bonalume, C.; Di Mauro, R. M.; Agosta, C.; Dumitrascu, A. D.; Di Liberti, G.; Corrao, G.; Beninato, T.; Ganzinelli, M.; Occhipinti, M.; Brambilla, M.; Proto, C.; Kather, J. N.; Pedrocchi, A. L. G.; De Braud, F.; Lo Russo, G.; Baili, P.; P

2025-10-21 oncology
10.1101/2025.10.16.25337917

Real-world data (RWD), largely stored in unstructured electronic health records (EHRs), are critical for understanding complex diseases like cancer. However, extracting structured information from these narratives is challenging due to linguistic variability, semantic complexity, and privacy concerns. This study evaluates the performance of four locally deployable small language models (SLMs), LLaMA, Mistral, BioMistral, and MedLLaMA, for information extraction (IE) from Italian EHRs within the APOLLO 11 trial on non-small cell lung cancer (NSCLC). We examined three prompting strategies (zero-shot, few-shot, and annotated few-shot) across English and Italian, involving clinicians with varying expertise to assess the impact of prompt design on accuracy. Results show that general-purpose models (e.g., LLaMA 3.1 8B) outperform biomedical models in most tasks, particularly in extracting binary features. Multiclass variables such as TNM staging, PD-L1, and ECOG were more difficult due to implicit language and lack of standardization. Few-shot prompting and native-language inputs significantly improved performance and reduced hallucinations. Clinical expertise enhanced consistency in annotation, particularly among students using annotated examples. The study confirms that privacy-preserving SLMs can be deployed locally for efficient and secure cancer data extraction. Findings highlight the need for hybrid systems combining SLMs with expert input and underline the importance of aligning clinical documentation practices with SLM capabilities. This is the first study to benchmark SLMs on Italian EHRs and investigate the role of clinical expertise in prompt engineering, offering valuable insights for the future integration of SLMs into real-world clinical workflows.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. JCO Clinical Cancer Informatics | based on 14 papers | Top 0.1% | 15.6%
2. PLOS ONE | based on 1737 papers | Top 49% | 10.3%
3. Scientific Reports | based on 701 papers | Top 15% | 10.3%
4. npj Digital Medicine | based on 85 papers | Top 3% | 5.9%
5. npj Precision Oncology | based on 14 papers | Top 0.9% | 2.8%
6. Journal of Biomedical Informatics | based on 37 papers | Top 2% | 2.8%
7. Computers in Biology and Medicine | based on 39 papers | Top 3% | 2.5%
50% of probability mass above
8. JAMIA Open | based on 35 papers | Top 4% | 2.5%
9. PLOS Computational Biology | based on 141 papers | Top 5% | 2.5%
10. Nature Communications | based on 483 papers | Top 26% | 2.3%
11. Cancer Medicine | based on 17 papers | Top 2% | 2.3%
12. JMIR Formative Research | based on 31 papers | Top 3% | 1.6%
13. PeerJ | based on 46 papers | Top 5% | 1.6%
14. Journal of the American Medical Informatics Association | based on 53 papers | Top 5% | 1.3%
15. eLife | based on 262 papers | Top 21% | 1.3%
16. Diagnostics | based on 36 papers | Top 4% | 1.3%
17. International Journal of Medical Informatics | based on 25 papers | Top 4% | 1.3%
18. Scientific Data | based on 30 papers | Top 2% | 1.2%
19. JMIR Medical Informatics | based on 16 papers | Top 4% | 1.2%
20. Cureus | based on 64 papers | Top 14% | 1.2%
21. International Journal of Radiation Oncology*Biology*Physics | based on 13 papers | Top 2% | 1.2%
22. JCO Precision Oncology | based on 11 papers | Top 2% | 0.8%
23. Patterns | based on 15 papers | Top 3% | 0.8%
24. Frontiers in Oncology | based on 34 papers | Top 5% | 0.8%
25. Cancers | based on 57 papers | Top 7% | 0.8%
26. Biology Methods and Protocols | based on 19 papers | Top 2% | 0.8%
27. BMJ Health & Care Informatics | based on 13 papers | Top 3% | 0.8%
28. JMIR Research Protocols | based on 18 papers | Top 4% | 0.7%
29. JAMA Network Open | based on 125 papers | Top 21% | 0.7%
30. Journal of Medical Internet Research | based on 81 papers | Top 16% | 0.7%