Onca: An Open 9B Language Model for Pancreatic Cancer Clinical Tasks

Shim, K. B.

2026-04-24 oncology
10.64898/2026.04.16.26351055 medRxiv

Pancreatic ductal adenocarcinoma (PDAC) remains one of the deadliest solid tumors and continues to face low treatment-trial participation, fragmented evidence workflows, and labor-intensive abstraction of unstructured clinical text. Existing oncology-focused language models show promise, but many depend on private institutional corpora, limiting reproducibility and practical reuse across centers. We present Onca, an open 9B dense model designed for four PDAC-relevant tasks: trial eligibility screening, case-specific clinical reasoning, structured pathology report extraction, and molecular variant evidence reasoning. Onca is fine-tuned from Qwopus3.5-9B-v3 with a single Unsloth BF16 LoRA adapter on 37,364 training rows drawn from openly available sources. The evaluation spans 11 panels and compares Onca against Woollie-7B, CancerLLM-7B, OpenBioLLM-8B, and the unmodified Qwopus base. Onca achieves the strongest overall results on Trial Screening (81.6 F1), Clinical Reasoning (14.1 composite), Pathology Extraction (30.5 field exact-match), PubMedQA Cancer (68.3 macro-F1), and PubMedQA (66.5 macro-F1). The strongest gains appear in tasks closest to routine oncology workflow, especially trial review and pathology structuring. These findings suggest that clinically targeted pancreatic-cancer language models can be built from open data with competitive performance while remaining practical to train on a single workstation-scale GPU setup.
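For readers unfamiliar with the macro-F1 metric quoted for the PubMedQA panels, the following is a minimal sketch of how such a score is commonly computed: per-label F1, averaged without weighting by label frequency. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper, and the abstract does not describe the authors' actual evaluation code.

```python
def macro_f1(gold, predicted, labels):
    """Unweighted mean of per-label F1 scores (returns a value in [0, 1])."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0 here (an assumption).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up toy data using PubMedQA's yes/no/maybe answer labels.
gold = ["yes", "yes", "no", "maybe"]
predicted = ["yes", "no", "no", "maybe"]
score = macro_f1(gold, predicted, ["yes", "no", "maybe"])
print(f"macro-F1: {100 * score:.1f}")
```

The abstract reports these scores on a 0 to 100 scale, hence the multiplication by 100 when printing.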

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.1% | 18.3%
2 | npj Digital Medicine | 97 papers in training set | Top 0.6% | 8.3%
3 | Nature Communications | 4913 papers in training set | Top 25% | 7.0%
4 | Nature Medicine | 117 papers in training set | Top 0.4% | 6.2%
5 | Clinical Cancer Research | 58 papers in training set | Top 0.3% | 4.8%
6 | Nature Cancer | 35 papers in training set | Top 0.3% | 3.5%
7 | Nature | 575 papers in training set | Top 7% | 3.5%
50% of probability mass above
8 | Nature Genetics | 240 papers in training set | Top 2% | 3.5%
9 | iScience | 1063 papers in training set | Top 5% | 3.5%
10 | Scientific Reports | 3102 papers in training set | Top 44% | 2.7%
11 | PLOS ONE | 4510 papers in training set | Top 49% | 2.1%
12 | eLife | 5422 papers in training set | Top 40% | 1.8%
13 | Cancer Research | 116 papers in training set | Top 2% | 1.6%
14 | Cancer Cell | 38 papers in training set | Top 1% | 1.6%
15 | Med | 38 papers in training set | Top 0.3% | 1.6%
16 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 34% | 1.6%
17 | European Journal of Cancer | 10 papers in training set | Top 0.2% | 1.5%
18 | npj Precision Oncology | 48 papers in training set | Top 0.8% | 1.3%
19 | PLOS Computational Biology | 1633 papers in training set | Top 20% | 1.2%
20 | JCO Precision Oncology | 14 papers in training set | Top 0.3% | 0.9%
21 | Genome Research | 409 papers in training set | Top 3% | 0.9%
22 | Cancer Discovery | 61 papers in training set | Top 2% | 0.9%
23 | Interface Focus | 14 papers in training set | Top 0.3% | 0.7%
24 | BMC Bioinformatics | 383 papers in training set | Top 7% | 0.7%
25 | Briefings in Bioinformatics | 326 papers in training set | Top 7% | 0.7%
26 | Cell Reports Medicine | 140 papers in training set | Top 8% | 0.7%
27 | Metabolites | 50 papers in training set | Top 1% | 0.7%
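
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below copies the 27 probabilities from the list above and finds the smallest number of top-ranked journals whose cumulative probability reaches 50%. The function name `journals_to_reach` is hypothetical, and the page does not say how it computes the cutoff itself.

```python
from itertools import accumulate

# Predicted probabilities in percent, in rank order, copied from the list above.
probabilities = [
    18.3, 8.3, 7.0, 6.2, 4.8, 3.5, 3.5,   # ranks 1-7
    3.5, 3.5, 2.7, 2.1, 1.8, 1.6, 1.6,    # ranks 8-14
    1.6, 1.6, 1.5, 1.3, 1.2, 0.9, 0.9,    # ranks 15-21
    0.9, 0.7, 0.7, 0.7, 0.7, 0.7,         # ranks 22-27
]


def journals_to_reach(probs, threshold):
    """Smallest count of top-ranked entries whose running sum reaches threshold.

    Returns None if the threshold is never reached.
    """
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


top_k = journals_to_reach(probabilities, 50.0)
print(f"journals needed for 50% of probability mass: {top_k}")
```

With the listed values, the first six journals sum to about 48.1% and the first seven to about 51.6%, which agrees with the cutoff shown after rank 7.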