Explainable AI for Precision Oncology: A Task-Specific Approach Using Imaging, Multi-omics, and Clinical Data

Park, Y.; Park, S.; Bae, E.

2025-07-14 oncology
10.1101/2025.07.12.25331423

Despite continued advances in oncology, cancer remains a leading cause of global mortality, highlighting the need for diagnostic and prognostic tools that are both accurate and interpretable. Unimodal approaches often fail to capture the biological and clinical complexity of tumors. In this study, we present a suite of task-specific AI models that leverage CT imaging, multi-omics profiles, and structured clinical data to address distinct challenges in segmentation, classification, and prognosis. We developed three independent models across large public datasets. Task 1 applied a 3D U-Net to segment pancreatic tumors from CT scans, achieving a Dice Similarity Coefficient (DSC) of 0.7062. Task 2 employed a hierarchical ensemble of omics-based classifiers to distinguish tumor from normal tissue and classify six major cancer types with 98.67% accuracy. Task 3 benchmarked classical machine learning models on clinical data for prognosis prediction across three cancers (LIHC, KIRC, STAD), achieving strong performance (e.g., C-index of 0.820 in KIRC, AUC of 0.978 in LIHC). Across all tasks, explainable AI methods such as SHAP and attention-based visualization enabled transparent interpretation of model outputs. These results demonstrate the value of tailored, modality-aware models and underscore the clinical potential of such systems for precision oncology.

Technical Foundations

- Segmentation (Task 1): A custom 3D U-Net was trained using the Task07_Pancreas dataset from the Medical Segmentation Decathlon (MSD). CT images were preprocessed with MONAI-based pipelines, resampled to (64, 96, 96) voxels, and intensity-windowed to a Hounsfield unit (HU) range of -100 to 240.
- Classification (Task 2): Multi-omics data from TCGA (gene expression, methylation, miRNA, CNV, and mutation profiles) were log-transformed and normalized. Five modality-specific LightGBM classifiers generated meta-features for a late-fusion ensemble. Stratified 5-fold cross-validation was used for evaluation.
- Prognosis (Task 3): Clinical variables from TCGA were curated and imputed (median/mode), with high-missing-rate columns removed. Survival models (e.g., Cox-PH, Random Forest, XGBoost) were trained with early stopping. No omics or imaging data were used in this task.
- Interpretability: SHAP values were computed for all tree-based models, and attention-based overlays were used in imaging tasks to visualize salient regions.
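
To make the Task 1 numbers concrete, here is a minimal NumPy sketch of the two operations the abstract names: HU intensity windowing and the Dice Similarity Coefficient. It is not the authors' MONAI pipeline. The function names `window_hu` and `dice_coefficient`, the rescaling of the windowed values to [0, 1], and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
import numpy as np


def window_hu(volume, lo=-100.0, hi=240.0):
    """Clip a CT volume to [lo, hi] HU, then rescale to [0, 1].

    The window bounds come from the abstract; the rescaling step is an
    assumption (a common choice, not confirmed by the source).
    """
    clipped = np.clip(np.asarray(volume, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)


def dice_coefficient(pred, target):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic volume with the (64, 96, 96) shape quoted in the abstract.
    ct = rng.uniform(-1000.0, 1000.0, size=(64, 96, 96))
    windowed = window_hu(ct)
    print("windowed range:", float(windowed.min()), float(windowed.max()))

    # Two random binary masks standing in for prediction and ground truth.
    pred_mask = rng.random((64, 96, 96)) > 0.5
    true_mask = rng.random((64, 96, 96)) > 0.5
    print("DSC:", dice_coefficient(pred_mask, true_mask))
```

A DSC of 0.7062, as reported for Task 1, means the overlap between predicted and reference tumor voxels, counted twice, is about 71% of the combined size of the two masks.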

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. npj Precision Oncology (based on 14 papers, Top 0.1%): 15.5%
2. Nature Communications (based on 483 papers, Top 3%): 15.5%
3. JCO Clinical Cancer Informatics (based on 14 papers, Top 0.1%): 11.2%
4. Scientific Reports (based on 701 papers, Top 28%): 6.4%
5. PLOS Computational Biology (based on 141 papers, Top 3%): 4.7%

50% of probability mass above

6. Clinical Cancer Research (based on 22 papers, Top 1%): 4.5%
7. International Journal of Radiation Oncology*Biology*Physics (based on 13 papers, Top 0.6%): 4.5%
8. eLife (based on 262 papers, Top 8%): 2.8%
9. Radiotherapy and Oncology (based on 11 papers, Top 0.9%): 2.3%
10. Cancers (based on 57 papers, Top 5%): 2.3%
11. Frontiers in Oncology (based on 34 papers, Top 4%): 1.6%
12. Nature (based on 58 papers, Top 6%): 1.3%
13. Nature Genetics (based on 72 papers, Top 7%): 1.3%
14. JCO Precision Oncology (based on 11 papers, Top 2%): 1.3%
15. Journal for ImmunoTherapy of Cancer (based on 14 papers, Top 2%): 1.2%
16. Neuro-Oncology Advances (based on 14 papers, Top 2%): 1.2%
17. Scientific Data (based on 30 papers, Top 2%): 1.2%
18. npj Digital Medicine (based on 85 papers, Top 11%): 1.2%
19. Computers in Biology and Medicine (based on 39 papers, Top 5%): 1.2%
20. Proceedings of the National Academy of Sciences (based on 100 papers, Top 12%): 0.8%
21. Breast Cancer Research (based on 11 papers, Top 1%): 0.8%
22. Communications Medicine (based on 63 papers, Top 3%): 0.8%
23. Communications Biology (based on 36 papers, Top 6%): 0.7%
24. PLOS ONE (based on 1737 papers, Top 96%): 0.7%
25. JAMA Network Open (based on 125 papers, Top 21%): 0.7%