Prediction of TP53 biomarkers and survival outcomes from whole slide images using a vision transformer-based multi-instance learning framework

Chaurasia, A. K.; Toohey, P. W.; Bennett, M. T.; Harris, H. C.; Hewitt, A. W.

2025-11-13 oncology
10.1101/2025.11.11.25340052

Background: Accurate molecular profiling and prognostication from routine histopathology slides could transform precision oncology. We developed a Vision Transformer (ViT)-based multi-instance learning (MIL) framework that jointly classifies 32 solid tumour types, detects TP53 biomarkers, and predicts survival directly from Whole Slide Images (WSIs).

Methods: 11,060 primary tumours were curated from the TCGA Pan-Cancer Atlas with corresponding somatic mutations, RNA-seq, and clinical outcome data. TP53 alterations were classified as pathogenic drivers using COSMIC and hotspot annotations. WSIs underwent tissue masking, quality control, stain normalisation, and patch extraction (518 x 518) at 6x downsampling. Each patch was encoded by a ViT into a 768-dimensional embedding, and the embeddings formed a token sequence for a 6-layer Transformer aggregator with learnable classification and positional embeddings. Seven task heads were developed to predict cancer type, TP53 mutation status, TP53 RNA expression levels, overall survival (OS), progression-free interval (PFI), and the corresponding times for OS and PFI. Training had two stages. In the first stage, the model was trained on tumour tissue patches from WSIs at five magnifications. In the second stage, it was fine-tuned using patches from all tissue regions with a content-aware strategy, updating all MIL layers for a maximum of 150 epochs at a learning rate of 1 x 10-. The model's performance was evaluated on an independent validation set of 1,729 slides using classification metrics, including the area under the receiver operating characteristic curve (AUROC), regression metrics, and concordance indices (C-index).

Results: The multi-resolution ViT-based MIL model achieved an AUROC of 0.775 (95% CI: 0.749-0.801) for TP53 mutation detection on the validation set, demonstrating strong overall performance across classification and survival prediction tasks. The fine-tuned model attained robust performance across the tasks, with 0.7569 accuracy for cancer classification, 0.745 AUROC for TP53 mutation detection, C-indices of 0.686 and 0.650 for OS and PFI, and a mean squared error of 1.072 for TP53 RNA expression level estimation. On the external validation set, the fine-tuned model attained an accuracy of 65.9% (95% CI: 63.6%-68.1%) in tumour classification and an AUROC of 0.766 (95% CI: 0.743-0.789) for detecting TP53 mutations. However, with class-specific thresholding using the Youden Index, most tumour classes, aside from ovarian cancer, reached an AUROC above 0.88. This indicates strong generalisation across 32 tumour types, providing reasonable molecular profiling but offering limited prognostic utility in surgical oncology.

Conclusion: A ViT-based MIL model can simultaneously infer tumour taxonomy, TP53 mutation status, and TP53 RNA expression levels directly from WSIs, with performance comparable to conventional genomic assays, while its prognostic utility remains limited. This integrated, slide-level approach offers a scalable pipeline toward computational pathology.
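The abstract reports class-specific thresholds chosen with the Youden Index and survival performance as C-indices, without showing how either is computed. The sketch below illustrates the two textbook definitions in plain Python. It is not the authors' evaluation code: the function names `youden_threshold` and `concordance_index` are hypothetical, the C-index follows Harrell's pairwise definition and ignores pairs with tied times, and the toy inputs are invented for illustration only.

```python
from typing import Sequence, Tuple


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= threshold.
    Hypothetical helper, not taken from the paper.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_threshold, best_j = float("inf"), -1.0
    # Every distinct score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_j = t, j
    return best_threshold, best_j


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when the model gives i the higher
    risk; equal risks count as half. Pairs with tied times are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented for illustration.
    print(youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.5, 0.1]))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.686 (OS) and 0.650 (PFI) should be read.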

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature Communications; based on 483 papers; Top 0.7%; 21.3%
2. npj Precision Oncology; based on 14 papers; Top 0.1%; 13.7%
3. JCO Clinical Cancer Informatics; based on 14 papers; Top 0.2%; 8.2%
4. Scientific Reports; based on 701 papers; Top 26%; 6.9%
50% of probability mass above
5. Cancers; based on 57 papers; Top 4%; 3.0%
6. Clinical Cancer Research; based on 22 papers; Top 1%; 3.0%
7. PLOS Computational Biology; based on 141 papers; Top 4%; 3.0%
8. PLOS ONE; based on 1737 papers; Top 81%; 2.7%
9. Frontiers in Oncology; based on 34 papers; Top 4%; 1.9%
10. British Journal of Cancer; based on 22 papers; Top 2%; 1.9%
11. eLife; based on 262 papers; Top 16%; 1.7%
12. International Journal of Radiation Oncology*Biology*Physics; based on 13 papers; Top 2%; 1.7%
13. Cancer Medicine; based on 17 papers; Top 2%; 1.4%
14. Breast Cancer Research; based on 11 papers; Top 0.8%; 1.4%
15. Communications Medicine; based on 63 papers; Top 1%; 1.4%
16. Nature Medicine; based on 88 papers; Top 10%; 1.3%
17. Neuro-Oncology Advances; based on 14 papers; Top 1%; 1.3%
18. iScience; based on 74 papers; Top 5%; 1.3%
19. JCO Precision Oncology; based on 11 papers; Top 2%; 1.3%
20. Nature Genetics; based on 72 papers; Top 8%; 0.9%
21. The Lancet Digital Health; based on 25 papers; Top 4%; 0.9%
22. Modern Pathology; based on 10 papers; Top 0.8%; 0.9%
23. Journal for ImmunoTherapy of Cancer; based on 14 papers; Top 2%; 0.9%
24. Nature; based on 58 papers; Top 10%; 0.7%
25. Radiotherapy and Oncology; based on 11 papers; Top 2%; 0.7%