
Vision Transformers Based AI Models For Predicting Colorectal Cancer from Digital Pathology WSI: Use Case Of MHIST dataset

Kondejkar, T.; Tunik, G.; Amal, S.

2026-02-04 gastroenterology
10.64898/2026.02.03.26345516 medRxiv

This study investigates the efficacy of transformer-based deep learning architectures, specifically Vision Transformer (ViT), Class Attention in Image Transformers (CaiT), and Data-Efficient Image Transformers (DeiT), for the binary classification of colorectal polyps using the Minimalist Histopathology Image Analysis Dataset (MHIST). The dataset comprises 3,152 hematoxylin and eosin (H&E)-stained, formalin-fixed paraffin-embedded (FFPE) images, each annotated as either a hyperplastic polyp (HP) or a sessile serrated adenoma (SSA). The models were evaluated with 5-fold stratified cross-validation, and performance was quantified using accuracy, precision, recall, F1-score, and AUC-ROC. The transformer architectures, particularly CaiT (accuracy 90.18%, AUC-ROC 95.52%), outperformed traditional convolutional neural networks (CNNs). The superior performance of CaiT is attributed to its class-attention mechanism, which effectively captures the nuanced morphological differences essential for accurate histopathological classification. These findings underscore the potential of transformer-based models to enhance diagnostic precision, reduce variability in pathological assessment, and facilitate earlier and more reliable colorectal cancer screening.
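The abstract names 5-fold stratified cross-validation and five metrics but gives no implementation details. The following is a minimal, dependency-free sketch of that evaluation protocol, not the authors' code. It assumes binary labels coded 0 = HP and 1 = SSA (SSA treated as the positive class), per-image scores in [0, 1] from some classifier, a 0.5 decision threshold, and metrics averaged across folds; the function names (`stratified_k_fold`, `roc_auc`, `binary_metrics`, `mean_over_folds`) and the toy data are hypothetical illustrations.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds that roughly preserve the class ratio.

    Indices of each class are shuffled, then dealt round-robin across the folds,
    so every fold receives about 1/k of each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def roc_auc(y_true, y_score):
    """AUC-ROC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 and AUC-ROC for labels in {0, 1} and scores in [0, 1]."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # SSA predicted as SSA
    tn = pairs.count((0, 0))  # HP predicted as HP
    fp = pairs.count((0, 1))  # HP predicted as SSA
    fn = pairs.count((1, 0))  # SSA predicted as HP
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc_roc": roc_auc(y_true, y_score),
    }


def mean_over_folds(per_fold):
    """Average each metric across folds."""
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in per_fold[0]}


if __name__ == "__main__":
    # Toy data only: 14 "HP" (0) and 6 "SSA" (1) samples with made-up scores.
    labels = [0] * 14 + [1] * 6
    rng = random.Random(1)
    scores = [min(1.0, max(0.0, 0.3 + 0.4 * y + rng.uniform(-0.3, 0.3))) for y in labels]
    per_fold = []
    for fold in stratified_k_fold(labels, k=5):
        # A real pipeline would train a model on the other four folds here;
        # the toy scores stand in for that model's predictions on this fold.
        y = [labels[i] for i in fold]
        s = [scores[i] for i in fold]
        per_fold.append(binary_metrics(y, s))
    print(mean_over_folds(per_fold))
```

Dealing each class round-robin guarantees that every fold contains both HP and SSA images whenever each class has at least k samples, which keeps per-fold precision, recall, and AUC-ROC well defined. The pairwise AUC is quadratic in the number of images; a rank-based formula is the usual choice at larger scale.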

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 19.0% (3102 papers in training set; top 0.7%)
2. Computer Methods and Programs in Biomedicine: 6.4% (27 papers in training set; top 0.1%)
3. PLOS ONE: 4.9% (4510 papers in training set; top 30%)
4. Frontiers in Cell and Developmental Biology: 4.9% (218 papers in training set; top 0.8%)
5. Frontiers in Medicine: 4.9% (113 papers in training set; top 0.8%)
6. Biomedicines: 2.8% (66 papers in training set; top 0.3%)
7. iScience: 2.7% (1063 papers in training set; top 8%)
8. Heliyon: 2.4% (146 papers in training set; top 0.8%)
9. Nature Communications: 2.1% (4913 papers in training set; top 46%)

50% of probability mass above

10. Cureus: 1.9% (67 papers in training set; top 2%)
11. Genomics, Proteomics & Bioinformatics: 1.9% (171 papers in training set; top 3%)
12. PLOS Digital Health: 1.9% (91 papers in training set; top 1%)
13. Journal of Biophotonics: 1.9% (16 papers in training set; top 0.2%)
14. Cancers: 1.8% (200 papers in training set; top 3%)
15. Biology Methods and Protocols: 1.7% (53 papers in training set; top 0.9%)
16. Computers in Biology and Medicine: 1.7% (120 papers in training set; top 2%)
17. Journal of Translational Medicine: 1.5% (46 papers in training set; top 1%)
18. Journal of Pathology Informatics: 1.4% (13 papers in training set; top 0.2%)
19. Frontiers in Oncology: 1.2% (95 papers in training set; top 3%)
20. npj Precision Oncology: 1.2% (48 papers in training set; top 0.8%)
21. BMC Medical Informatics and Decision Making: 1.0% (39 papers in training set; top 2%)
22. Journal of Medical Imaging: 0.8% (11 papers in training set; top 0.3%)
23. Bioinformatics: 0.8% (1061 papers in training set; top 9%)
24. Translational Oncology: 0.8% (18 papers in training set; top 0.3%)
25. Communications Medicine: 0.8% (85 papers in training set; top 0.8%)
26. Cells: 0.8% (232 papers in training set; top 5%)
27. Bioengineering: 0.8% (24 papers in training set; top 1%)
28. Nature Biomedical Engineering: 0.8% (42 papers in training set; top 2%)
29. Diagnostics: 0.8% (48 papers in training set; top 2%)
30. Modern Pathology: 0.8% (21 papers in training set; top 0.4%)
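The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below assumes the cutoff is the first rank at which the running total of the displayed (rounded) percentages reaches 50%; the helper name `journals_for_mass` is hypothetical and not part of the matching tool.

```python
def journals_for_mass(probs_percent, target=50.0):
    """Return (k, cumulative) for the smallest k whose top-k probabilities reach `target` percent.

    If the target is never reached, returns the total count and the full sum.
    """
    cumulative = 0.0
    for k, p in enumerate(sorted(probs_percent, reverse=True), start=1):
        cumulative += p
        # Small tolerance guards against floating-point rounding in the running sum.
        if cumulative >= target - 1e-9:
            return k, cumulative
    return len(probs_percent), cumulative


# Predicted probabilities (%) of the ten highest-ranked journals listed above.
top_ten = [19.0, 6.4, 4.9, 4.9, 4.9, 2.8, 2.7, 2.4, 2.1, 1.9]
k, mass = journals_for_mass(top_ten)
print(k, round(mass, 1))
```

The running total is 48.0% after eight journals and 50.1% after nine, which is consistent with the marker placed after Nature Communications.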