Vision Transformers Based AI Models For Predicting Colorectal Cancer from Digital Pathology WSI: Use Case Of MHIST dataset
Kondejkar, T.; Tunik, G.; Amal, S.
This study investigates the efficacy of transformer-based deep learning architectures, specifically the Vision Transformer (ViT), Class Attention in Image Transformers (CaiT), and Data-Efficient Image Transformers (DeiT), for the binary classification of colorectal polyps using the Minimalist Histopathology Image Analysis Dataset (MHIST). The dataset comprises 3,152 hematoxylin and eosin (H&E)-stained, formalin-fixed, paraffin-embedded (FFPE) images annotated as either Hyperplastic Polyps (HP) or Sessile Serrated Adenomas (SSA). A rigorous evaluation was conducted using 5-fold stratified cross-validation, and performance was quantified with accuracy, precision, recall, F1-score, and AUC-ROC. Experimental results revealed that transformer architectures, particularly CaiT (accuracy of 90.18%, AUC-ROC of 95.52%), outperformed traditional convolutional neural networks (CNNs). The superior performance of CaiT is attributed to its specialized class-attention mechanism, which effectively captures the nuanced morphological differences essential for accurate histopathological classification. These findings underscore the potential of transformer-based models to enhance diagnostic precision, reduce variability in pathological assessment, and facilitate earlier and more reliable colorectal cancer screening.
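The 5-fold stratified cross-validation described above can be sketched in plain Python: each fold preserves the HP/SSA class proportions of the full dataset, so every test fold is representative of the imbalanced binary labels. This is an illustrative sketch, not the authors' pipeline; the class counts below are hypothetical placeholders (only the dataset total of 3,152 images comes from the abstract).

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions
    preserved in every fold, as in stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Round-robin each class's shuffled indices across the k folds
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test

# Hypothetical HP/SSA split summing to the 3,152 MHIST images
labels = [0] * 2162 + [1] * 990  # 0 = HP, 1 = SSA (counts are illustrative)
for train, test in stratified_kfold(labels, k=5):
    ssa = sum(labels[i] for i in test)
    print(f"test fold: {len(test)} images ({len(test) - ssa} HP, {ssa} SSA)")
```

In practice one would train the ViT/CaiT/DeiT model on each `train` split and accumulate accuracy, precision, recall, F1, and AUC-ROC on each `test` split, reporting the mean across the five folds.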