An Efficient and Interpretable Learning Approach for Large-Scale Histopathology Data

Moore, C.; Gupta, V.; Neupane, S.; Tripathi, H.

2026-05-03 health informatics

10.64898/2026.04.30.26352196 medRxiv


Prostate cancer (PCa) remains one of the leading causes of cancer-related mortality among men, and histopathological analysis of prostate biopsy specimens is central to diagnosis and risk stratification. Whole-slide images (WSIs) capture rich morphological information, but their gigapixel scale and the large number of extracted tissue patches make exhaustive annotation and model training computationally expensive. Attention-based Multiple Instance Learning (MIL) has emerged as an effective weakly supervised framework for WSI analysis, enabling slide-level prediction without patch-level annotations. However, training MIL models on large histopathology cohorts remains resource-intensive because many extracted patches are non-informative and some patches are processed repeatedly during training. To address these challenges, we propose an efficient and interpretable learning framework for large-scale histopathology analysis. Our method combines a pathology-pretrained UNI encoder, the single-branch Clustering-constrained Attention Multiple instance learning (CLAM-SB) model, which is an attention-based MIL architecture, and a window-based training strategy that reduces computational overhead while preserving predictive performance. We describe the approach and evaluate it on TCGA-PRAD WSIs from PCa patients. On 189,600 sampled patches from 79 WSIs, the approach reduced total training time by 57.5% for 5 epochs (20 to 8.5 hours) and by 41.4% for 10 epochs (27 to 16 hours), underscoring its potential as a practical and resource-efficient strategy for scalable prostate histopathology analysis.
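To make the attention-based MIL idea in the abstract concrete, the following is a minimal sketch of attention pooling over patch embeddings, written in plain Python. It is not the authors' implementation: the function name `attention_mil_pool`, the parameters `V` and `w`, and the toy dimensions are all hypothetical, and the random vectors stand in for features that a pathology encoder such as UNI would produce. It shows only the basic (non-gated) attention pooling step; the clustering constraint of CLAM and the window-based training strategy are not modeled here.

```python
import math
import random


def attention_mil_pool(patch_features, V, w):
    """Pool patch embeddings into one slide embedding using attention weights.

    patch_features: list of n_patches vectors, each of length feat_dim
    V: feat_dim x hidden_dim projection matrix (hypothetical parameter)
    w: hidden_dim scoring vector (hypothetical parameter)
    """
    # Unnormalized score per patch: w . tanh(h V)
    scores = []
    for h in patch_features:
        hidden = [
            math.tanh(sum(h[i] * V[i][j] for i in range(len(h))))
            for j in range(len(w))
        ]
        scores.append(sum(wj * hj for wj, hj in zip(w, hidden)))

    # Softmax over patches, shifted by the max score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide embedding is the attention-weighted sum of patch embeddings
    feat_dim = len(patch_features[0])
    slide_embedding = [
        sum(a * h[i] for a, h in zip(weights, patch_features))
        for i in range(feat_dim)
    ]
    return slide_embedding, weights


# Toy data: random vectors in place of real encoder features (assumption)
rng = random.Random(0)
n_patches, feat_dim, hidden_dim = 8, 16, 4
patches = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(n_patches)]
V = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(feat_dim)]
w = [rng.gauss(0, 1) for _ in range(hidden_dim)]

slide_embedding, weights = attention_mil_pool(patches, V, w)
```

The returned `weights` assign one non-negative value per patch and sum to one, which is what allows attention scores to be read as a per-patch relevance map for interpretability; a slide-level classifier would then operate on `slide_embedding`.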
