An Efficient and Interpretable Learning Approach for Large-Scale Histopathology Data

Moore, C.; Gupta, V.; Neupane, S.; Tripathi, H.

2026-05-03 health informatics
10.64898/2026.04.30.26352196 medRxiv

Prostate cancer (PCa) remains one of the leading causes of cancer-related mortality among men, and histopathological analysis of prostate biopsy specimens is central to diagnosis and risk stratification. Whole-slide images (WSIs) capture rich morphological information, but their gigapixel scale and the large number of extracted tissue patches make exhaustive annotation and model training computationally expensive. Attention-based Multiple Instance Learning (MIL) has emerged as an effective weakly supervised framework for WSI analysis, enabling slide-level prediction without requiring patch-level annotations. However, training MIL models on large histopathology cohorts remains resource-intensive because many extracted patches are non-informative and patches are often processed repeatedly during training. To address these challenges, we propose an efficient and interpretable learning framework for large-scale histopathology analysis. Our method combines a pathology-pretrained UNI encoder, a single-branch Clustering-constrained Attention Multiple Instance Learning (CLAM-SB) model, and a window-based training strategy that reduces computational overhead while preserving predictive performance. We describe the approach and evaluate it on TCGA-PRAD WSIs from PCa patients. On 189,600 sampled patches across 79 WSIs, the proposed approach reduced total training time by 57.5% (20 to 8.5 hours for 5 epochs) and by 41.4% (27 to 16 hours for 10 epochs), underscoring its potential as a practical and resource-efficient strategy for scalable prostate histopathology analysis.
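To make the attention-based MIL idea concrete, the following is a minimal sketch of generic attention pooling over patch embeddings: each patch gets a score, the scores are normalized with a softmax, and the slide-level embedding is the weighted sum. It is not the authors' CLAM-SB implementation, which adds a clustering constraint and other components not shown here. The function name `attention_pool` and the parameters `V` and `w` are illustrative assumptions, and real embeddings would come from an encoder such as UNI rather than the toy values below.

```python
import math


def attention_pool(patches, V, w):
    """Generic attention-based MIL pooling (illustrative, not CLAM-SB).

    patches: list of patch embeddings, each a list of floats of length d
    V:       hidden projection, a list of rows, each of length d
    w:       scoring vector with one entry per row of V
    Returns (slide_embedding, attention_weights).
    """
    # Score each patch embedding h as w . tanh(V h).
    scores = []
    for h in patches:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))

    # Softmax over patches, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide-level embedding: attention-weighted sum of patch embeddings.
    dim = len(patches[0])
    slide = [sum(a * h[j] for a, h in zip(weights, patches)) for j in range(dim)]
    return slide, weights


if __name__ == "__main__":
    # Toy bag of three 2-dimensional "patch embeddings" (made-up values).
    bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    V = [[0.5, -0.5], [1.0, 1.0]]  # hypothetical learned projection
    w = [1.0, 2.0]                 # hypothetical learned scoring vector
    slide, weights = attention_pool(bag, V, w)
    print("attention weights:", weights)
    print("slide embedding:", slide)
```

The attention weights are what make this family of models interpretable: patches with larger weights contribute more to the slide-level prediction and can be highlighted on the slide.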

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Scientific Reports | 3102 | Top 3% | 12.9% |
| 2 | Nature Communications | 4913 | Top 22% | 8.7% |
| 3 | Bioinformatics | 1061 | Top 4% | 6.6% |
| 4 | Communications Biology | 886 | Top 0.5% | 5.0% |
| 5 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 0.3% | 5.0% |
| 6 | Advanced Science | 249 | Top 4% | 4.5% |
| 7 | iScience | 1063 | Top 3% | 4.4% |
| 8 | Medical Image Analysis | 33 | Top 0.3% | 4.3% |
| | 50% of probability mass above | | | |
| 9 | Patterns | 70 | Top 0.4% | 2.7% |
| 10 | Nature Machine Intelligence | 61 | Top 1% | 2.4% |
| 11 | PLOS ONE | 4510 | Top 52% | 1.7% |
| 12 | Scientific Data | 174 | Top 1% | 1.7% |
| 13 | Human Brain Mapping | 295 | Top 3% | 1.7% |
| 14 | npj Digital Medicine | 97 | Top 2% | 1.7% |
| 15 | BMC Medical Informatics and Decision Making | 39 | Top 1% | 1.7% |
| 16 | Journal of Biomedical Informatics | 45 | Top 0.9% | 1.4% |
| 17 | Communications Medicine | 85 | Top 0.4% | 1.4% |
| 18 | eBioMedicine | 130 | Top 2% | 1.1% |
| 19 | PNAS Nexus | 147 | Top 0.7% | 1.1% |
| 20 | Expert Systems with Applications | 11 | Top 0.3% | 1.0% |
| 21 | Science Advances | 1098 | Top 26% | 0.9% |
| 22 | Artificial Intelligence in Medicine | 15 | Top 0.6% | 0.8% |
| 23 | IEEE Transactions on Biomedical Engineering | 38 | Top 0.8% | 0.8% |
| 24 | IEEE Transactions on Medical Imaging | 18 | Top 0.5% | 0.8% |
| 25 | Heliyon | 146 | Top 5% | 0.8% |
| 26 | Briefings in Bioinformatics | 326 | Top 6% | 0.8% |
| 27 | Biology Methods and Protocols | 53 | Top 2% | 0.8% |
| 28 | Nature Biomedical Engineering | 42 | Top 2% | 0.8% |
| 29 | NeuroImage | 813 | Top 5% | 0.8% |
| 30 | JAMIA Open | 37 | Top 1% | 0.8% |
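
The 50% cutoff can be checked from the listed probabilities by accumulating them in rank order until the running total reaches the threshold. The short sketch below does this for the first ten listed values; the function name `ranks_to_reach` is an illustrative assumption and is not part of the journal-matching tool.

```python
def ranks_to_reach(probabilities, threshold):
    """Return (rank, cumulative) at which the running total first reaches threshold.

    probabilities: percentages in rank order (highest first)
    threshold:     target cumulative percentage, e.g. 50.0
    Returns (None, cumulative) if the threshold is never reached.
    """
    cumulative = 0.0
    for rank, p in enumerate(probabilities, start=1):
        cumulative += p
        if cumulative >= threshold:
            return rank, cumulative
    return None, cumulative


# Predicted probabilities (%) for ranks 1 to 10, copied from the list above.
top10 = [12.9, 8.7, 6.6, 5.0, 5.0, 4.5, 4.4, 4.3, 2.7, 2.4]

if __name__ == "__main__":
    rank, cumulative = ranks_to_reach(top10, 50.0)
    print(f"threshold reached at rank {rank} with {cumulative:.1f}% cumulative")
```

With these values the running total is 47.1% after rank 7 and 51.4% after rank 8, which matches the stated cutoff at eight journals.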