Fast Organ-of-Origin Classification for Digital Pathology Quality Control

Aswolinskiy, W.; Wong, J. K. L.; Zapukhlyak, M.; Kindruk, Y.; Paulikat, M.; Aichmüller, C.

2026-02-04 pathology
10.64898/2026.02.03.26345443 medRxiv

Digitizing large histopathology archives requires processing millions of scanned whole slide images that must be validated rapidly. Automated organ-of-origin classification can accelerate quality control and enable early detection of mislabeled specimens. We developed a deep learning model that classifies the organ of origin from H&E-stained slides using a single low-resolution thumbnail per slide in under one second. For training, we used thumbnails from 16,624 slides from the TCGA and CPTAC archives, which contain mostly primary tumor resections. The images were categorized into 14 classes based on the most common primary sites in TCGA: Bladder, Brain, Breast, Colorectal, Kidney, Liver, Lung, Pancreas, Prostate, Skin, Stomach, Thyroid gland, Uterus, and Other (encompassing the remaining tissue types). We evaluated our approach on two independent external cohorts: a 5-class cohort with 2,857 slides (Colorectal, Kidney, Liver, Pancreas, Prostate) and a comprehensive 14-class cohort (12,348 slides). The model achieved 90% balanced accuracy for the 5-class cohort and 62% for the full 14-class cohort. Notably, when considering only the predictions with high confidence, 53% of the large cohort could be classified with 74% balanced accuracy. Manual review of high-confidence misclassifications suggested that some may reflect errors in the ground truth rather than model error. Mean model inference time was 0.2s per slide on an NVIDIA L4 GPU. Our deep learning approach demonstrates high classification performance with very low inference time, indicating its potential for real-time and cost-effective quality control in digital pathology.
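
The two headline metrics in the abstract, balanced accuracy and accuracy restricted to high-confidence predictions, can be illustrated with a minimal sketch. Balanced accuracy is taken here as the mean of per-class recall. The abstract does not state how confidence is defined or which threshold was used, so the `confidence` scores, the `threshold` of 0.8, the function names, and the toy labels below are all assumptions made for illustration, not the authors' method or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("no samples to evaluate")
    return sum(correct[c] / total[c] for c in total) / len(total)


def high_confidence_subset(y_true, y_pred, confidence, threshold):
    """Keep predictions with confidence >= threshold; also return coverage."""
    kept = [(t, p) for t, p, c in zip(y_true, y_pred, confidence) if c >= threshold]
    coverage = len(kept) / len(y_true)
    return [t for t, _ in kept], [p for _, p in kept], coverage


# Toy data, made up for illustration only (not from the study).
y_true = ["Liver", "Liver", "Kidney", "Kidney", "Lung", "Lung"]
y_pred = ["Liver", "Kidney", "Kidney", "Kidney", "Lung", "Liver"]
confidence = [0.95, 0.40, 0.90, 0.85, 0.92, 0.35]  # hypothetical scores

overall = balanced_accuracy(y_true, y_pred)
kept_true, kept_pred, coverage = high_confidence_subset(
    y_true, y_pred, confidence, threshold=0.8  # hypothetical threshold
)
confident = balanced_accuracy(kept_true, kept_pred)

print(f"overall balanced accuracy: {overall:.2f}")
print(f"coverage at threshold:     {coverage:.2f}")
print(f"high-confidence balanced accuracy: {confident:.2f}")
```

In a quality-control setting, slides below the threshold would presumably be routed to manual review, so the coverage value indicates how much of the archive could be checked automatically.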

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Modern Pathology, 21 papers in training set, Top 0.1%, 22.9%
2. Journal of Pathology Informatics, 13 papers in training set, Top 0.1%, 22.9%
3. Nature Communications, 4913 papers in training set, Top 26%, 6.9%

50% of probability mass above

4. Nature Methods, 336 papers in training set, Top 3%, 3.7%
5. PLOS ONE, 4510 papers in training set, Top 38%, 3.6%
6. Scientific Reports, 3102 papers in training set, Top 40%, 3.3%
7. Laboratory Investigation, 13 papers in training set, Top 0.1%, 3.1%
8. Communications Biology, 886 papers in training set, Top 8%, 1.7%
9. The Lancet Digital Health, 25 papers in training set, Top 0.4%, 1.7%
10. Clinical Chemistry, 22 papers in training set, Top 0.5%, 1.4%
11. npj Digital Medicine, 97 papers in training set, Top 2%, 1.4%
12. Journal of Medical Imaging, 11 papers in training set, Top 0.2%, 1.4%
13. ACS Nano, 99 papers in training set, Top 3%, 1.4%
14. iScience, 1063 papers in training set, Top 23%, 1.1%
15. npj Precision Oncology, 48 papers in training set, Top 0.9%, 1.0%
16. eBioMedicine, 130 papers in training set, Top 3%, 0.9%
17. Science Translational Medicine, 111 papers in training set, Top 5%, 0.8%
18. Cancer Research, 116 papers in training set, Top 3%, 0.8%
19. GigaScience, 172 papers in training set, Top 3%, 0.8%
20. The Journal of Pathology, 22 papers in training set, Top 0.5%, 0.8%
21. Nature Machine Intelligence, 61 papers in training set, Top 4%, 0.7%
22. Communications Medicine, 85 papers in training set, Top 1%, 0.7%
23. Physics in Medicine & Biology, 17 papers in training set, Top 0.5%, 0.7%
24. Breast Cancer Research, 32 papers in training set, Top 0.6%, 0.7%
25. Frontiers in Bioinformatics, 45 papers in training set, Top 1%, 0.7%
26. Genome Biology, 555 papers in training set, Top 8%, 0.7%
27. BMC Medical Informatics and Decision Making, 39 papers in training set, Top 3%, 0.7%
28. Clinical Cancer Research, 58 papers in training set, Top 2%, 0.5%
29. Journal of Clinical Microbiology, 120 papers in training set, Top 2%, 0.5%
30. New Phytologist, 309 papers in training set, Top 5%, 0.5%