
Robust, credible, and interpretable AI-based histopathological prostate cancer grading

Westhaeusser, F.; Fuhlert, P.; Dietrich, E.; Lennartz, M.; Khatri, R.; Kaiser, N.; Roebeck, P.; Buelow, R.; von Stillfried, S.; Witte, A.; Ladjevardi, S.; Drotte, A.; Severgardh, P.; Baumbach, J.; Puelles, V. G.; Haeggman, M.; Brehler, M.; Boor, P.; Walhagen, P.; Dragomir, A.; Busch, C.; Graefen, M.; Bengtsson, E.; Sauter, G.; Zimmermann, M.; Bonn, S.

2024-07-10 · pathology
medRxiv · DOI: 10.1101/2024.07.09.24310082

Background
Prostate cancer (PCa) is among the most common cancers in men, and its diagnosis requires histopathological evaluation of biopsies by human experts. While several recent artificial-intelligence-based (AI) approaches have reached human expert-level PCa grading, they often show significantly reduced performance on external datasets. This drop can be caused by variations in sample preparation, for instance in the staining protocol, section thickness, or scanner used. Another limiting factor of contemporary AI-based PCa grading is the prediction of ISUP grades, which perpetuates human annotation errors.

Methods
We developed the prostate cancer aggressiveness index (PCAI), an AI-based PCa detection and grading framework trained on objective patient outcomes rather than subjective ISUP grades. We designed PCAI as a clinical application, containing algorithmic modules that offer robustness to data variation, medical interpretability, and a measure of prediction confidence. To train and evaluate PCAI, we generated a multicentric, retrospective, observational trial consisting of six cohorts with 25,591 patients, 83,864 images, and 5 years of median follow-up from 5 different centers and 3 countries. This includes a high-variance dataset of 8,157 patients and 28,236 images with variations in sample thickness, staining protocol, and scanner, allowing for the systematic evaluation and optimization of model robustness to data variation. The performance of PCAI was assessed on three external test cohorts from two countries, comprising 2,255 patients and 9,437 images.

Findings
Using our high-variance datasets, we show how differences in sample processing, particularly slide thickness and staining time, significantly reduce the performance of AI-based PCa grading, by up to 6.2 percentage points in the concordance index (C-index). We show how a select set of algorithmic improvements, including domain adversarial training, conferred robustness to data variation, interpretability, and a measure of credibility on PCAI. These changes led to significant prediction improvements across two biopsy cohorts and one TMA cohort, systematically exceeding expert ISUP grading in C-index and AUROC by up to 22 percentage points.

Interpretation
Data variation poses serious risks for AI-based histopathological PCa grading, even when models are trained on large datasets. Algorithmic improvements for model robustness, interpretability, and credibility, together with training on high-variance data and outcome-based severity prediction, give rise to robust models with above-ISUP-level PCa grading performance.
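The abstract's primary metric, the concordance index (C-index), measures how often a model assigns higher risk to patients who experience an adverse outcome earlier. A minimal sketch of the standard pairwise C-index for right-censored survival data (plain Python; the function name and interface are illustrative, not taken from the paper):

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs the model ranks correctly.

    times  -- observed follow-up time per patient
    events -- 1 if the adverse event was observed, 0 if censored
    risks  -- model-predicted risk score (higher = more aggressive)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event strictly earlier
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1    # ranked correctly
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count half
    return concordant / comparable if comparable else float("nan")
```

A perfect risk ranking yields 1.0 and a random one about 0.5, so the reported 6.2-percentage-point drop under data variation is substantial on this scale.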

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|------|---------|------------------------|------------|-------------|
| 1 | Modern Pathology | 21 | Top 0.1% | 40.9% |
| 2 | Journal of Pathology Informatics | 13 | Top 0.1% | 10.8% |
| 3 | Scientific Reports | 3102 | Top 16% | 6.6% |
| 4 | Nature Communications | 4913 | Top 32% | 5.0% |
| 5 | PLOS ONE | 4510 | Top 30% | 5.0% |
| 6 | The Lancet Digital Health | 25 | Top 0.2% | 3.4% |
| 7 | npj Digital Medicine | 97 | Top 1% | 3.2% |
| 8 | The Prostate | 11 | Top 0.1% | 1.7% |
| 9 | Biology Methods and Protocols | 53 | Top 1% | 1.4% |
| 10 | BMC Cancer | 52 | Top 2% | 0.9% |
| 11 | Clinical Cancer Research | 58 | Top 1% | 0.9% |
| 12 | PLOS Computational Biology | 1633 | Top 23% | 0.8% |
| 13 | The Journal of Pathology | 22 | Top 0.4% | 0.8% |
| 14 | Cancers | 200 | Top 4% | 0.8% |
| 15 | eBioMedicine | 130 | Top 4% | 0.8% |
| 16 | iScience | 1063 | Top 32% | 0.7% |
| 17 | BMC Medicine | 163 | Top 8% | 0.7% |
| 18 | Computers in Biology and Medicine | 120 | Top 5% | 0.7% |
| 19 | European Journal of Cancer | 10 | Top 0.7% | 0.5% |
| 20 | New Phytologist | 309 | Top 5% | 0.5% |
| 21 | Cancer Research | 116 | Top 4% | 0.5% |
| 22 | Journal of Medical Imaging | 11 | Top 0.4% | 0.5% |
| 23 | British Journal of Cancer | 42 | Top 2% | 0.5% |
| 24 | Journal of Magnetic Resonance Imaging | 14 | Top 0.7% | 0.5% |
| 25 | Breast Cancer Research | 32 | Top 0.6% | 0.5% |
| 26 | Journal of Clinical Pathology | 12 | Top 0.6% | 0.5% |
| 27 | The American Journal of Pathology | 31 | Top 0.7% | 0.5% |