Generating highly accurate pathology reports from gigapixel whole slide images with HistoGPT

Tran, M.; Schmidle, P.; Wagner, S. J.; Koch, V.; Lupperger, V.; Feuchtinger, A.; Boehner, A.; Kaczmarczyk, R.; Biedermann, T.; Eyerich, K.; Braun, S. A.; Peng, T.; Marr, C.

2024-03-18 pathology
10.1101/2024.03.15.24304211 medRxiv

Histopathology is considered the reference standard for diagnosing the presence and nature of many malignancies, including cancer. However, analyzing tissue samples and writing pathology reports is time-consuming, labor-intensive, and non-standardized. To address this problem, we present HistoGPT, the first vision language model that simultaneously generates reports from multiple pathology images. It was trained on more than 15,000 whole slide images from over 6,000 dermatology patients with corresponding pathology reports. The generated reports match the quality of human-written reports, as confirmed by a variety of natural language processing metrics and domain expert evaluations. We show that HistoGPT generalizes to six geographically diverse cohorts and can predict tumor subtypes and tumor thickness in a zero-shot fashion. Our model demonstrates the potential of an AI assistant that supports pathologists in evaluating, reporting, and understanding routine dermatopathology cases.

Matching journals

The top 2 journals together account for just over 50% (53.4%) of the predicted probability mass.

1. Modern Pathology: 21 papers in training set, Top 0.1%, 34.1%
2. Journal of Pathology Informatics: 13 papers in training set, Top 0.1%, 19.3%
(50% of probability mass above)
3. Scientific Reports: 3102 papers in training set, Top 15%, 6.6%
4. PLOS ONE: 4510 papers in training set, Top 35%, 4.1%
5. Nature Communications: 4913 papers in training set, Top 38%, 3.7%
6. iScience: 1063 papers in training set, Top 6%, 3.4%
7. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 24%, 2.8%
8. PLOS Computational Biology: 1633 papers in training set, Top 15%, 1.8%
9. New Phytologist: 309 papers in training set, Top 4%, 1.4%
10. npj Digital Medicine: 97 papers in training set, Top 3%, 1.1%
11. GigaScience: 172 papers in training set, Top 2%, 1.0%
12. Nature Methods: 336 papers in training set, Top 5%, 0.9%
13. Communications Biology: 886 papers in training set, Top 20%, 0.8%
14. Journal of Medical Imaging: 11 papers in training set, Top 0.3%, 0.8%
15. Biology Methods and Protocols: 53 papers in training set, Top 2%, 0.8%
16. Advanced Science: 249 papers in training set, Top 18%, 0.8%
17. Science Translational Medicine: 111 papers in training set, Top 6%, 0.8%
18. Laboratory Investigation: 13 papers in training set, Top 0.3%, 0.7%
19. npj Precision Oncology: 48 papers in training set, Top 1%, 0.7%
20. Cancer Research: 116 papers in training set, Top 4%, 0.5%
21. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 12%, 0.5%
22. JAMIA Open: 37 papers in training set, Top 2%, 0.5%
23. Breast Cancer Research: 32 papers in training set, Top 0.6%, 0.5%
24. The Journal of Pathology: 22 papers in training set, Top 0.7%, 0.5%
25. The Lancet Digital Health: 25 papers in training set, Top 1%, 0.5%