
Accessible and Reproducible Renal Cell Carcinoma Research Through Open-Sourcing Data and Annotations

de Boer, S.; Häntze, H.; Ziegelmayer, S.; van Ginneken, B.; Prokop, M.; Bressem, K. K.; Hering, A.

2026-04-23 radiology and imaging
10.64898/2026.04.22.26351451 medRxiv

Background: Medical imaging, especially computed tomography and magnetic resonance imaging, is essential in the clinical care of patients with renal cell carcinoma (RCC). Artificial intelligence (AI) research into computer-aided diagnosis, staging and treatment planning needs curated and annotated datasets. Across the literature, The Cancer Genome Atlas (TCGA) datasets are widely used for model training and validation. However, re-annotation is often necessary due to limited access to public annotations, raising entry barriers and hindering comparison with prior work. Methods: We screened 1915 CT scans from three TCGA-RCC databases and employed a segmentation model to annotate kidney lesions. After a metadata-based exclusion step, we hosted a reader study with all papillary (n=56), chromophobe (n=27) and 200 randomly selected clear cell RCC cases. Two students quality-checked and corrected the data and annotated tumors and cysts. Uncertain cases were checked by a board-certified radiologist. Results: After data exclusion and quality control, a total of 142 annotated CT scans from 101 patients (26 female, 75 male, mean age 56 years) remained. This includes 95 CTs with clear cell RCC, 29 with papillary RCC and 18 with chromophobe RCC. Images and voxel-level annotations of kidneys and lesions are open-sourced at https://zenodo.org/records/19630298. Conclusion: By making the annotations open-source, we encourage accessible and reproducible AI research for renal cell carcinoma. We invite other researchers who have previously annotated any of these cohorts to share their annotations.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Each entry lists rank, journal, number of papers in the training set, the "Top" percentage as reported, and predicted probability.

1. European Radiology: 14 papers in training set, Top 0.1%, 15.2%
2. PLOS ONE: 4510 papers in training set, Top 12%, 14.8%
3. Diagnostics: 48 papers in training set, Top 0.1%, 8.7%
4. Scientific Reports: 3102 papers in training set, Top 9%, 8.7%
5. Frontiers in Oncology: 95 papers in training set, Top 0.9%, 4.1%

50% of probability mass above

6. Medical Physics: 14 papers in training set, Top 0.2%, 2.7%
7. The Lancet Digital Health: 25 papers in training set, Top 0.2%, 2.1%
8. JAMA Network Open: 127 papers in training set, Top 1%, 2.1%
9. Kidney360: 22 papers in training set, Top 0.3%, 2.1%
10. Journal of Medical Imaging: 11 papers in training set, Top 0.1%, 1.9%
11. Archives of Clinical and Biomedical Research: 28 papers in training set, Top 0.6%, 1.7%
12. Cancers: 200 papers in training set, Top 3%, 1.7%
13. Frontiers in Medicine: 113 papers in training set, Top 4%, 1.5%
14. eBioMedicine: 130 papers in training set, Top 2%, 1.5%
15. Computer Methods and Programs in Biomedicine: 27 papers in training set, Top 0.4%, 1.4%
16. Data in Brief: 13 papers in training set, Top 0.1%, 1.4%
17. Annals of Translational Medicine: 17 papers in training set, Top 0.9%, 1.1%
18. GigaScience: 172 papers in training set, Top 2%, 1.0%
19. Heliyon: 146 papers in training set, Top 4%, 1.0%
20. Journal of Magnetic Resonance Imaging: 14 papers in training set, Top 0.5%, 0.9%
21. Scientific Data: 174 papers in training set, Top 2%, 0.9%
22. Frontiers in Physiology: 93 papers in training set, Top 5%, 0.9%
23. Frontiers in Neuroinformatics: 38 papers in training set, Top 0.6%, 0.9%
24. Neuroinformatics: 40 papers in training set, Top 0.8%, 0.8%
25. Journal of Clinical Medicine: 91 papers in training set, Top 6%, 0.8%
26. JCO Clinical Cancer Informatics: 18 papers in training set, Top 0.9%, 0.7%
27. Frontiers in Artificial Intelligence: 18 papers in training set, Top 1%, 0.5%
28. Nature Communications: 4913 papers in training set, Top 66%, 0.5%
29. Informatics in Medicine Unlocked: 21 papers in training set, Top 2%, 0.5%
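
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and the values are simply the 29 probabilities copied from the list in rank order, not output from the page's actual prediction model.

```python
# Predicted probabilities in percent, copied from the ranked list (ranks 1 to 29).
probabilities = [
    15.2, 14.8, 8.7, 8.7, 4.1, 2.7, 2.1, 2.1, 2.1, 1.9,
    1.7, 1.7, 1.5, 1.5, 1.4, 1.4, 1.1, 1.0, 1.0, 0.9,
    0.9, 0.9, 0.9, 0.8, 0.8, 0.7, 0.5, 0.5, 0.5,
]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative) for the first rank at which the running
    sum of probs reaches threshold, or None if it is never reached."""
    cumulative = 0.0
    for count, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= threshold:
            return count, cumulative
    return None


result = journals_to_reach(probabilities, 50.0)
if result is not None:
    count, cumulative = result
    print(f"{count} journals reach {cumulative:.1f}% of the probability mass")
```

With these values the threshold is crossed at the fifth journal, where the running sum is about 51.5%; the first four journals alone sum to 47.4%.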