
DBT-2026, a de-identified publicly available dataset of digital breast tomosynthesis exams with ground truth biopsies

Wu, J.; Perandini, L.; Batra, T.; Igoshin, S.; Bari, S.; de Araujo, A. L.; Willemink, M. J.

2026-03-04 radiology and imaging
10.64898/2026.03.03.25337924 medRxiv
Abstract

Digital breast tomosynthesis (DBT) is a powerful imaging modality that provides improved lesion visibility, characterization, and localization compared with conventional two-dimensional digital mammography. DBT has been increasingly adopted in screening and diagnostic settings worldwide, particularly for women with dense breast tissue, where tissue overlap presents a significant diagnostic challenge. Here we describe DBT-2026, a real-world imaging dataset of 558 DBT exams from 558 patients with Breast Imaging Reporting and Data System (BI-RADS) scores of 0, 1, or 2. Each case contains one DBT examination together with expert annotations and free-text radiology reports, produced in routine clinical practice, that describe the radiological findings. To protect patient privacy, all images and reports have been de-identified. The dataset is freely available to researchers for non-commercial projects to facilitate and encourage research in breast cancer imaging.

Matching journals

The top 3 journals together account for just over 50% of the predicted probability mass (40.7% + 6.5% + 5.0% = 52.2%).

1. Scientific Data (174 papers in training set, Top 0.1%): 40.7%
2. Nature Communications (4913 papers in training set, Top 28%): 6.5%
3. Diagnostics (48 papers in training set, Top 0.2%): 5.0%
(50% of probability mass above this line)
4. PLOS ONE (4510 papers in training set, Top 30%): 5.0%
5. Scientific Reports (3102 papers in training set, Top 21%): 5.0%
6. European Radiology (14 papers in training set, Top 0.3%): 2.8%
7. Medical Physics (14 papers in training set, Top 0.3%): 2.1%
8. Nature Medicine (117 papers in training set, Top 2%): 1.8%
9. Science Translational Medicine (111 papers in training set, Top 3%): 1.5%
10. Biomedical Optics Express (84 papers in training set, Top 0.8%): 1.4%
11. npj Precision Oncology (48 papers in training set, Top 0.7%): 1.4%
12. eBioMedicine (130 papers in training set, Top 2%): 1.4%
13. IEEE Transactions on Medical Imaging (18 papers in training set, Top 0.3%): 1.3%
14. Patterns (70 papers in training set, Top 2%): 1.0%
15. Genomics, Proteomics & Bioinformatics (171 papers in training set, Top 5%): 0.9%
16. Neurocomputing (13 papers in training set, Top 0.5%): 0.8%
17. Data in Brief (13 papers in training set, Top 0.3%): 0.8%
18. Science Advances (1098 papers in training set, Top 27%): 0.8%
19. Photoacoustics (11 papers in training set, Top 0.4%): 0.8%
20. Expert Systems with Applications (11 papers in training set, Top 0.3%): 0.8%
21. Frontiers in Medicine (113 papers in training set, Top 6%): 0.8%
22. MethodsX (14 papers in training set, Top 0.4%): 0.8%
23. The Lancet Digital Health (25 papers in training set, Top 1%): 0.8%
24. IEEE Access (31 papers in training set, Top 0.9%): 0.8%
25. SLAS Technology (11 papers in training set, Top 0.2%): 0.8%
26. BMC Bioinformatics (383 papers in training set, Top 7%): 0.8%
27. eLife (5422 papers in training set, Top 58%): 0.7%
28. npj Digital Medicine (97 papers in training set, Top 4%): 0.7%
29. Communications Medicine (85 papers in training set, Top 1%): 0.7%
30. NeuroImage (813 papers in training set, Top 6%): 0.7%
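The statement that the top three journals carry half of the predicted probability mass comes down to a cumulative sum over the ranked percentages. The Python sketch below illustrates that check. It assumes a hypothetical helper, `journals_to_reach`, and uses only the five highest percentages copied from the list above. Because the values are rounded as displayed, the totals are approximate. The helper returns the smallest number of top-ranked journals whose combined probability reaches a given threshold.

```python
from itertools import accumulate

# Predicted probabilities (%) for the five top-ranked journals,
# copied (as rounded on the page) from the list above.
TOP_JOURNALS = [
    ("Scientific Data", 40.7),
    ("Nature Communications", 6.5),
    ("Diagnostics", 5.0),
    ("PLOS ONE", 5.0),
    ("Scientific Reports", 5.0),
]


def journals_to_reach(threshold, ranked):
    """Hypothetical helper: return (k, cumulative %) for the smallest k
    whose top-k probabilities sum to at least `threshold`, or None if
    the listed journals never reach it."""
    running_totals = accumulate(p for _, p in ranked)
    for k, total in enumerate(running_totals, start=1):
        if total >= threshold:
            return k, round(total, 1)
    return None


k, mass = journals_to_reach(50.0, TOP_JOURNALS)
print(k, mass)  # prints: 3 52.2
```

With a 50% threshold, the helper returns three journals covering 52.2% of the displayed mass. This agrees with the 50% marker placed after Diagnostics.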