MurineCyto-Det: A High-Resolution Murine BALF Cytology Dataset for Leukocyte Segmentation and Detection

Le, T. X.; Tran, L.-A. T.; Farabi, D. A.; Wang, S.; Phan, A. T. Q.; Cormier, S. A.; Taada, A.; McGrew, D.; Du, Y.; Vu, L. D.

2026-05-12 bioinformatics
10.64898/2026.05.08.723893 bioRxiv

Automated analysis of murine bronchoalveolar lavage fluid (BALF) cytology is important for preclinical respiratory research, yet progress has been limited by the lack of publicly available, well-annotated mouse BALF image datasets. We present MurineCyto-Det, a high-resolution murine BALF cytology dataset comprising 333 image tiles of size 1024x1024 pixels, annotated across five cytological categories with both pixel-level segmentation masks and one-to-one matched bounding boxes. The dataset contains 14,551 annotated cell instances and supports two complementary analysis tasks: morphology-oriented cell segmentation and object-level cell detection. To establish reproducible benchmark baselines, we evaluated representative segmentation and detection models. The results demonstrate the practical utility of MurineCyto-Det while highlighting realistic challenges arising from class imbalance, small object size, irregular cell morphology, and ambiguous debris-like structures. MurineCyto-Det provides a standardized resource for developing, evaluating, and comparing automated methods for murine BALF cytology analysis. The dataset is publicly available at https://doi.org/10.5281/zenodo.17608677.

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

1. Cytometry Part A (30 papers in training set; Top 0.1%): 10.2%
2. Bioinformatics (1061 papers in training set; Top 3%): 9.3%
3. BMC Bioinformatics (383 papers in training set; Top 2%): 6.4%
4. PLOS Computational Biology (1633 papers in training set; Top 7%): 4.9%
5. Scientific Reports (3102 papers in training set; Top 23%): 4.9%
6. PLOS ONE (4510 papers in training set; Top 38%): 3.6%
7. Briefings in Bioinformatics (326 papers in training set; Top 2%): 3.6%
8. iScience (1063 papers in training set; Top 7%): 2.8%
9. Nature Communications (4913 papers in training set; Top 45%): 2.5%
10. Cell Reports Methods (141 papers in training set; Top 2%): 1.7%
11. Computers in Biology and Medicine (120 papers in training set; Top 2%): 1.7%

50% of probability mass above

12. Scientific Data (174 papers in training set; Top 1%): 1.7%
13. GigaScience (172 papers in training set; Top 1%): 1.7%
14. Biological Imaging (15 papers in training set; Top 0.1%): 1.5%
15. Frontiers in Bioinformatics (45 papers in training set; Top 0.3%): 1.3%
16. Journal of Pathology Informatics (13 papers in training set; Top 0.2%): 1.3%
17. Clinical Chemistry (22 papers in training set; Top 0.5%): 1.2%
18. Genome Medicine (154 papers in training set; Top 6%): 1.2%
19. Disease Models & Mechanisms (119 papers in training set; Top 2%): 1.2%
20. Computational and Structural Biotechnology Journal (216 papers in training set; Top 7%): 1.0%
21. npj Precision Oncology (48 papers in training set; Top 1%): 0.9%
22. Nature Methods (336 papers in training set; Top 6%): 0.9%
23. BMC Biology (248 papers in training set; Top 3%): 0.9%
24. Communications Biology (886 papers in training set; Top 21%): 0.8%
25. Advanced Science (249 papers in training set; Top 18%): 0.8%
26. Frontiers in Cell and Developmental Biology (218 papers in training set; Top 8%): 0.8%
27. BMC Methods (11 papers in training set; Top 0.2%): 0.8%
28. Journal of Cell Science (353 papers in training set; Top 2%): 0.8%
29. Biology Methods and Protocols (53 papers in training set; Top 2%): 0.8%
30. Nucleic Acids Research (1128 papers in training set; Top 17%): 0.8%
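
The "top 11 journals account for 50%" statement can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, the "at least 50%" stopping rule is an assumption about how the page draws its cutoff, and the inputs are the rounded percentages shown in the list, so the running total is approximate.

```python
# Predicted probabilities (percent) for ranks 1-30, copied from the list above.
# These are the rounded values shown on the page, so sums are approximate.
probabilities = [
    10.2, 9.3, 6.4, 4.9, 4.9, 3.6, 3.6, 2.8, 2.5, 1.7,
    1.7, 1.7, 1.7, 1.5, 1.3, 1.3, 1.2, 1.2, 1.2, 1.0,
    0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8,
]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative) for the first rank whose running total
    is at least `threshold`, or None if the threshold is never reached.

    Hypothetical helper; the >= rule is an assumption, not the page's
    documented method.
    """
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


rank, cumulative = journals_to_reach(probabilities, 50.0)
print(rank, round(cumulative, 1))  # rank 11, cumulative about 51.6
```

With the rounded values, the first ten journals sum to about 49.9%, so the eleventh is the one that crosses the 50% mark.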