MDMR: Balancing Diversity and Redundancy for Annotation-Efficient Fine-Tuning of Pretrained Cell Segmentation Models

Sheikh, E. M.; Tharwat, A.; Schwan, C.; Schenck, W.

2025-11-05 bioengineering
10.1101/2025.11.04.686267 bioRxiv
Pretrained cell segmentation models have simplified and accelerated microscopy image analysis, but they often perform poorly on new and challenging datasets. Although these models can be adapted to new datasets with only a few annotated images, the effectiveness of fine-tuning depends critically on which images are selected for annotation. To address this, we propose CGMD (Centrality-Guided Maximum Diversity), a novel algorithm that identifies a small set of images that are maximally diverse with respect to each other in the pretrained feature space. We evaluate CGMD under an extremely low annotation budget of just two images per dataset for fine-tuning the pretrained Cellpose Cyto2 model on four different 2D+t datasets from the Cell Tracking Challenge. CGMD consistently outperforms six competitive active learning and subset selection methods and approaches the performance of fully supervised fine-tuning. The results show that centrality-guided maximum diversity subset selection enables stable and annotation-efficient fine-tuning of pretrained cell segmentation models. The code is publicly available at: https://github.com/eiram-mahera/cgmd.
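The abstract describes selecting a small, maximally diverse subset of images in the pretrained feature space, guided by centrality. The exact algorithm is defined in the paper and repository; the following is only a minimal sketch of one plausible reading (start from the most central embedding, then greedily add the image farthest from the current selection, i.e. max-min diversity). The function name `cgmd_select` and all details here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cgmd_select(features: np.ndarray, budget: int = 2) -> list[int]:
    """Illustrative sketch of centrality-guided maximum-diversity selection.

    features: (n_images, d) array of embeddings from a pretrained model.
    Returns the indices of `budget` selected images.
    """
    # Pairwise Euclidean distances in the feature space.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)

    # Centrality guide (assumption): seed with the most central image,
    # i.e. the one with the smallest mean distance to all others.
    selected = [int(np.argmin(dist.mean(axis=1)))]

    # Greedy max-min diversity: repeatedly add the image whose nearest
    # already-selected neighbor is farthest away.
    while len(selected) < budget:
        min_dist = dist[:, selected].min(axis=1)
        min_dist[selected] = -np.inf  # never re-pick a selected image
        selected.append(int(np.argmax(min_dist)))
    return selected
```

With a budget of two, as in the paper's experiments, this sketch returns one central image plus the image most distant from it in feature space.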

Matching journals

The top-ranked journal alone accounts for over 50% of the predicted probability mass.

Rank  Journal                                          Papers in training set  Percentile  Probability
1     Nature Methods                                   336                     Top 0.1%    54.6%
2     Nature Communications                            4913                    Top 36%     4.2%
3     ACS Photonics                                    13                      Top 0.1%    3.8%
4     Cell Systems                                     167                     Top 3%      3.8%
5     Communications Biology                           886                     Top 1%      3.8%
6     IEEE Transactions on Medical Imaging             18                      Top 0.2%    2.7%
7     PLOS ONE                                         4510                    Top 47%     2.2%
8     Nature Machine Intelligence                      61                      Top 1%      2.2%
9     Journal of Cell Biology                          333                     Top 2%      1.9%
10    Proceedings of the National Academy of Sciences  2130                    Top 31%     1.8%
11    Light: Science & Applications                    16                      Top 0.3%    1.8%
12    Bioinformatics                                   1061                    Top 8%      1.4%
13    Science                                          429                     Top 17%     1.2%
14    Advanced Science                                 249                     Top 16%     0.9%
15    PLOS Computational Biology                       1633                    Top 23%     0.8%
16    ACS Nano                                         99                      Top 3%      0.8%
17    Scientific Reports                               3102                    Top 73%     0.8%
18    Nature Biotechnology                             147                     Top 7%      0.8%
19    Cell Reports                                     1338                    Top 36%     0.5%
20    Science Advances                                 1098                    Top 34%     0.5%
21    Nature Biomedical Engineering                    42                      Top 3%      0.5%
22    Nano Letters                                     63                      Top 3%      0.5%
23    eLife                                            5422                    Top 62%     0.5%