
GenBio-PathFM: A State-of-the-Art Foundation Model for Histopathology

Kapse, S.; Aygün, M.; Cole, E.; Lundberg, E.; Song, L.; Xing, E. P.

2026-03-20 bioinformatics
10.64898/2026.03.17.712534 bioRxiv

Recent advancements in histopathology foundation models (FMs) have largely been driven by scaling the training data, often utilizing massive proprietary datasets. However, the long-tailed distribution of morphological features in whole-slide images (WSIs) makes simple scaling inefficient, as common morphologies dominate the learning signal. We introduce GenBio-PathFM, a 1.1B-parameter FM that achieves state-of-the-art performance on public benchmarks while using a fraction of the training data required by current leading models. The efficiency of GenBio-PathFM is underpinned by two primary innovations: an automated data curation pipeline that prioritizes morphological diversity and a novel dual-stage learning strategy which we term JEDI (JEPA + DINO). Across the THUNDER, HEST, and PathoROB benchmarks, GenBio-PathFM demonstrates state-of-the-art accuracy and robustness. GenBio-PathFM is the strongest open-weight model to date and the only state-of-the-art model trained exclusively on public data.
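The abstract argues that a long-tailed distribution of morphologies lets common patterns dominate the learning signal, and that curation favoring morphological diversity counters this. The sketch below only illustrates that general idea. It is not the authors' curation pipeline, whose details the abstract does not give. It assumes each patch already carries a cluster label (hypothetical; for example, from clustering patch embeddings). It then compares uniform sampling over patches with uniform sampling over clusters on a toy pool where one morphology is rare.

```python
import random
from collections import Counter, defaultdict


def cluster_balanced_sample(labels, n, seed=0):
    """Draw n patch indices so that every cluster is chosen equally often.

    labels: one (hypothetical) cluster id per patch.
    A cluster is picked uniformly at random, then a patch is picked
    uniformly within that cluster (sampling with replacement).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, cluster in enumerate(labels):
        by_cluster[cluster].append(idx)
    clusters = sorted(by_cluster)
    picks = []
    for _ in range(n):
        cluster = rng.choice(clusters)
        picks.append(rng.choice(by_cluster[cluster]))
    return picks


# Toy long-tailed pool: 950 patches of a common morphology, 50 of a rare one.
labels = ["common"] * 950 + ["rare"] * 50

# Naive scaling: sample patches uniformly, so the rare class stays near 5%.
naive = random.Random(0).choices(range(len(labels)), k=1000)

# Diversity-oriented sampling: each cluster gets roughly half of the draws.
balanced = cluster_balanced_sample(labels, 1000)

print("naive:   ", Counter(labels[i] for i in naive))
print("balanced:", Counter(labels[i] for i in balanced))
```

Uniform-over-clusters sampling is only one simple way to flatten a long tail. It oversamples rare patches with replacement, which a real pipeline might instead handle through deduplication or by weighting clusters differently.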

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1
Nature Methods
336 papers in training set
Top 1.0%
10.1%
2
Bioinformatics
1061 papers in training set
Top 3%
9.1%
3
Nature Communications
4913 papers in training set
Top 22%
8.4%
4
Cell Systems
167 papers in training set
Top 2%
6.3%
5
PLOS ONE
4510 papers in training set
Top 35%
4.2%
6
Communications Biology
886 papers in training set
Top 1%
4.0%
7
Nature Machine Intelligence
61 papers in training set
Top 0.9%
3.6%
8
Nature Medicine
117 papers in training set
Top 1%
2.9%
9
Nature Biotechnology
147 papers in training set
Top 3%
2.7%
50% of probability mass above
10
Advanced Science
249 papers in training set
Top 7%
2.7%
11
PLOS Computational Biology
1633 papers in training set
Top 12%
2.6%
12
Scientific Reports
3102 papers in training set
Top 46%
2.6%
13
Nucleic Acids Research
1128 papers in training set
Top 9%
1.9%
14
Genome Biology
555 papers in training set
Top 4%
1.8%
15
iScience
1063 papers in training set
Top 15%
1.7%
16
Science
429 papers in training set
Top 14%
1.7%
17
Briefings in Bioinformatics
326 papers in training set
Top 4%
1.7%
18
Proceedings of the National Academy of Sciences
2130 papers in training set
Top 33%
1.7%
19
npj Precision Oncology
48 papers in training set
Top 0.8%
1.3%
20
Genome Medicine
154 papers in training set
Top 5%
1.3%
21
Science Advances
1098 papers in training set
Top 21%
1.3%
22
BMC Bioinformatics
383 papers in training set
Top 6%
1.1%
23
Cell Reports Medicine
140 papers in training set
Top 6%
0.9%
24
Nature Biomedical Engineering
42 papers in training set
Top 2%
0.9%
25
PNAS Nexus
147 papers in training set
Top 1%
0.8%
26
Frontiers in Bioinformatics
45 papers in training set
Top 0.9%
0.7%
27
Genome Research
409 papers in training set
Top 4%
0.7%
28
Journal of Structural Biology
58 papers in training set
Top 2%
0.7%
29
New Phytologist
309 papers in training set
Top 5%
0.7%
30
eLife
5422 papers in training set
Top 61%
0.6%
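The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below copies the first ten values as displayed. Those values are rounded, so the cumulative total is approximate. The helper name `journals_to_reach` is illustrative only.

```python
# Predicted probabilities (%) for the ten highest-ranked journals, as listed.
probs = [
    ("Nature Methods", 10.1),
    ("Bioinformatics", 9.1),
    ("Nature Communications", 8.4),
    ("Cell Systems", 6.3),
    ("PLOS ONE", 4.2),
    ("Communications Biology", 4.0),
    ("Nature Machine Intelligence", 3.6),
    ("Nature Medicine", 2.9),
    ("Nature Biotechnology", 2.7),
    ("Advanced Science", 2.7),
]


def journals_to_reach(probs, target):
    """Return (k, cumulative %) for the smallest k whose top-k sum reaches target."""
    total = 0.0
    for k, (_, p) in enumerate(probs, start=1):
        total += p
        if total >= target:
            return k, total
    return None, total  # target not reached within the list


k, mass = journals_to_reach(probs, 50.0)
print(k, round(mass, 1))  # → 9 51.3
```

The first eight journals sum to 48.6%, and adding Nature Biotechnology brings the total to 51.3%. This is why the 50% marker sits after rank 9.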