
A comprehensive benchmark of publicly available image foundation models for their usability to predict gene expression from whole slide images

Jabin, A.; Ahmad, S.

2026-03-03 bioinformatics
10.64898/2026.03.02.709012 bioRxiv

Recent advances in large-scale self-supervised learning have led to the emergence of foundation models capable of extracting transferable visual representations from high-dimensional image data. In computational pathology, such models are increasingly used as feature encoders for molecular prediction tasks. However, systematic benchmarking of publicly available image foundation models for transcriptomic prediction from whole-slide images (WSIs) remains limited. Here, we perform a comprehensive evaluation of five state-of-the-art vision foundation models (DINOv2, Phikon, UNI, H-Optimus-0, and MedSigLIP) for gene expression prediction using the TCGA-BRCA cohort. Tile embeddings extracted from each model were aggregated via attention-based multiple instance learning (MIL), followed by multi-target regression to predict RNA-seq expression profiles. Performance was assessed using gene-level Spearman correlation across samples. Histopathology-specific foundation models consistently outperformed general-purpose encoders, with Phikon achieving the strongest overall performance, followed by UNI and H-Optimus-0. These findings demonstrate that domain-aligned pretraining substantially enhances morphology-to-transcriptome inference and provide a principled benchmark for foundation model selection in molecular pathology.
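The pipeline summarized in the abstract (attention-based MIL pooling of tile embeddings, a multi-target regression head, and gene-level Spearman correlation across samples) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the attention form follows the non-gated variant of Ilse et al. (2018) as an assumption, tile embeddings are assumed to come from a frozen foundation model, and all dimensions, weights, and targets are hypothetical and synthetic. No training loop is shown; in practice the attention and regression weights would be learned (for example by minimizing mean squared error on the RNA-seq targets).

```python
import numpy as np

def attention_mil_pool(tiles, V, w):
    """Attention-based MIL pooling (non-gated form, assumed here).

    tiles: (n_tiles, d) tile embeddings, assumed to come from a frozen encoder.
    V: (h, d) projection matrix; w: (h,) attention vector (hypothetical params).
    Returns the slide-level embedding (d,) and the attention weights (n_tiles,).
    """
    scores = np.tanh(tiles @ V.T) @ w          # unnormalized attention per tile
    scores = scores - scores.max()             # stabilize the softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    return attn @ tiles, attn                  # attention-weighted average of tiles

def predict_expression(slide_emb, W, b):
    """Multi-target linear regression head: one output per gene."""
    return slide_emb @ W + b                   # shape (n_genes,)

def rankdata(x):
    """Average ranks starting at 1; tied values share their mean rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):                     # average the ranks of tied values
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def gene_level_spearman(y_true, y_pred):
    """Spearman correlation for each gene, computed across samples (slides).

    y_true, y_pred: (n_samples, n_genes). Returns an array of shape (n_genes,).
    """
    rhos = np.empty(y_true.shape[1])
    for g in range(y_true.shape[1]):
        rt, rp = rankdata(y_true[:, g]), rankdata(y_pred[:, g])
        rhos[g] = np.corrcoef(rt, rp)[0, 1]    # Pearson on ranks = Spearman
    return rhos

# Toy run with hypothetical sizes and random (untrained) weights.
rng = np.random.default_rng(0)
d, h, n_genes, n_slides = 64, 32, 10, 20
V = rng.normal(size=(h, d)) / np.sqrt(d)
w = rng.normal(size=h)
W = rng.normal(size=(d, n_genes)) / np.sqrt(d)
b = np.zeros(n_genes)

preds = []
for _ in range(n_slides):
    tiles = rng.normal(size=(rng.integers(50, 200), d))  # variable tiles per slide
    z, attn = attention_mil_pool(tiles, V, w)
    preds.append(predict_expression(z, W, b))
y_pred = np.stack(preds)
y_true = y_pred + rng.normal(scale=0.1, size=y_pred.shape)  # synthetic targets
rho = gene_level_spearman(y_true, y_pred)
print(rho.shape, float(np.median(rho)))
```

Because the benchmark summarizes performance per gene across samples, the correlation is taken column-wise over slides rather than row-wise over genes; a median (or mean) over genes is one simple way to reduce the result to a single score per encoder.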

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Nature Machine Intelligence: 61 papers in training set; Top 0.1%; 14.3%
2. Bioinformatics: 1061 papers in training set; Top 3%; 7.2%
3. Nature Communications: 4913 papers in training set; Top 28%; 6.4%
4. Genome Medicine: 154 papers in training set; Top 1.0%; 6.4%
5. Advanced Science: 249 papers in training set; Top 3%; 4.8%
6. NAR Genomics and Bioinformatics: 214 papers in training set; Top 0.4%; 4.3%
7. Scientific Reports: 3102 papers in training set; Top 31%; 4.0%
8. Nucleic Acids Research: 1128 papers in training set; Top 6%; 3.6%
50% of probability mass above
9. Cell Reports Medicine: 140 papers in training set; Top 1%; 3.6%
10. Briefings in Bioinformatics: 326 papers in training set; Top 2%; 3.6%
11. PLOS Computational Biology: 1633 papers in training set; Top 10%; 3.6%
12. BMC Bioinformatics: 383 papers in training set; Top 3%; 2.6%
13. Nature Methods: 336 papers in training set; Top 3%; 2.4%
14. Communications Biology: 886 papers in training set; Top 5%; 2.1%
15. Medical Image Analysis: 33 papers in training set; Top 0.6%; 1.7%
16. eBioMedicine: 130 papers in training set; Top 1%; 1.7%
17. Patterns: 70 papers in training set; Top 1.0%; 1.7%
18. GigaScience: 172 papers in training set; Top 2%; 1.1%
19. Frontiers in Bioinformatics: 45 papers in training set; Top 0.6%; 0.9%
20. npj Precision Oncology: 48 papers in training set; Top 1%; 0.9%
21. iScience: 1063 papers in training set; Top 27%; 0.9%
22. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 8%; 0.9%
23. New Phytologist: 309 papers in training set; Top 4%; 0.8%
24. npj Systems Biology and Applications: 99 papers in training set; Top 2%; 0.8%
25. Journal of Translational Medicine: 46 papers in training set; Top 2%; 0.8%
26. Science Advances: 1098 papers in training set; Top 28%; 0.8%
27. Cell Systems: 167 papers in training set; Top 12%; 0.7%
28. PNAS Nexus: 147 papers in training set; Top 2%; 0.7%
29. PLOS ONE: 4510 papers in training set; Top 71%; 0.6%
30. Breast Cancer Research: 32 papers in training set; Top 0.6%; 0.6%
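As a quick arithmetic check of the statement that the top 8 journals account for 50% of the predicted probability mass, the sketch below accumulates the listed percentages in rank order and reports the smallest number of journals whose combined share reaches a threshold. The values are the rounded percentages shown in the list, so the cumulative total is approximate; the function name `smallest_k_reaching` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-30 as listed, rounded to 0.1.
PROBS_PCT = [14.3, 7.2, 6.4, 6.4, 4.8, 4.3, 4.0, 3.6, 3.6, 3.6,
             3.6, 2.6, 2.4, 2.1, 1.7, 1.7, 1.7, 1.1, 0.9, 0.9,
             0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6]

def smallest_k_reaching(probs_pct, threshold_pct=50.0):
    """Return (k, cumulative %) for the smallest k whose top-k sum reaches the threshold.

    Returns (None, total) if the listed entries never reach the threshold.
    """
    total = 0.0
    for k, total in enumerate(accumulate(probs_pct), start=1):
        if total >= threshold_pct:
            return k, total
    return None, total

k, cum = smallest_k_reaching(PROBS_PCT)
print(f"top {k} journals cover {cum:.1f}% of the listed probability")
```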