
Translating Histopathology Foundation Model Embeddings into Cellular and Molecular Features for Clinical Studies

Cui, S.; Sui, Z.; Li, Z.; Matkowskyj, K. A.; Yu, M.; Grady, W. M.; Sun, W.

2026-03-19 bioinformatics
10.64898/2026.03.17.711896 bioRxiv

AI-powered pathology foundation models provide general-purpose representations of histopathological images by encoding image tiles into numerical embeddings. However, these embeddings are not directly interpretable in biological or clinical terms and must be translated into biologically meaningful features, such as cell-type composition or gene expression, to enable downstream clinical applications. To bridge this gap, we developed STpath, a framework that integrates histopathology image embeddings derived from existing pathology foundation models with matched, spatially resolved transcriptomics data. STpath consists of cancer-specific XGBoost models trained to infer cell-type compositions and gene expression from histopathology image tiles. We evaluated STpath on colorectal and breast cancer datasets and showed that it provides accurate estimates of the composition of major cell types and the expression of a subset of genes, with further performance gains achieved by combining embeddings from multiple foundation models. Finally, we demonstrated that STpath-inferred features can be used in downstream studies to evaluate associations with clinical outcomes.

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 14% | 12.2%
2. Cell Systems | 167 papers in training set | Top 1% | 8.2%
3. Bioinformatics | 1061 papers in training set | Top 4% | 6.2%
4. Nature Methods | 336 papers in training set | Top 2% | 4.7%
5. Genome Biology | 555 papers in training set | Top 2% | 3.5%
6. Genome Medicine | 154 papers in training set | Top 2% | 3.5%
7. Nature Machine Intelligence | 61 papers in training set | Top 1% | 3.5%
8. Advanced Science | 249 papers in training set | Top 6% | 3.5%
9. Scientific Reports | 3102 papers in training set | Top 42% | 3.0%
10. Nucleic Acids Research | 1128 papers in training set | Top 7% | 3.0%
50% of probability mass above
11. Nature Medicine | 117 papers in training set | Top 1% | 2.5%
12. Nature Biotechnology | 147 papers in training set | Top 3% | 2.5%
13. PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.3%
14. Cancer Research | 116 papers in training set | Top 2% | 2.0%
15. iScience | 1063 papers in training set | Top 12% | 1.8%
16. Nature Biomedical Engineering | 42 papers in training set | Top 0.7% | 1.8%
17. Genome Research | 409 papers in training set | Top 2% | 1.7%
18. Science Advances | 1098 papers in training set | Top 16% | 1.7%
19. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 33% | 1.7%
20. Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.7%
21. Patterns | 70 papers in training set | Top 1% | 1.5%
22. PLOS ONE | 4510 papers in training set | Top 57% | 1.5%
23. Cell Reports Medicine | 140 papers in training set | Top 5% | 1.3%
24. Communications Biology | 886 papers in training set | Top 15% | 1.2%
25. npj Precision Oncology | 48 papers in training set | Top 0.9% | 1.2%
26. eLife | 5422 papers in training set | Top 50% | 1.2%
27. Cell | 370 papers in training set | Top 15% | 0.9%
28. Nature Cell Biology | 99 papers in training set | Top 4% | 0.8%
29. BMC Bioinformatics | 383 papers in training set | Top 7% | 0.8%
30. npj Systems Biology and Applications | 99 papers in training set | Top 2% | 0.8%
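
The 50% cutoff in the ranked list can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the values are the rounded percentages shown in the list, so the sum may differ slightly from whatever unrounded values the page itself uses.

```python
# Predicted probabilities (in percent), copied from the ranked list in rank order.
# These are the rounded values displayed on the page, not the underlying scores.
probabilities = [
    12.2, 8.2, 6.2, 4.7, 3.5, 3.5, 3.5, 3.5, 3.0, 3.0,
    2.5, 2.5, 2.3, 2.0, 1.8, 1.8, 1.7, 1.7, 1.7, 1.7,
    1.5, 1.5, 1.3, 1.2, 1.2, 1.2, 0.9, 0.8, 0.8, 0.8,
]


def journals_to_reach(probs, threshold):
    """Return the smallest number of top-ranked entries whose
    cumulative sum reaches `threshold`, or None if it is never reached."""
    running = 0.0
    for count, p in enumerate(probs, start=1):
        running += p
        if running >= threshold:
            return count
    return None


cutoff = journals_to_reach(probabilities, 50.0)
top10_mass = sum(probabilities[:10])
print(cutoff, round(top10_mass, 1))
```

With the rounded values above, the first nine journals sum to 48.3% and the tenth brings the total to 51.3%, which matches the placement of the 50% marker after rank 10.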