Connecting spatial regions to clinical phenotypes by transferring knowledge from bulk patient data

Bayarbaatar, A.; Hodzic, E.; Kesimoglu, Z. N.; Hirsch, M. G.; Bridgers, J. D.; Hoinka, J.; Levens, D.; Day, C.-P.; Przytycka, T. M.

2025-12-12 cancer biology
10.64898/2025.12.12.693322 bioRxiv
Spatially resolved transcriptomics (SRT) technology has enabled a new level of knowledge about tumors. Many critical tumor properties, such as invasiveness and growth, depend on both specific transcriptomic changes in tumor cells and the tumor microenvironment. However, computational methods to study clinical phenotypes of spatial regions, such as hazard and drug response, have not yet been developed. Since clinical phenotypes are measured at the patient level and not at the level of spatial regions, such a method would require transferring knowledge from the patient-level domain to the spot-level domain. To overcome this challenge, we developed SpacePhenotyper. Our approach uses algebraic spectral techniques to transfer the predictive relationship between gene expression and clinical phenotypes from bulk gene expression data to SRT data. It captures a gene expression pattern that is predictive of the phenotype of interest in the form of a vector, the "Eigen-Patient," which is then used to quantify the phenotype in spatial spots. After extensively validating SpacePhenotyper on simulated and real data, we utilize it to study how the spatial heterogeneity of breast cancer tumors influences residual cancer burden after treatment. By assigning relative quantities of clinical phenotypes to spatial locations, SpacePhenotyper has proven a powerful tool for the identification and interpretation of transcriptional changes over spatial regions and of spatially regulated patterns of cellular states. SpacePhenotyper is implemented in Python. The source code and data sets used for and generated during this study are available at https://github.com/ncbi/SpacePhenotyper

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank  Journal                                          Papers in training set  Percentile  Probability
1     Bioinformatics                                   1061                    Top 1%      23.1%
2     PLOS Computational Biology                       1633                    Top 4%      8.6%
3     Cancer Research                                  116                     Top 0.1%    7.3%
4     Cell Systems                                     167                     Top 2%      6.5%
5     PLOS ONE                                         4510                    Top 30%     5.0%
----- 50% of probability mass above -----
6     Scientific Reports                               3102                    Top 34%     3.7%
7     Bioinformatics Advances                          184                     Top 1%      3.7%
8     Nature Communications                            4913                    Top 38%     3.7%
9     Biostatistics                                    21                      Top 0.1%    2.9%
10    iScience                                         1063                    Top 7%      2.8%
11    Nucleic Acids Research                           1128                    Top 8%      2.1%
12    npj Systems Biology and Applications             99                      Top 0.9%    1.9%
13    Patterns                                         70                      Top 0.6%    1.9%
14    Communications Biology                           886                     Top 8%      1.7%
15    Cancers                                          200                     Top 3%      1.5%
16    eLife                                            5422                    Top 45%     1.5%
17    Genome Medicine                                  154                     Top 5%      1.4%
18    Genome Biology                                   555                     Top 6%      1.1%
19    Genome Research                                  409                     Top 3%      1.0%
20    Frontiers in Bioinformatics                      45                      Top 0.7%    0.8%
21    Frontiers in Molecular Biosciences               100                     Top 4%      0.8%
22    BMC Bioinformatics                               383                     Top 7%      0.8%
23    Proceedings of the National Academy of Sciences  2130                    Top 43%     0.8%
24    Journal of Computational Biology                 37                      Top 0.6%    0.8%
25    PLOS Genetics                                    756                     Top 15%     0.7%
26    Biological Imaging                               15                      Top 0.3%    0.7%
27    Frontiers in Genetics                            197                     Top 12%     0.5%
28    Briefings in Bioinformatics                      326                     Top 8%      0.5%
29    Nature Methods                                   336                     Top 7%      0.5%
30    Breast Cancer Research                           32                      Top 0.6%    0.5%
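
The 50% cutoff in the ranking can be checked by accumulating the listed probabilities in rank order. The snippet below is a minimal sketch of that arithmetic, not the site's own code: the function name `journals_to_reach` is hypothetical, and only the first ten probabilities from the list are copied in.

```python
# Probabilities (in percent) copied from the first ten ranked journals, in rank order.
probabilities = [23.1, 8.6, 7.3, 6.5, 5.0, 3.7, 3.7, 3.7, 2.9, 2.8]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative) for the shortest prefix whose sum reaches threshold.

    Hypothetical helper for illustration; returns (None, total) if the
    threshold is never reached.
    """
    cumulative = 0.0
    for count, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= threshold:
            return count, cumulative
    return None, cumulative


count, mass = journals_to_reach(probabilities, 50.0)
print(f"{count} journals reach {mass:.1f}% of the probability mass")
```

The first four journals sum to 45.5%, so the fifth (PLOS ONE, 5.0%) is the one that carries the cumulative total past 50%.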