Weak supervision of H&E slides reveals systems-level biology and functional states that govern therapeutic resistance

Goncalves, T.; Pulido, D.; Perrino, C. M.; Lomphithak, T.; Cleveland, M.; Dalca, A. V.; Gerstner, E.; Hipp, J.; Patel, J. B.; Rosen, B.; Sirintrapun, S. J.; Wander, S. A.; Parwani, A.; Tozbikian, G.; Niazi, M. K. K.; Cardoso, J.; Brock, J.; Zanfagnin, V.; Gazzaniga, F.; Iafrate, A. J.; Flaherty, K. T.; Sgroi, D. C.; Guttag, J. V.; Bridge, C. P.; Kim, A. E.

2026-05-08 biophysics
10.64898/2026.05.05.723013 bioRxiv
Precision oncology lacks scalable tools to assess, at the patient level, systems-level tumor microenvironment (TME) programs driving therapeutic resistance. To address this gap, we trained a weakly supervised deep learning model that uses routine H&E whole-slide images (WSIs) to derive quantitative activity for therapeutically relevant TME phenotypes, spanning immune, metabolic, and tumor cell-intrinsic programs. Using 3111 breast cancer H&E WSIs with matched bulk transcriptomics, our model accurately infers these biological states, defined by pathway enrichment scores (AUROC>0.80; PCC>0.64). Validation spanned three levels: (i) tissue-matched multiplexed immunofluorescence, showing concordance between inferred functional states and immune cell fractions (p=0.006–0.106), (ii) blinded reader assessments, confirming localization of phenotype-specific morphology (p<3×10⁻⁵), and (iii) multi-institutional patient cohorts, where model-derived phenotypes stratified patients by clinical response (p<0.045). Unlike methods requiring resource-intensive spatial profiling data for training, our approach leverages widely available therapeutic outcomes or bulk profiling as slide-level labels to assess functional biology. This strategy offers a scalable complement to spatial omics for investigating therapeutic resistance across the pan-cancer landscape by using WSIs and clinical outcomes from massive legacy biobanks.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Cancer Research; 116 papers in training set; Top 0.1%; 14.1%
2. Nature Communications; 4913 papers in training set; Top 12%; 14.1%
3. Cell Systems; 167 papers in training set; Top 2%; 6.3%
4. Nature Biomedical Engineering; 42 papers in training set; Top 0.2%; 4.3%
5. Science Advances; 1098 papers in training set; Top 6%; 3.5%
6. Nature Cancer; 35 papers in training set; Top 0.3%; 3.5%
7. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 23%; 3.0%
8. Advanced Science; 249 papers in training set; Top 7%; 3.0%

50% of probability mass above

9. Cell Reports; 1338 papers in training set; Top 17%; 3.0%
10. Nature Methods; 336 papers in training set; Top 3%; 2.8%
11. eLife; 5422 papers in training set; Top 30%; 2.8%
12. Clinical Cancer Research; 58 papers in training set; Top 0.7%; 2.6%
13. Nature Medicine; 117 papers in training set; Top 2%; 1.9%
14. Nature; 575 papers in training set; Top 10%; 1.9%
15. PLOS Computational Biology; 1633 papers in training set; Top 17%; 1.7%
16. Scientific Reports; 3102 papers in training set; Top 59%; 1.7%
17. EMBO Molecular Medicine; 85 papers in training set; Top 2%; 1.7%
18. Cell; 370 papers in training set; Top 12%; 1.6%
19. Communications Biology; 886 papers in training set; Top 11%; 1.5%
20. PLOS ONE; 4510 papers in training set; Top 60%; 1.2%
21. Cell Reports Medicine; 140 papers in training set; Top 6%; 1.2%
22. Genome Medicine; 154 papers in training set; Top 6%; 1.2%
23. Nature Biotechnology; 147 papers in training set; Top 6%; 1.2%
24. iScience; 1063 papers in training set; Top 25%; 0.9%
25. Cancer Discovery; 61 papers in training set; Top 2%; 0.9%
26. Nature Cell Biology; 99 papers in training set; Top 4%; 0.9%
27. Nucleic Acids Research; 1128 papers in training set; Top 17%; 0.8%
28. Science; 429 papers in training set; Top 19%; 0.8%
29. Cancer Cell; 38 papers in training set; Top 2%; 0.8%
30. npj Digital Medicine; 97 papers in training set; Top 6%; 0.7%
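
The "50% of probability mass" cutoff can be checked by summing the listed percentages in rank order. The sketch below is only an illustration of that arithmetic, assuming the final figure on each entry is the predicted probability for that journal; the function name `journals_to_reach` is hypothetical and not part of the page.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the ten top-ranked journals,
# copied from the ranking above in rank order.
probs = [14.1, 14.1, 6.3, 4.3, 3.5, 3.5, 3.0, 3.0, 3.0, 2.8]


def journals_to_reach(probabilities, threshold):
    """Return (count, running_total) for the first rank at which the
    cumulative probability reaches the threshold, or None if it never does."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count, total
    return None


count, total = journals_to_reach(probs, 50.0)
print(count, round(total, 1))
```

With these values the running total first passes 50% at the eighth journal, which agrees with the statement that the top 8 journals account for half of the predicted probability mass.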