
Transcriptomic profiling of mouse mammary tumors enables prognostic and predictive biomarker discovery for human breast cancers

Sutcliffe, M. D.; Mott, K. R.; Yilmaz-Swenson, T.; Felsheim, B. M.; Lobanov, A. V.; Michmerhuizen, A. R.; Raedler, P. D.; Okumu, D. O.; He, X.; Pfefferle, A. D.; Dance-Barnes, S.; East, M. P.; Hollern, D. P.; Elston, T. C.; Johnson, G. L.; Perou, C. M.

2026-03-03 cancer biology
10.64898/2026.02.28.707759 bioRxiv

The development and validation of prognostic and predictive biomarkers in breast cancer are limited by the availability of well-annotated datasets linking tumor molecular features to treatment response and survival outcomes. To address this need, we generated an extensive mouse model dataset comprising 26 immunocompetent mammary tumor models spanning diverse genetic backgrounds, epithelial-mesenchymal states, the basal-luminal axis, and distinct immune microenvironments. For each model, we measured survival under no treatment, immune checkpoint inhibition (ICI), and carboplatin/paclitaxel chemotherapy. We performed RNA-seq on baseline tumors and on 7-day on-treatment samples for both regimens. Using baseline murine tumor gene expression features, we trained an Elastic Net machine learning model that predicted survival outcomes on multiple human breast cancer datasets with performance comparable to that of existing prognostic assays. We next trained models for ICI benefit, using either the untreated or the 7-day ICI-treated samples; both models predicted ICI benefit on human ICI-treated datasets, with the model trained on 7-day treated tumors showing better performance. We also developed a predictor of carboplatin/paclitaxel response that performed well in mice but did not generalize to human chemotherapy cohorts. Finally, we compared multiple computational approaches, including XGBoost, random forests, and support vector regression; all methods successfully predicted survival outcomes, with Elastic Net offering the best performance and interpretability. These results indicate conserved cancer biology between mouse and human tumors for prognosis and ICI response, and they establish this large preclinical dataset with linked phenotypic and genomic data as a resource for benchmarking computational methods for survival prediction.
Significance: The development of a genomically and phenotypically diverse murine tumor dataset with linked treatment outcomes establishes a robust translational resource to develop, test, and benchmark clinically relevant prognostic and therapeutic response biomarkers.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Breast Cancer Research | 32 | Top 0.1% | 18.7%
2 | Nature Communications | 4913 | Top 28% | 6.4%
3 | Cell Reports Medicine | 140 | Top 0.4% | 6.4%
4 | PLOS Computational Biology | 1633 | Top 6% | 6.3%
5 | Clinical Cancer Research | 58 | Top 0.3% | 4.8%
6 | Scientific Reports | 3102 | Top 24% | 4.8%
7 | Cancers | 200 | Top 1% | 4.3%
(50% of probability mass above)
8 | Proceedings of the National Academy of Sciences | 2130 | Top 18% | 4.0%
9 | Cancer Research | 116 | Top 0.6% | 4.0%
10 | npj Breast Cancer | 18 | Top 0.1% | 3.6%
11 | Nature Cancer | 35 | Top 0.4% | 3.1%
12 | Genome Medicine | 154 | Top 3% | 2.9%
13 | Cancer Research Communications | 46 | Top 0.2% | 2.4%
14 | PLOS ONE | 4510 | Top 48% | 2.1%
15 | Science Advances | 1098 | Top 17% | 1.7%
16 | JCO Clinical Cancer Informatics | 18 | Top 0.5% | 1.7%
17 | npj Precision Oncology | 48 | Top 0.7% | 1.3%
18 | Frontiers in Bioinformatics | 45 | Top 0.4% | 1.3%
19 | Cancer Discovery | 61 | Top 1% | 1.2%
20 | eLife | 5422 | Top 49% | 1.2%
21 | Cancer Cell | 38 | Top 1% | 1.1%
22 | Journal for ImmunoTherapy of Cancer | 64 | Top 0.8% | 1.1%
23 | Cell Systems | 167 | Top 10% | 0.9%
24 | Annals of Oncology | 13 | Top 0.9% | 0.8%
25 | Cell Reports | 1338 | Top 33% | 0.7%
26 | Communications Biology | 886 | Top 29% | 0.6%
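
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration: it uses the rounded percentages as displayed for the first ten ranks, and the helper name `journals_to_reach` is hypothetical, not part of any tool behind this page.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the list above.
# These are the rounded values as displayed, so sums are approximate.
probs = [18.7, 6.4, 6.4, 6.3, 4.8, 4.8, 4.3, 4.0, 4.0, 3.6]


def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative) for the shortest top-ranked prefix
    whose summed probability reaches the threshold.

    Hypothetical helper for illustration only. Returns (None, total)
    if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, mass = journals_to_reach(probs, 50.0)
print(count, round(mass, 1))
```

With the displayed values, the first six journals sum to about 47.4% and adding the seventh brings the total to about 51.7%, which is consistent with the cutoff falling after rank 7.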