
Cross-assay RNA modeling reveals cancer biomarkers

Townsend, H. A.; Jordan, K. R.; Wolsky, R. J.; Van Kleunen, L. B.; Davidson, N. R.; Behbakht, K.; Sikora, M. J.; Dowell, R. D.; Clauset, A.; Bitler, B. G.

2026-05-05 bioinformatics
10.64898/2026.04.30.722009 bioRxiv

The clinical heterogeneity of cancer poses a major challenge for precision medicine. Limited cohort sizes across evolving assay platforms impede reliable biomarker discovery. Here, we systematically evaluate how to integrate data from four transcriptomics platforms, namely bulk and single-cell (sc) RNA sequencing (RNA-seq), NanoString, and microarray, for predictive modeling in cancer. We use high-grade serous carcinoma (HGSC) of tubo-ovarian origin as a model system, as it is highly heterogeneous in both biology and assay data. We find that using fold-change of gene expression in patients with matched pre- and post-neoadjuvant chemotherapy samples reduces inter-patient and inter-assay variability but is insufficient to overcome platform-specific biases. Microarray and scRNA-seq data exhibit systematic biases, while RNA-seq and NanoString show the most promise for combination into a single training cohort. To mitigate inter-assay limitations, we generate a new data set of HGSC tumor samples profiled with both RNA-seq and NanoString, and use it to identify the limits of detection and optimal harmonization strategies. Our approaches enable integration of cohorts for separate and combined RNA-seq and NanoString predictive models of disease recurrence (test-set AUROCs > 0.8), validated in external microarray cohorts. We leverage single-cell and bulk RNA-seq network-based analyses to provide mechanistic context for genes in the predictive models. Our models indicate that GBP4 expression is a key predictor of recurrence and marks immune remodeling towards cytotoxicity. We provide an interactive web portal to facilitate exploration of data and results. These findings guide cross-assay harmonization of transcriptomic data and enable improved predictive modeling in heterogeneous cancers.
Statement of Significance

We present a framework for integrating RNA-seq, NanoString, microarray, and single-cell transcriptomic data for predictive modeling, enabling robust biomarker discovery in heterogeneous cancers and identifying GBP4 as a marker of immune remodeling.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Cell Systems | 167 papers in training set | Top 0.2% | 22.5%
2 | Nature Communications | 4913 papers in training set | Top 18% | 10.1%
3 | Cell Genomics | 162 papers in training set | Top 0.3% | 8.4%
4 | PLOS Computational Biology | 1633 papers in training set | Top 6% | 6.4%
5 | Genome Medicine | 154 papers in training set | Top 2% | 4.3%
50% of probability mass above
6 | Cancer Research Communications | 46 papers in training set | Top 0.1% | 3.6%
7 | Nucleic Acids Research | 1128 papers in training set | Top 6% | 3.6%
8 | Cell Reports Medicine | 140 papers in training set | Top 2% | 3.1%
9 | npj Precision Oncology | 48 papers in training set | Top 0.2% | 2.9%
10 | Nature Biotechnology | 147 papers in training set | Top 3% | 2.7%
11 | Cell Reports Methods | 141 papers in training set | Top 2% | 1.8%
12 | Scientific Reports | 3102 papers in training set | Top 58% | 1.7%
13 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 35% | 1.5%
14 | Cancer Research | 116 papers in training set | Top 2% | 1.5%
15 | eLife | 5422 papers in training set | Top 47% | 1.3%
16 | Genome Biology | 555 papers in training set | Top 5% | 1.3%
17 | Bioinformatics | 1061 papers in training set | Top 8% | 1.2%
18 | npj Systems Biology and Applications | 99 papers in training set | Top 2% | 1.1%
19 | JCI Insight | 241 papers in training set | Top 5% | 1.1%
20 | PLOS ONE | 4510 papers in training set | Top 62% | 0.9%
21 | Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.9%
22 | Cancer Cell | 38 papers in training set | Top 2% | 0.8%
23 | BMC Bioinformatics | 383 papers in training set | Top 6% | 0.8%
24 | Science Advances | 1098 papers in training set | Top 28% | 0.8%
25 | iScience | 1063 papers in training set | Top 32% | 0.7%
26 | Nature Genetics | 240 papers in training set | Top 8% | 0.7%
27 | Molecular Systems Biology | 142 papers in training set | Top 2% | 0.6%
28 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.6%
29 | Bioinformatics Advances | 184 papers in training set | Top 5% | 0.6%
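
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running total over the listed probabilities. The following is a minimal sketch, not the site's own method: the function name `journals_needed` and the 50% threshold parameter are illustrative assumptions, and the values are the ten highest probabilities copied from the list above.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the ten top-ranked journals,
# copied from the list above in rank order.
probabilities = [22.5, 10.1, 8.4, 6.4, 4.3, 3.6, 3.6, 3.1, 2.9, 2.7]


def journals_needed(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


top_n = journals_needed(probabilities)
print(top_n)
```

With these values the first four journals sum to roughly 47.4%, so the fifth (Genome Medicine, 4.3%) is the one that carries the running total past 50%.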