
RNABag: A Generalizable Transcriptome Foundation Model for Precision Oncology across Biopsy Modalities

Luo, P.; Luo, D.; Li, D.; Xue, X.; Yang, J.; Gong, X.; Tang, K.

2026-04-22 bioinformatics
10.64898/2026.04.19.719450 bioRxiv

Transcriptomic data is highly sensitive to cancer state and progression, which makes transcriptome-based foundation models highly promising for diverse clinical oncological inference. However, transcriptome analyses are conventionally hindered by technical batch effects and limited generalization across platforms. Here, we introduce RNABag, a foundation model designed to generalize well to external datasets. In particular, the model focuses only on highly variable genes to reduce noise, and extensive data augmentation was used during pretraining so that RNABag learns robust representations that are invariant to batch variations. We demonstrate that RNABag achieves superior performance in pan-cancer tissue-of-origin classification and cancer detection on internal validation sets, as well as in zero-shot generalization to external cohorts and in-house clinical samples. Furthermore, after specialized fine-tuning, RNABag exhibits strong capabilities across a wide range of clinical applications. The model effectively stratifies patient survival and predicts relapse risk, highlighting key molecular pathways that drive tumor progression. Crucially, we extend RNABag's utility to liquid biopsies, achieving high diagnostic accuracy with plasma cfRNA and tumor-educated platelets (TEPs), thereby supporting its application in non-invasive cancer monitoring. Interpretability analysis revealed a pivotal role of tumor immune escape in cancer-induced plasma cfRNA signals. In summary, our study indicates that cancer state and progression may be monitored in detail and with precision through comprehensive modeling of the transcriptome across biopsy modalities.
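The abstract does not say how highly variable genes are selected or which augmentations are applied. As a minimal sketch of the two ideas, assuming a simple variance-of-log-expression criterion for gene selection and assuming a random per-sample shift plus Gaussian noise as a stand-in for batch variation, the Python (NumPy) code below shows one possible approach. The function names and parameters (select_highly_variable_genes, augment_batch_like, scale_sd, noise_sd) are hypothetical; this is not RNABag's implementation.

```python
import numpy as np


def select_highly_variable_genes(expr: np.ndarray, n_top: int) -> np.ndarray:
    """Return column indices of the n_top genes with the highest variance of
    log1p-transformed expression (an assumed, simple HVG criterion).

    expr: samples x genes matrix of non-negative expression values.
    """
    log_expr = np.log1p(expr)            # stabilize variance of count-like data
    variances = log_expr.var(axis=0)     # per-gene variance across samples
    # argsort is ascending: take the last n_top, then reverse to descending order
    return np.argsort(variances)[-n_top:][::-1]


def augment_batch_like(log_expr: np.ndarray, rng: np.random.Generator,
                       scale_sd: float = 0.1, noise_sd: float = 0.05) -> np.ndarray:
    """Hypothetical batch-effect augmentation in log space: one random shift
    per sample (mimicking library-size or platform scaling) plus independent
    Gaussian noise per gene and sample."""
    n_samples, n_genes = log_expr.shape
    sample_shift = rng.normal(0.0, scale_sd, size=(n_samples, 1))   # broadcast over genes
    gene_noise = rng.normal(0.0, noise_sd, size=(n_samples, n_genes))
    return log_expr + sample_shift + gene_noise


# Usage on synthetic count data (not real transcriptomes)
rng = np.random.default_rng(0)
expr = rng.poisson(5.0, size=(20, 100)).astype(float)
hvg = select_highly_variable_genes(expr, n_top=10)
x = np.log1p(expr[:, hvg])           # restrict to the selected genes
x_aug = augment_batch_like(x, rng)   # one augmented "view" of the same samples
```

In a contrastive or invariance-based pretraining setup, two such augmented views of the same sample would be encouraged to map to similar representations; whether RNABag uses this exact objective is not stated in the abstract.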

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

1. Advanced Science: 249 papers in training set; Top 0.6%; 14.4%
2. Scientific Reports: 3102 papers in training set; Top 24%; 4.9%
3. Nature Machine Intelligence: 61 papers in training set; Top 0.7%; 4.2%
4. Nature Communications: 4913 papers in training set; Top 37%; 4.0%
5. Genome Medicine: 154 papers in training set; Top 2%; 4.0%
6. Cell Reports Medicine: 140 papers in training set; Top 1%; 3.6%
7. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 0.5%; 3.6%
8. npj Precision Oncology: 48 papers in training set; Top 0.2%; 3.6%
9. PLOS Computational Biology: 1633 papers in training set; Top 10%; 3.6%
10. Briefings in Bioinformatics: 326 papers in training set; Top 2%; 2.7%
11. Bioinformatics: 1061 papers in training set; Top 6%; 2.5%
(50% of probability mass above)
12. npj Systems Biology and Applications: 99 papers in training set; Top 0.8%; 2.4%
13. Small Methods: 26 papers in training set; Top 0.2%; 2.1%
14. Cancer Research Communications: 46 papers in training set; Top 0.3%; 2.1%
15. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 3%; 1.9%
16. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 5%; 1.7%
17. Cancer Research: 116 papers in training set; Top 2%; 1.7%
18. iScience: 1063 papers in training set; Top 18%; 1.5%
19. PLOS ONE: 4510 papers in training set; Top 56%; 1.5%
20. Patterns: 70 papers in training set; Top 1%; 1.5%
21. Cancer Cell: 38 papers in training set; Top 1%; 1.3%
22. Nucleic Acids Research: 1128 papers in training set; Top 14%; 1.2%
23. Computers in Biology and Medicine: 120 papers in training set; Top 3%; 1.2%
24. npj Digital Medicine: 97 papers in training set; Top 3%; 1.1%
25. Frontiers in Genetics: 197 papers in training set; Top 7%; 1.0%
26. Nature Biomedical Engineering: 42 papers in training set; Top 1%; 1.0%
27. Journal of Translational Medicine: 46 papers in training set; Top 2%; 0.8%
28. Communications Biology: 886 papers in training set; Top 24%; 0.7%
29. Cancers: 200 papers in training set; Top 5%; 0.7%
30. Science Advances: 1098 papers in training set; Top 30%; 0.7%
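The statement that the top 11 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. A minimal sketch, assuming the displayed (rounded) probabilities are summed directly; the helper name smallest_k_reaching is hypothetical:

```python
# Predicted probabilities (%) for ranks 1-30, copied from the list above (rounded values).
probs = [14.4, 4.9, 4.2, 4.0, 4.0, 3.6, 3.6, 3.6, 3.6, 2.7,
         2.5, 2.4, 2.1, 2.1, 1.9, 1.7, 1.7, 1.5, 1.5, 1.5,
         1.3, 1.2, 1.2, 1.1, 1.0, 1.0, 0.8, 0.7, 0.7, 0.7]


def smallest_k_reaching(probs, threshold):
    """Return (k, cumulative) for the smallest k whose top-k running sum
    reaches threshold, or (None, total) if it is never reached."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total


k, cum = smallest_k_reaching(probs, 50.0)
print(k, round(cum, 1))
```

With these rounded values the running total first reaches 50% at rank 11 (about 51.1%), while the top 10 sum to about 48.6%, which is consistent with the stated cutoff.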