RNABag: A Generalizable Transcriptome Foundation Model for Precision Oncology across Biopsy Modalities
Luo, P.; Luo, D.; Li, D.; Xue, X.; Yang, J.; Gong, X.; Tang, K.
Transcriptomic data are highly sensitive to cancer state and progression, which makes transcriptome-based foundation models promising for diverse clinical oncological inference. However, transcriptome analyses are conventionally hindered by technical batch effects and limited generalization across platforms. Here, we introduce RNABag, a foundation model designed to generalize well to external datasets. In particular, the model focuses only on highly variable genes to reduce noise, and extensive data augmentation was used during pretraining so that RNABag learns robust representations that are invariant to batch variations. We demonstrate that RNABag achieves superior performance in pan-cancer tissue-of-origin classification and cancer detection in internal validation sets, as well as in zero-shot generalization to external cohorts and in-house clinical samples. Furthermore, after specialized finetuning, RNABag exhibits strong capabilities in a wide range of clinical applications. The model effectively stratifies patient survival and predicts relapse risks, highlighting key molecular pathways driving tumor progression. Crucially, we extend RNABag's utility to liquid biopsies, achieving high diagnostic accuracy in plasma cfRNA and tumor-educated platelets (TEPs), thereby supporting its application in non-invasive cancer monitoring. Interpretability analysis revealed a pivotal role of tumor immune escape in cancer-induced plasma cfRNA signals. In summary, our study indicates that cancer states and progression may be monitored in detail and with precision via comprehensive modeling of the transcriptome across biopsy modalities.
Matching journals
The top 11 journals account for 50% of the predicted probability mass.