evoCancerGPT: Generating Zero-Shot Single-Cell and Single-Sample Cancer Progression Through Transfer Learning

Wang, X.; Tan, R.; Cristea, S.

2026-02-14 bioinformatics
10.64898/2026.02.12.705621 bioRxiv

Cancer evolution is driven by complex changes in gene expression as cells transition between states during tumorigenesis. Single-cell RNA sequencing has provided snapshot insights into how tumor transcriptomes evolve, but whether existing knowledge can be used to reliably learn and generate the patterns behind the evolution of cancers remains unknown. Here, we introduce evoCancerGPT, a generative pre-trained, decoder-only transformer single-cell foundation model designed to forecast future gene expression profiles in cancer evolution by leveraging previous cell states at the level of single patients. The model integrates the continuous gene expression data of each cell to create a comprehensive representation of a cell token. Training sentences are constructed separately for each cancer type, each patient and each cell type, ordered via pseudotime inference algorithms, using 2.76 million cell tokens, each with 12,639 genes, spanning 7 cancer types. By learning long-range dependencies between cells arranged in pseudotime from a large corpus of data, evoCancerGPT captures key transitions in cancer evolution, achieving high concordance with ground-truth trajectories and outperforming linear and scGPT baselines on held-out test samples in low-context scenarios. Our work suggests evoCancerGPT's potential utility in characterizing tumor progression at a single-cell and single-patient level and ultimately contributing to more personalized cancer care.
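The sentence construction described in the abstract (one sequence per cancer type, patient and cell type, with cells ordered by pseudotime) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_sentences`, the record fields, and the toy values are all hypothetical, and the pseudotime values are assumed to have been computed beforehand by some inference method.

```python
from collections import defaultdict


def build_sentences(cells):
    """Group cells by (cancer_type, patient, cell_type) and order each
    group by pseudotime, returning one ordered list of cell IDs per group.

    Field names are illustrative assumptions, not taken from the paper.
    """
    groups = defaultdict(list)
    for cell in cells:
        key = (cell["cancer_type"], cell["patient"], cell["cell_type"])
        groups[key].append(cell)

    # Sort every group by its (precomputed) pseudotime value.
    return {
        key: [c["cell_id"] for c in sorted(members, key=lambda c: c["pseudotime"])]
        for key, members in groups.items()
    }


# Toy records with made-up labels and pseudotime values.
cells = [
    {"cell_id": "c1", "cancer_type": "A", "patient": "P1", "cell_type": "T", "pseudotime": 0.8},
    {"cell_id": "c2", "cancer_type": "A", "patient": "P1", "cell_type": "T", "pseudotime": 0.1},
    {"cell_id": "c3", "cancer_type": "A", "patient": "P2", "cell_type": "T", "pseudotime": 0.5},
    {"cell_id": "c4", "cancer_type": "A", "patient": "P1", "cell_type": "B", "pseudotime": 0.3},
]

sentences = build_sentences(cells)
for key, ordered_ids in sentences.items():
    print(key, ordered_ids)
```

In an actual model each cell ID would stand for a cell token built from that cell's continuous expression vector; the sketch only shows the grouping and ordering step.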

Matching journals

The top 6 journals are the smallest set that reaches 50% of the predicted probability mass (about 53.6% combined, summing the listed values).

1. Nature Machine Intelligence | 61 papers in training set | Top 0.1% | 17.7%
2. Nature Communications | 4913 papers in training set | Top 15% | 12.0%
3. Cell Systems | 167 papers in training set | Top 1% | 9.8%
4. Cell Genomics | 162 papers in training set | Top 0.9% | 4.7%
5. Nature Methods | 336 papers in training set | Top 2% | 4.7%
6. Genome Biology | 555 papers in training set | Top 2% | 4.7%
50% of probability mass above
7. Nature Biotechnology | 147 papers in training set | Top 2% | 4.0%
8. Nature | 575 papers in training set | Top 6% | 3.8%
9. Nature Medicine | 117 papers in training set | Top 1% | 3.1%
10. Genome Medicine | 154 papers in training set | Top 3% | 2.6%
11. PLOS Computational Biology | 1633 papers in training set | Top 14% | 2.0%
12. Advanced Science | 249 papers in training set | Top 10% | 1.8%
13. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 30% | 1.8%
14. Nucleic Acids Research | 1128 papers in training set | Top 10% | 1.8%
15. Bioinformatics | 1061 papers in training set | Top 7% | 1.6%
16. Genome Research | 409 papers in training set | Top 2% | 1.6%
17. Nature Biomedical Engineering | 42 papers in training set | Top 0.9% | 1.6%
18. Nature Genetics | 240 papers in training set | Top 5% | 1.3%
19. Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.2%
20. Cell Reports Medicine | 140 papers in training set | Top 5% | 1.2%
21. Scientific Reports | 3102 papers in training set | Top 67% | 1.2%
22. Nature Computational Science | 50 papers in training set | Top 2% | 0.7%
23. Cell Reports | 1338 papers in training set | Top 34% | 0.7%
24. Frontiers in Genetics | 197 papers in training set | Top 11% | 0.7%
25. PLOS ONE | 4510 papers in training set | Top 70% | 0.7%
26. Science Advances | 1098 papers in training set | Top 32% | 0.7%
27. Science | 429 papers in training set | Top 22% | 0.6%
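
The 50% cut-off in the listing can be checked by accumulating the listed percentages in rank order until the running total reaches the threshold. The sketch below is only an arithmetic illustration using the first eight values shown above; the function name `ranks_to_reach` is hypothetical, and the displayed percentages are rounded, so the total is approximate.

```python
def ranks_to_reach(probabilities, threshold):
    """Return (rank, cumulative_total) at the first rank where the running
    sum of probabilities reaches the threshold, or None if it never does."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


# First eight listed percentages, in rank order.
probs = [17.7, 12.0, 9.8, 4.7, 4.7, 4.7, 4.0, 3.8]

rank, total = ranks_to_reach(probs, 50.0)
print(f"Threshold reached at rank {rank} with {total:.1f}% cumulative mass")
```

With these values the first five entries sum to 48.9%, so the sixth entry is the one that crosses the 50% line.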