A generative reference grammar of healthy TCR repertoires reveals cancer-associated immune remodeling

Balan, A.; Elhanati, Y.; Meza Landeros, K. E.; Mendes, M. D. A.; Lai, J.; Zaidi, S. S. A.; Unal, M.; Kim, B. Y. S.; Lucas, C.-H. G.; Runco, E.; Puduvalli, V. K.; Gantchev, J.; Whittaker, C. A.; Sharma, P.; Tabar, V.; Cima, M. J.; Baquer, G.; Reardon, D. A.; Stortchevoi, A.; Boire, A.; Wang, L.; White, F. M.; Sidiropoulos, D. N.; Yu, K. K. H.; Chiocca, E. A.; Anagnostou, V.; Data Science Teamlab; Accelerating GBM Therapies TeamLab; Karchin, R.

2026-05-04 cancer biology
10.64898/2026.04.29.721631 bioRxiv

T-cell receptor (TCR) repertoires encode the organization of adaptive immunity and its reshaping by cancer and therapy, but disentangling treatment-associated structure from V(D)J recombination constraints remains challenging. We present CRAFT (Cancer Repertoire Anomaly Finding Transformer), a conditional sequence-to-sequence transformer that learns a nucleotide-level generative grammar of productive TCR-beta CDR3 sequences from healthy-donor repertoires, conditioned on germline V(D)J assignments. A dual-head decoder mirrors the independence of V-D and D-J recombination, and curriculum training yields embeddings that serve as a reference coordinate system for quantifying structured deviations in cancer-associated repertoires. In proof-of-concept analyses of a checkpoint blockade cohort (n=18) and a two-patient single-cell study of oncolytic immunotherapy, CRAFT-derived geometric metrics capture response-associated immune remodeling, including longitudinal shifts in repertoire organization. In antigen-labeled benchmarks, CRAFT yields coherent organization across specificity classes while highlighting settings where CDR3-beta alone provides partial signal.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Nature: 575 papers in training set, Top 2%, 14.5%
2. Nature Cancer: 35 papers in training set, Top 0.1%, 8.3%
3. Science: 429 papers in training set, Top 5%, 6.7%
4. Cell Systems: 167 papers in training set, Top 2%, 6.7%
5. Nature Communications: 4913 papers in training set, Top 30%, 6.2%
6. Nature Medicine: 117 papers in training set, Top 0.5%, 4.8%
7. Nature Genetics: 240 papers in training set, Top 2%, 3.6%
50% of probability mass above
8. Cell Reports: 1338 papers in training set, Top 15%, 3.5%
9. Cancer Discovery: 61 papers in training set, Top 0.7%, 3.0%
10. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 24%, 2.8%
11. Nature Neuroscience: 216 papers in training set, Top 3%, 2.7%
12. Nature Cell Biology: 99 papers in training set, Top 2%, 2.4%
13. Cancer Research: 116 papers in training set, Top 2%, 2.1%
14. Science Translational Medicine: 111 papers in training set, Top 2%, 1.9%
15. Cell Genomics: 162 papers in training set, Top 3%, 1.8%
16. Genome Medicine: 154 papers in training set, Top 5%, 1.7%
17. Cell Reports Medicine: 140 papers in training set, Top 4%, 1.6%
18. Nature Biotechnology: 147 papers in training set, Top 6%, 1.3%
19. Nature Physics: 39 papers in training set, Top 0.8%, 1.3%
20. PLOS Computational Biology: 1633 papers in training set, Top 20%, 1.2%
21. eLife: 5422 papers in training set, Top 51%, 1.1%
22. Nature Ecology & Evolution: 113 papers in training set, Top 4%, 0.9%
23. Cell Stem Cell: 57 papers in training set, Top 2%, 0.8%
24. Science Advances: 1098 papers in training set, Top 31%, 0.7%
25. Nature Methods: 336 papers in training set, Top 6%, 0.7%
26. Cancer Cell: 38 papers in training set, Top 2%, 0.7%
27. Advanced Science: 249 papers in training set, Top 20%, 0.7%
28. Genome Biology: 555 papers in training set, Top 8%, 0.7%
29. Blood: 67 papers in training set, Top 2%, 0.6%
30. Cell: 370 papers in training set, Top 19%, 0.6%
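
The 50% cutoff stated for this list can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration of that arithmetic: the name `journals_to_reach` is hypothetical, and the values are the rounded percentages shown in the list, not the page's underlying model output.

```python
# Predicted probabilities (%) for the 30 listed journals, in rank order.
# These are the rounded values displayed in the list, so totals are approximate.
probs = [
    14.5, 8.3, 6.7, 6.7, 6.2, 4.8, 3.6, 3.5, 3.0, 2.8,
    2.7, 2.4, 2.1, 1.9, 1.8, 1.7, 1.6, 1.3, 1.3, 1.2,
    1.1, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6,
]


def journals_to_reach(probabilities, threshold):
    """Return the smallest rank whose running total reaches `threshold`.

    Hypothetical helper for illustration; returns None if the listed
    values never reach the threshold.
    """
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank
    return None


print(journals_to_reach(probs, 50.0))  # prints 7
```

With the displayed values, the running total first reaches 50% at rank 7, which agrees with the statement that the top 7 journals account for half of the predicted probability mass.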