
Deciphering antigen-driven T cell responses through vectorized TCRdist sequence neighborhood quantification

Valkiers, S.; Mayer-Blackwell, K.; Yeh, A. C.; Van Deuren, V. M. L.; Fiore-Gartland, A.; Hill, G.; Laukens, K.; Meysman, P.; Bradley, P.

2026-04-14 immunology
10.64898/2026.04.10.717405 bioRxiv

T cells provide precise mechanisms to defend the body against infection and malignancies, mediated through the expression of their hypervariable T cell receptors (TCRs). Interpreting similarity between TCRs, however, remains a significant challenge. While performant clustering methods exist, these often fail to distinguish between antigen-driven convergent selection and patterns arising stochastically from biases in the V(D)J recombination mechanism. Moreover, defining enrichment in sequence similarity among large repertoires is computationally taxing. To address these limitations, we present an efficient computational framework for rapid approximation of TCRdist distances using fixed-length vector embeddings and highly optimized nearest neighbor search, allowing sequence similarity enrichment testing at a multi-repertoire-wide scale. This framework leverages a novel shuffling-based background model that preserves important repertoire characteristics such as V gene frequency, CDR3 sequence length and generation probability more accurately than synthetic models. Together, these tools enable the efficient and robust identification of significantly neighbor enriched (SNE) TCR sequences at scale. We validate this approach by showing a significant enrichment of SNE clones in memory T cell fractions and further demonstrate its utility in identifying convergent T cell signatures of response to vaccination and viral infections, providing a scalable approach for antigen-agnostic T cell response profiling.
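
The neighbor-enrichment idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the `embed` function is a toy amino-acid-count embedding (not the TCRdist encoding), the neighbor search is brute force rather than an optimized index, the background is a small hand-written list rather than a shuffled repertoire, and the binomial tail test with a pseudocount is a stand-in for whatever statistic the paper uses. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed(cdr3):
    """Toy fixed-length embedding: amino-acid counts (NOT the TCRdist encoding)."""
    counts = Counter(cdr3)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def count_neighbors(query, vectors, radius):
    """Brute-force count of vectors within `radius` of `query` (inclusive)."""
    r2 = radius ** 2
    return sum(1 for v in vectors if sq_dist(query, v) <= r2)

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def neighbor_enrichment(foreground, background, radius):
    """For each foreground sequence, compare its neighbor count in the
    foreground against the neighbor rate observed in the background."""
    fg_vecs = [embed(s) for s in foreground]
    bg_vecs = [embed(s) for s in background]
    results = []
    for seq, vec in zip(foreground, fg_vecs):
        k_fg = count_neighbors(vec, fg_vecs, radius) - 1  # exclude the sequence itself
        k_bg = count_neighbors(vec, bg_vecs, radius)
        # Pseudocount keeps the expected rate above zero (an assumed choice).
        rate = (k_bg + 1) / (len(bg_vecs) + 1)
        p_value = binom_tail(k_fg, len(fg_vecs) - 1, rate)
        results.append((seq, k_fg, k_bg, p_value))
    return results

# Made-up example sequences, for illustration only.
foreground = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSLGQSYEQYF", "CASSPDRNTEAFF"]
background = ["CASSPDRNTEAFF", "CSARDRTGNGYTF", "CASSQETQYF", "CAWSVGNTIYF"]

results = neighbor_enrichment(foreground, background, radius=1.5)
for seq, k_fg, k_bg, p_value in results:
    print(seq, k_fg, k_bg, round(p_value, 3))
```

In this toy input, the three sequences that differ by a single substitution are each other's neighbors and have none in the background, so they receive lower p-values than the sequence that also appears in the background. A real analysis would need a far larger background and multiple-testing correction.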

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Cell Systems | 167 papers in training set | Top 0.9% | 11.9%
2. Nature Methods | 336 papers in training set | Top 1% | 9.7%
3. Nature Computational Science | 50 papers in training set | Top 0.1% | 6.9%
4. Bioinformatics | 1061 papers in training set | Top 4% | 6.1%
5. eLife | 5422 papers in training set | Top 15% | 6.1%
6. PLOS Computational Biology | 1633 papers in training set | Top 6% | 6.1%
7. Nature Biotechnology | 147 papers in training set | Top 2% | 6.1%
50% of probability mass above
8. Nucleic Acids Research | 1128 papers in training set | Top 4% | 4.7%
9. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 15% | 4.7%
10. Cell Reports | 1338 papers in training set | Top 13% | 3.8%
11. Frontiers in Immunology | 586 papers in training set | Top 2% | 3.8%
12. iScience | 1063 papers in training set | Top 11% | 2.0%
13. Nature Communications | 4913 papers in training set | Top 49% | 1.8%
14. Genome Medicine | 154 papers in training set | Top 5% | 1.4%
15. Bioinformatics Advances | 184 papers in training set | Top 4% | 1.2%
16. Cell | 370 papers in training set | Top 14% | 1.1%
17. Scientific Reports | 3102 papers in training set | Top 72% | 0.9%
18. PLOS ONE | 4510 papers in training set | Top 67% | 0.8%
19. Genome Biology | 555 papers in training set | Top 7% | 0.8%
20. The Journal of Immunology | 146 papers in training set | Top 2% | 0.7%
21. PLOS Pathogens | 721 papers in training set | Top 9% | 0.7%
22. Science Advances | 1098 papers in training set | Top 31% | 0.7%
23. Cell Reports Methods | 141 papers in training set | Top 6% | 0.7%
24. Patterns | 70 papers in training set | Top 3% | 0.7%
25. Science | 429 papers in training set | Top 21% | 0.7%
26. Immunity | 58 papers in training set | Top 5% | 0.7%
27. Nature Immunology | 71 papers in training set | Top 2% | 0.7%