Unsupervised identification of low-frequency antigen-specific TCRs using distance-based anomaly scoring

Kinoshita, K.; Kobayashi, T. J.

2026-03-11 bioinformatics
10.64898/2026.03.09.709174 bioRxiv

Identifying antigen-specific T cell receptors (TCRs) within the diverse human repertoire remains challenging due to their extremely low frequencies, often as rare as one per million cells. Here, we propose a novel unsupervised approach that detects low-frequency antigen-specific TCRs through distance-based anomaly detection in TCR sequence space. Our method is based on the observation that antigen-specific TCRs preferentially localize at the periphery of V gene clusters rather than cluster centers. Using TCRdist3 to quantify sequence distances, we identify query TCRs that are anomalous compared to reference repertoires within their V-J gene combinations. We validated this approach across three immunological contexts: COVID-19 infection, influenza vaccination, and yellow fever vaccination. For SARS-CoV-2-specific TCR detection in a COVID-19 patient, our method demonstrated 34.3% accuracy, significantly outperforming similarity-based (ALICE: 8.0%) and frequency-based methods (edgeR: 5.8%, the Pogorelyy method: 6.3%), and uniquely detected low-frequency antigen-specific TCRs at clone count one. The minimal overlap with conventional approaches (≤6.7%) indicates our method captures distinct TCR clones overlooked by existing analyses. This spatial distribution-based paradigm provides a complementary strategy for TCR specificity detection, particularly valuable for identifying rare antigen-specific clones essential for understanding immune responses.
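
The idea of scoring query TCRs by how far they sit from a reference repertoire within the same V-J gene combination can be illustrated with a minimal sketch. This is not the authors' implementation and does not call TCRdist3: plain Levenshtein distance on CDR3 strings stands in for the TCRdist metric, the mean distance to the k nearest reference sequences stands in for whatever anomaly score the paper uses, and the function names (`edit_distance`, `anomaly_scores`), the parameter `k`, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; an assumed stand-in for a TCRdist-style metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def anomaly_scores(query, reference, k=2):
    """Score each query TCR by its mean distance to the k nearest
    reference TCRs that share the same (V gene, J gene) pair.

    Larger scores mean the query lies farther from the reference
    repertoire. A score of None means no reference exists for that pair.
    """
    by_vj = defaultdict(list)
    for v, j, cdr3 in reference:
        by_vj[(v, j)].append(cdr3)

    scores = {}
    for v, j, cdr3 in query:
        pool = by_vj.get((v, j), [])
        if not pool:
            scores[(v, j, cdr3)] = None
            continue
        nearest = sorted(edit_distance(cdr3, r) for r in pool)[:k]
        scores[(v, j, cdr3)] = sum(nearest) / len(nearest)
    return scores


# Toy data: the CDR3 sequences are made up for illustration only.
reference = [
    ("TRBV7-9", "TRBJ2-1", "CASSLGQGNEQFF"),
    ("TRBV7-9", "TRBJ2-1", "CASSLGQGYEQFF"),
    ("TRBV7-9", "TRBJ2-1", "CASSLGGGNEQFF"),
]
query = [
    ("TRBV7-9", "TRBJ2-1", "CASSLGQGNEQFF"),   # sits inside the reference cluster
    ("TRBV7-9", "TRBJ2-1", "CASSPWRTDNEQFF"),  # sits away from the cluster
]

scores = anomaly_scores(query, reference, k=2)
for tcr, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(tcr, score)
```

Ranking queries by descending score puts the sequences farthest from the reference at the top, which mirrors the abstract's premise that antigen-specific TCRs tend to lie at the periphery of clusters. Any threshold for calling a TCR anomalous would have to be chosen separately; none is implied here.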

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Bioinformatics | 1061 | Top 3% | 9.8%
2 | Nucleic Acids Research | 1128 | Top 2% | 9.8%
3 | PLOS Computational Biology | 1633 | Top 3% | 9.8%
4 | Frontiers in Immunology | 586 | Top 2% | 4.7%
5 | Briefings in Bioinformatics | 326 | Top 1% | 4.2%
6 | iScience | 1063 | Top 3% | 4.1%
7 | ImmunoInformatics | 11 | Top 0.1% | 3.9%
8 | eLife | 5422 | Top 26% | 3.6%
9 | Nature Communications | 4913 | Top 41% | 3.5%
10 | Cell Reports Methods | 141 | Top 1% | 3.0%
11 | Bioinformatics Advances | 184 | Top 2% | 3.0%
12 | Genomics, Proteomics & Bioinformatics | 171 | Top 2% | 3.0%
13 | Genome Medicine | 154 | Top 3% | 3.0%
14 | Cell Systems | 167 | Top 5% | 2.7%
15 | Cell Genomics | 162 | Top 3% | 2.0%
16 | Scientific Reports | 3102 | Top 52% | 2.0%
17 | Patterns | 70 | Top 0.7% | 1.8%
18 | Proceedings of the National Academy of Sciences | 2130 | Top 31% | 1.7%
19 | Genome Research | 409 | Top 2% | 1.6%
20 | Computational and Structural Biotechnology Journal | 216 | Top 5% | 1.6%
21 | BMC Bioinformatics | 383 | Top 5% | 1.4%
22 | Science Advances | 1098 | Top 21% | 1.4%
23 | Advanced Science | 249 | Top 13% | 1.4%
24 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.3%
25 | The Journal of Immunology | 146 | Top 1% | 1.2%
26 | NAR Genomics and Bioinformatics | 214 | Top 3% | 0.9%
27 | Communications Biology | 886 | Top 20% | 0.9%
28 | Nature Machine Intelligence | 61 | Top 3% | 0.8%
29 | PLOS ONE | 4510 | Top 67% | 0.8%
30 | GigaScience | 172 | Top 3% | 0.7%
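
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does this for the first ten listed values; the function name `journals_to_reach` is hypothetical, and the percentages are the rounded figures shown on the page, so the running total is approximate.

```python
# Predicted probabilities (%) for ranks 1-10, as listed above.
probs = [9.8, 9.8, 9.8, 4.7, 4.2, 4.1, 3.9, 3.6, 3.5, 3.0]


def journals_to_reach(probabilities, threshold):
    """Return the smallest rank at which the running total of
    probabilities reaches the threshold, or None if it never does."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank
    return None


top_k = journals_to_reach(probs, 50.0)
print(top_k)
```

With these rounded values the running total is about 49.9% after rank 8 and about 53.4% after rank 9, so rank 9 is the first at which the 50% mark is crossed.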