Modeling TCR-pMHC Binding with Dual Encoders and Cross-Attention Fusion

Wang, W.; Qi, C.; Wei, Z.

2025-12-02 bioinformatics
10.64898/2025.12.01.691424 bioRxiv

Accurately modeling the binding between T-cell receptors (TCRs) and peptide-MHC (pMHC) complexes is essential for guiding immunotherapy development and personalized vaccine design. However, the vast diversity of TCR repertoires and the scarcity of experimentally validated interactions make generalization to unseen epitopes challenging. This paper proposes TIDE, a cross-attention-driven dual-encoder framework that leverages large protein and molecular language models to learn discriminative representations of TCRs and peptides. In TIDE, TCR sequences are encoded using Evolutionary Scale Modeling (ESM), while peptides are converted into SMILES strings and processed by MolFormer to capture their chemical and structural properties. Multi-layer cross-attention then refines and integrates these embeddings, highlighting interaction-relevant patterns without requiring explicit structural alignment. Evaluated on the TCHard benchmark in both zero-shot and few-shot settings, TIDE achieves higher predictive accuracy and greater robustness than state-of-the-art baselines such as ChemBERTa, TITAN, and NetTCR. These results demonstrate that combining pretrained language models with cross-attention fusion offers a powerful approach to TCR-pMHC binding prediction and paves the way for more reliable computational immunology applications.
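The fusion step described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give TIDE's layer widths, number of heads or layers, pooling, or output head, so every such detail below is an assumption: single-head scaled dot-product attention, bidirectional cross-attention (TCR tokens attend to peptide tokens and vice versa) with residual updates, mean pooling, and a logistic scorer. Random weights stand in for trained parameters and random matrices stand in for ESM and MolFormer embeddings; the names `cross_attention` and `fuse` are hypothetical, not the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention in which `queries` attend over `context`."""
    q = queries @ w_q                          # (Lq, d)
    k = context @ w_k                          # (Lc, d)
    v = context @ w_v                          # (Lc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Lq, Lc) query-context affinities
    weights = softmax(scores, axis=-1)         # each query row sums to 1
    return weights @ v, weights


def fuse(tcr_emb, pep_emb, n_layers=2, d=64, seed=0):
    """Hypothetical dual-encoder fusion: project both streams to width d, apply
    n_layers of bidirectional cross-attention with residual updates, mean-pool
    each stream and score the pair with a logistic head.
    Random weights (fixed by `seed`) stand in for trained parameters."""
    rng = np.random.default_rng(seed)
    # Linear projections from each encoder's width to the shared width d
    t = tcr_emb @ rng.normal(0, tcr_emb.shape[1] ** -0.5, (tcr_emb.shape[1], d))
    p = pep_emb @ rng.normal(0, pep_emb.shape[1] ** -0.5, (pep_emb.shape[1], d))
    for _ in range(n_layers):
        w = [rng.normal(0, d ** -0.5, (d, d)) for _ in range(6)]
        t_upd, _ = cross_attention(t, p, *w[:3])  # TCR tokens attend to peptide tokens
        p_upd, _ = cross_attention(p, t, *w[3:])  # peptide tokens attend to TCR tokens
        t, p = t + t_upd, p + p_upd               # residual update of both streams
    pooled = np.concatenate([t.mean(axis=0), p.mean(axis=0)])  # (2d,)
    w_out = rng.normal(0, (2 * d) ** -0.5, 2 * d)
    return float(1.0 / (1.0 + np.exp(-(pooled @ w_out))))      # score in (0, 1)


# Placeholder inputs: random matrices standing in for per-residue ESM embeddings
# of a TCR sequence (assumed width 1280) and MolFormer token embeddings of a
# peptide SMILES string (assumed width 768).
rng = np.random.default_rng(1)
tcr = rng.normal(size=(15, 1280))
pep = rng.normal(size=(40, 768))
prob = fuse(tcr, pep)
print(f"illustrative binding score: {prob:.3f}")
```

In a trained model the projections and attention weights would be learned against binding labels; the `weights` matrix returned by `cross_attention` is where strongly interacting TCR/peptide token pairs would be expected to receive high weight, which is the intuition behind fusing the two encoders without explicit structural alignment.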

Matching journals

The top 6 journals together account for just over 50% (53.1%) of the predicted probability mass.

1. Nature Machine Intelligence (61 papers in training set; Top 0.1%): 22.8%
2. Frontiers in Immunology (586 papers in training set; Top 0.7%): 8.5%
3. PLOS Computational Biology (1633 papers in training set; Top 5%): 6.5%
4. Briefings in Bioinformatics (326 papers in training set; Top 0.8%): 6.4%
5. Advanced Science (249 papers in training set; Top 3%): 4.9%
6. Bioinformatics (1061 papers in training set; Top 5%): 4.0%
(50% of probability mass above this line)
7. Nature Communications (4913 papers in training set; Top 39%): 3.6%
8. ImmunoInformatics (11 papers in training set; Top 0.1%): 3.3%
9. Computational and Structural Biotechnology Journal (216 papers in training set; Top 2%): 3.1%
10. Science Advances (1098 papers in training set; Top 8%): 3.1%
11. Nucleic Acids Research (1128 papers in training set; Top 7%): 2.6%
12. Cell Reports Medicine (140 papers in training set; Top 2%): 2.4%
13. Cell Systems (167 papers in training set; Top 5%): 2.1%
14. iScience (1063 papers in training set; Top 11%): 1.9%
15. Genome Medicine (154 papers in training set; Top 4%): 1.7%
16. Communications Biology (886 papers in training set; Top 9%): 1.7%
17. Cell Genomics (162 papers in training set; Top 3%): 1.7%
18. Patterns (70 papers in training set; Top 1%): 1.2%
19. IEEE Journal of Biomedical and Health Informatics (34 papers in training set; Top 2%): 1.0%
20. mAbs (28 papers in training set; Top 0.3%): 0.8%
21. Scientific Reports (3102 papers in training set; Top 74%): 0.8%
22. Bioinformatics Advances (184 papers in training set; Top 5%): 0.8%
23. Cell Reports (1338 papers in training set; Top 36%): 0.5%
24. BMC Bioinformatics (383 papers in training set; Top 8%): 0.5%
25. PLOS ONE (4510 papers in training set; Top 73%): 0.5%
26. Expert Systems with Applications (11 papers in training set; Top 0.7%): 0.5%
27. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 8%): 0.5%
28. eLife (5422 papers in training set; Top 63%): 0.5%
29. Nature Computational Science (50 papers in training set; Top 2%): 0.5%
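The 50% cutoff in the journal ranking can be checked with a running sum over the listed percentages: the first five journals reach 49.1%, and the sixth brings the total to 53.1%. The sketch below assumes the cutoff is defined as the first rank at which the cumulative mass reaches the threshold; how the site computes it internally is not stated, and `journals_to_reach` is a hypothetical helper.

```python
# Predicted probabilities (%) for the 29 listed journals, in rank order
top_probs = [22.8, 8.5, 6.5, 6.4, 4.9, 4.0, 3.6, 3.3, 3.1, 3.1,
             2.6, 2.4, 2.1, 1.9, 1.7, 1.7, 1.7, 1.2, 1.0, 0.8,
             0.8, 0.8, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]


def journals_to_reach(probs, threshold=50.0):
    """Return (k, cumulative %) for the smallest k whose top-k mass >= threshold,
    or (None, total) if the listed journals never reach it."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, round(total, 1)  # round away float accumulation error
    return None, round(total, 1)


print(journals_to_reach(top_probs))  # (6, 53.1)
```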