epiTCR-KDA: Knowledge Distillation model on Dihedral Angles for TCR-peptide prediction

Pham, M.-D. N.; Su, C. T.-T.; Nguyen, T.-N.; Nguyen, H.-N.; Nguyen, D. D. A.; Giang, H.; Nguyen, D.-T.; Phan, M.-D.; Nguyen, V.

2024-05-21 bioinformatics
10.1101/2024.05.18.594806 bioRxiv

Motivation: Antigen recognition by T-cell receptors (TCRs) triggers cascades of immune responses. Successful prediction of binding between TCRs and antigens (as peptides) therefore marks an advance for immunotherapy. However, most current TCR-peptide interaction predictors fail on unseen data. This limitation may stem from the conventional use of TCR and/or peptide sequences as input, which may not adequately reflect their structural characteristics. Incorporating TCR and peptide structural information into the prediction model is therefore necessary to improve generalizability.

Results: We present epiTCR-KDA, a new predictor of TCR-peptide binding that uses structural information, specifically the dihedral angles between the residues of both the peptide and the TCR. This structural descriptor was integrated into a model constructed using knowledge distillation to enhance its generalizability. epiTCR-KDA demonstrated competitive prediction performance, with an AUC of 0.99 for seen data and an AUC of 0.86 for unseen data. Across multiple public datasets, epiTCR-KDA consistently outperformed other predictors, such as epiTCR, NetTCR, BERTrand, TEIM-Seq, TEINet, and ImRex, maintaining a median AUC of 0.9 (ranging from 0.82 to 0.91). Further analysis of epiTCR-KDA performance indicated that the cosine similarity of the dihedral angle vectors between the unseen testing data and the training data is crucial for its stable performance. In conclusion, the epiTCR-KDA model, with its capacity to predict for unseen data, brings us one step closer to a highly effective pipeline for affordable antigen-based immunotherapy.

Availability and implementation: epiTCR-KDA is available on GitHub (https://github.com/ddiem-ri-4D/epiTCR-KDA).
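
To illustrate the cosine similarity between dihedral angle vectors mentioned in the abstract, here is a minimal sketch. It assumes each sample is represented as a flat, equal-length list of angles in radians; the function name and the example values are hypothetical, and the encoding actually used by epiTCR-KDA may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical dihedral angle vectors (radians), for illustration only.
train_angles = [-1.05, 2.30, -1.20, 2.45, -1.10, 2.20]
test_angles = [-1.00, 2.35, -1.30, 2.40, -0.95, 2.10]

similarity = cosine_similarity(train_angles, test_angles)
print(f"cosine similarity: {similarity:.3f}")
```

A value close to 1 means the test vector points in nearly the same direction as the training vector. Note that this sketch treats angles as plain numbers and does not account for their periodicity.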

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | ImmunoInformatics | 11 | Top 0.1% | 28.7%
2 | Bioinformatics | 1061 | Top 2% | 13.0%
3 | Computers in Biology and Medicine | 120 | Top 0.3% | 6.5%
4 | PLOS Computational Biology | 1633 | Top 8% | 4.1%
(50% of probability mass above)
5 | Frontiers in Immunology | 586 | Top 2% | 3.7%
6 | Scientific Reports | 3102 | Top 33% | 3.7%
7 | BMC Bioinformatics | 383 | Top 3% | 3.7%
8 | Briefings in Bioinformatics | 326 | Top 3% | 2.5%
9 | PLOS ONE | 4510 | Top 47% | 2.2%
10 | GigaScience | 172 | Top 0.9% | 2.2%
11 | Nucleic Acids Research | 1128 | Top 8% | 2.2%
12 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.0%
13 | iScience | 1063 | Top 13% | 1.8%
14 | Bioinformatics Advances | 184 | Top 3% | 1.7%
15 | Genomics, Proteomics & Bioinformatics | 171 | Top 4% | 1.4%
16 | Frontiers in Physiology | 93 | Top 4% | 1.0%
17 | Nature Machine Intelligence | 61 | Top 3% | 0.9%
18 | International Journal of Molecular Sciences | 453 | Top 12% | 0.9%
19 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.9%
20 | Patterns | 70 | Top 2% | 0.8%
21 | Journal of Translational Medicine | 46 | Top 2% | 0.8%
22 | Frontiers in Bioinformatics | 45 | Top 0.8% | 0.8%
23 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.8%
24 | BMC Medical Genomics | 36 | Top 1% | 0.8%
25 | Biomedicines | 66 | Top 4% | 0.7%
26 | Immunology | 29 | Top 1% | 0.5%
27 | BioMed Research International | 25 | Top 4% | 0.5%
28 | PeerJ | 261 | Top 18% | 0.5%
29 | Informatics in Medicine Unlocked | 21 | Top 2% | 0.5%
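
The 50% cutoff noted in the journal list can be checked by accumulating the listed probabilities in rank order. The sketch below uses the first seven values from the list; the function name is hypothetical and this is not necessarily how the site computes the cutoff.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1 to 7, copied from the journal list.
probabilities = [28.7, 13.0, 6.5, 4.1, 3.7, 3.7, 3.7]


def journals_to_reach(mass_percent, probs):
    """Return how many top-ranked entries are needed to reach mass_percent."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= mass_percent:
            return count
    return None  # threshold not reached by the listed entries


print(journals_to_reach(50.0, probabilities))  # → 4
```

The running totals are 28.7, 41.7, 48.2, and 52.3, so the fourth journal is the first to push the cumulative probability past 50%.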