STAPLER: Efficient learning of TCR-peptide specificity prediction from full-length TCR-peptide data
Kwee, B. P. Y.; Messemaker, M.; Marcus, E.; Oliveira, G.; Scheper, W.; Wu, C.; Teuwen, J.; Schumacher, T.
The prediction of peptide-MHC (pMHC) recognition by β T-cell receptors (TCRs) remains a major biomedical challenge. Here, we develop STAPLER (Shared TCR And Peptide Language bidirectional Encoder Representations from transformers), a transformer language model that uses a joint TCRβ-peptide input to allow the learning of patterns within and between TCRβ and peptide sequences that encode recognition. First, we demonstrate how data leakage during negative data generation can confound performance estimates of neural network-based models in predicting TCR-pMHC specificity. We then demonstrate that, because of its pre-training and fine-tuning masked language modeling tasks, STAPLER outperforms both neural network-based and distance-based ML models in predicting the recognition of known antigens in an independent dataset, in particular for antigens for which little related data is available. Based on this ability to efficiently learn from limited labeled TCR-peptide data, STAPLER is well-suited to utilize growing TCR-pMHC datasets to achieve accurate prediction of TCR-pMHC specificity.