Benchmarking generative scaffold design methods for peptide engineering in TCR-MHC complexes

Xie, L.; Dam, G.-B.; Patel, Y.; Denzler, L.; Shao, Y.; Wang, R.; Caron, E.; Yasumizu, Y.; Hafler, D. A.; Rodriguez Martinez, M.

2026-01-23 bioinformatics
10.64898/2026.01.22.701133 bioRxiv

De novo peptide design at T cell receptor-peptide-major histocompatibility complex (TCR-pMHC) interfaces is a central challenge in computational immunology, with direct implications for vaccine development, cancer immunotherapy, and autoimmune disease. Despite rapid advances in generative protein modeling, there is currently no systematic benchmark evaluating these methods in the highly constrained and immunologically relevant setting of peptide-MHC presentation and TCR recognition. Here, we present two complementary contributions. First, we introduce a multi-stage computational pipeline for peptide design in predefined TCR-pMHC contexts, integrating generative modeling with sequence optimization and structure-based filtering. Second, we establish a benchmark for evaluating generative peptide design methods in TCR-pMHC complexes. Using a curated dataset of high-quality crystal structures deposited after the AlphaFold3 training cutoff, we assess state-of-the-art generative approaches for peptide backbone generation, sequence design, and the enrichment of near-native solutions. We explicitly examine whether different backbone generation strategies respect the geometric constraints of the MHC binding groove and recover native-like peptide conformations. Our results reveal substantial method-dependent differences: some generative strategies fail systematically in the groove-bound peptide setting, whereas others generate physically plausible backbones with varying accuracy and conformational diversity. We further show that enforcing anchor constraints strongly influences peptide conformations at non-anchor positions, highlighting a trade-off between structural accuracy and conformational sampling. 
To enable fair and reproducible comparison, we introduce a standardized, multi-stage scoring protocol that integrates MHC binding prediction, physics-based energy evaluation, and independent structure prediction confidence metrics to enrich near-native designs from large candidate pools. Together, this work establishes the first comprehensive pipeline and benchmark for generative peptide design at TCR-pMHC interfaces and provides practical guidelines for developing peptide design workflows and evaluating generative models in immunologically constrained protein design settings.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Cell Systems | 167 | Top 0.5% | 14.8%
2 | PLOS Computational Biology | 1633 | Top 3% | 10.5%
3 | Frontiers in Immunology | 586 | Top 0.9% | 7.2%
4 | Bioinformatics | 1061 | Top 4% | 6.3%
5 | Journal of Chemical Information and Modeling | 207 | Top 1.0% | 4.9%
6 | Nucleic Acids Research | 1128 | Top 5% | 3.9%
7 | Briefings in Bioinformatics | 326 | Top 2% | 3.6%
50% of probability mass above
8 | Nature Communications | 4913 | Top 39% | 3.6%
9 | Nature Machine Intelligence | 61 | Top 0.9% | 3.6%
10 | Computational and Structural Biotechnology Journal | 216 | Top 2% | 3.1%
11 | Structure | 175 | Top 1% | 2.1%
12 | ImmunoInformatics | 11 | Top 0.1% | 2.1%
13 | Cell Reports Methods | 141 | Top 2% | 1.9%
14 | eLife | 5422 | Top 38% | 1.9%
15 | Scientific Reports | 3102 | Top 58% | 1.7%
16 | Proceedings of the National Academy of Sciences | 2130 | Top 32% | 1.7%
17 | mAbs | 28 | Top 0.2% | 1.7%
18 | Bioinformatics Advances | 184 | Top 3% | 1.7%
19 | Journal of Chemical Theory and Computation | 126 | Top 0.6% | 1.5%
20 | Science Advances | 1098 | Top 20% | 1.5%
21 | Nature Methods | 336 | Top 5% | 1.5%
22 | Cell Genomics | 162 | Top 4% | 1.3%
23 | Advanced Science | 249 | Top 15% | 1.0%
24 | Biophysical Journal | 545 | Top 4% | 0.9%
25 | Patterns | 70 | Top 2% | 0.8%
26 | Journal of Cheminformatics | 25 | Top 0.5% | 0.8%
27 | Nature Biotechnology | 147 | Top 7% | 0.7%
28 | Chemical Science | 71 | Top 2% | 0.7%
29 | ACS Synthetic Biology | 256 | Top 3% | 0.7%
30 | Nature Computational Science | 50 | Top 2% | 0.6%
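
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The snippet below is a minimal sketch of that arithmetic. The values are copied from the ten highest-ranked entries in the list, and the helper name `journals_to_reach` is a hypothetical one chosen for illustration, not part of any tool used by the page.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the list above.
probabilities = [14.8, 10.5, 7.2, 6.3, 4.9, 3.9, 3.6, 3.6, 3.6, 3.1]


def journals_to_reach(probs, threshold):
    """Return (k, cumulative) for the smallest top-k whose summed
    probability reaches `threshold`; k is None if it is never reached.

    `journals_to_reach` is a hypothetical helper written for this example.
    """
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total


k, mass = journals_to_reach(probabilities, 50.0)
print(k, round(mass, 1))
```

With the listed values, the cumulative sum is 47.6% after six journals and 51.2% after seven, so the 50% threshold is first crossed at rank 7.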