PepCABO: Latent-space Bayesian optimization for peptide-MHC binding using contrastive alignment

Ghane, M.; Korpela, D.; Dumitrescu, A.; Lähdesmäki, H.

2026-03-16 bioinformatics
10.64898/2026.03.13.711540 bioRxiv
Motivation: Optimizing peptide sequences for binding to specific MHC class I alleles is a central challenge in immunotherapy and vaccine design. The combinatorial size of peptide space, the nonlinear nature of peptide-MHC interactions, and limited experimental budgets make efficient optimization difficult. Latent-space Bayesian optimization (LSBO) provides a framework by embedding discrete sequences into a continuous space where Bayesian optimization can be applied. However, existing LSBO methods do not effectively leverage binding data from related alleles and often rely on inefficient random initialization.

Results: We propose PepCABO, an LSBO framework for peptide-MHC binding using contrastive alignment, which utilizes a dual variational autoencoder framework that jointly learns peptide-allele alignment and a Gaussian process surrogate prior to Bayesian optimization. This simultaneous training induces a latent geometry that reflects the binding landscape and enables structured knowledge transfer across alleles. The pretrained model shapes a structured latent space in which peptides with high objective values regarding a specific MHC allele are geometrically organized, while the jointly trained Gaussian process defines an informative prior over the objective in this space, enabling principled and efficient exploration of promising regions during subsequent optimization. Across 12 target alleles without prior binding data and under both low- and high-budget settings, PepCABO consistently outperforms various baselines. We observe faster convergence, improved area under the optimization curve, and stronger best-found binding affinities, suggesting improved sample efficiency under experimentally constrained scenarios.

Code availability: The source code is available at https://github.com/mohsen-g/PepCABO
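The general idea of Bayesian optimization in a continuous latent space, as described in the abstract, can be illustrated with a small toy loop. This is a generic sketch and not the PepCABO implementation: there is no variational autoencoder or contrastive alignment here, the latent space is simply a 2-D box, and `toy_objective`, `latent_bo`, the RBF kernel settings, and the upper-confidence-bound acquisition rule are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between two sets of latent points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(z_train, y_train, z_query, noise=1e-4):
    """Posterior mean and standard deviation of a GP with a constant mean."""
    y_mean = y_train.mean()
    k_tt = rbf_kernel(z_train, z_train) + noise * np.eye(len(z_train))
    k_qt = rbf_kernel(z_query, z_train)
    # Solve once for both the mean weights and the variance term.
    rhs = np.column_stack([y_train - y_mean, k_qt.T])
    solved = np.linalg.solve(k_tt, rhs)
    mean = y_mean + k_qt @ solved[:, 0]
    var = 1.0 - np.einsum("ij,ji->i", k_qt, solved[:, 1:])
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def toy_objective(z):
    """Hypothetical stand-in for the binding score of a decoded peptide."""
    return -np.sum((z - 0.3) ** 2, axis=-1)

def latent_bo(n_init=5, n_iter=20, dim=2, beta=2.0, seed=0):
    """Random initialization followed by UCB-guided queries in latent space."""
    rng = np.random.default_rng(seed)
    z_obs = rng.uniform(-1, 1, size=(n_init, dim))
    y_obs = toy_objective(z_obs)
    for _ in range(n_iter):
        candidates = rng.uniform(-1, 1, size=(512, dim))
        mean, std = gp_posterior(z_obs, y_obs, candidates)
        z_next = candidates[np.argmax(mean + beta * std)]  # UCB acquisition
        z_obs = np.vstack([z_obs, z_next])
        y_obs = np.append(y_obs, toy_objective(z_next[None, :]))
    return z_obs, y_obs

z_obs, y_obs = latent_bo()
print("best toy score found:", y_obs.max())
```

In a real LSBO pipeline the latent point would be decoded to a peptide and scored by an assay or a binding predictor. The random initial design used here is the kind of initialization the abstract describes as inefficient.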

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 0.5%; 34.9%
2. Nature Machine Intelligence: 61 papers in training set; Top 0.3%; 6.9%
3. PLOS Computational Biology: 1633 papers in training set; Top 5%; 6.4%
4. Cell Systems: 167 papers in training set; Top 2%; 4.9%

50% of probability mass above

5. Nature Communications: 4913 papers in training set; Top 39%; 3.7%
6. ImmunoInformatics: 11 papers in training set; Top 0.1%; 3.7%
7. Briefings in Bioinformatics: 326 papers in training set; Top 3%; 2.6%
8. Journal of Chemical Information and Modeling: 207 papers in training set; Top 2%; 2.1%
9. Communications Biology: 886 papers in training set; Top 5%; 2.1%
10. Nature Methods: 336 papers in training set; Top 4%; 1.9%
11. BMC Bioinformatics: 383 papers in training set; Top 4%; 1.8%
12. mAbs: 28 papers in training set; Top 0.1%; 1.8%
13. Nucleic Acids Research: 1128 papers in training set; Top 11%; 1.7%
14. Frontiers in Immunology: 586 papers in training set; Top 4%; 1.7%
15. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 36%; 1.4%
16. Science Advances: 1098 papers in training set; Top 21%; 1.4%
17. Scientific Reports: 3102 papers in training set; Top 63%; 1.4%
18. Bioinformatics Advances: 184 papers in training set; Top 4%; 1.2%
19. iScience: 1063 papers in training set; Top 29%; 0.8%
20. Journal of Cheminformatics: 25 papers in training set; Top 0.6%; 0.7%
21. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 10%; 0.7%
22. Patterns: 70 papers in training set; Top 3%; 0.7%
23. eLife: 5422 papers in training set; Top 61%; 0.7%
24. PLOS ONE: 4510 papers in training set; Top 71%; 0.7%
25. Nature Biomedical Engineering: 42 papers in training set; Top 2%; 0.7%
26. Computers in Biology and Medicine: 120 papers in training set; Top 5%; 0.7%
27. npj Systems Biology and Applications: 99 papers in training set; Top 3%; 0.7%
28. Advanced Science: 249 papers in training set; Top 21%; 0.7%
29. Expert Systems with Applications: 11 papers in training set; Top 0.7%; 0.5%
30. Nature Biotechnology: 147 papers in training set; Top 9%; 0.5%
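
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The snippet below is a small sketch of that arithmetic; the function name `journals_to_reach` is hypothetical, and only the first six listed probabilities are used.

```python
def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed to reach the threshold."""
    cumulative = 0.0
    for count, probability in enumerate(probabilities, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return count
    return None  # threshold not reached by the entries given

# Predicted probabilities (in percent) for ranks 1 to 6, copied from the list.
top_probabilities = [34.9, 6.9, 6.4, 4.9, 3.7, 3.7]

print(journals_to_reach(top_probabilities, 50.0))
```

The first three entries sum to about 48.2%, and adding the fourth brings the total to about 53.1%, which is why the cutoff falls after rank 4.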