PepCABO: Latent-space Bayesian optimization for peptide-MHC binding using contrastive alignment
Ghane, M.; Korpela, D.; Dumitrescu, A.; Lähdesmäki, H.
Motivation
Optimizing peptide sequences for binding to specific MHC class I alleles is a central challenge in immunotherapy and vaccine design. The combinatorial size of peptide space, the nonlinear nature of peptide-MHC interactions, and limited experimental budgets make efficient optimization difficult. Latent-space Bayesian optimization (LSBO) addresses this by embedding discrete sequences into a continuous space where Bayesian optimization can be applied. However, existing LSBO methods do not effectively leverage binding data from related alleles and often rely on inefficient random initialization.

Results
We propose PepCABO, an LSBO framework for peptide-MHC binding using contrastive alignment. Before Bayesian optimization begins, a dual variational autoencoder framework jointly learns peptide-allele alignment and a Gaussian process surrogate. This simultaneous training induces a latent geometry that reflects the binding landscape and enables structured knowledge transfer across alleles. In the resulting pretrained latent space, peptides with high objective values for a given MHC allele are geometrically organized, while the jointly trained Gaussian process defines an informative prior over the objective in this space, enabling principled and efficient exploration of promising regions during subsequent optimization. Across 12 target alleles with no prior binding data, and under both low- and high-budget settings, PepCABO consistently outperforms a range of baselines. We observe faster convergence, a larger area under the optimization curve, and stronger best-found binding affinities, suggesting improved sample efficiency in experimentally constrained scenarios.

Code availability
The source code is available at https://github.com/mohsen-g/PepCABO
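The abstract describes a Gaussian process surrogate guiding the search for high-scoring points in a learned continuous latent space. The following is a minimal, generic sketch of one such latent-space BO loop, not PepCABO's implementation: it assumes a fixed 2-D latent space, a hypothetical `toy_objective` standing in for the binding score of a decoded peptide, an RBF-kernel GP with zero prior mean on centered targets, and expected improvement as the acquisition function. The dual VAE, contrastive alignment and joint GP pretraining are not modeled, and `normalized_auc` is only one hypothetical way to summarize an optimization curve; the paper's exact metric may differ.

```python
import math
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between rows of a (n, d) and b (m, d).
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(z_train, y_train, z_query, noise=1e-4, **kern):
    # Exact GP regression posterior (zero prior mean on centered targets).
    y_mean = y_train.mean()
    K = rbf_kernel(z_train, z_train, **kern) + noise * np.eye(len(z_train))
    K_s = rbf_kernel(z_train, z_query, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(z_query, z_query, **kern)) - np.sum(v**2, 0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best, xi=0.01):
    # Closed-form EI for maximization under a Gaussian posterior.
    imp = mean - best - xi
    z = imp / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + std * pdf

def normalized_auc(y_history):
    # Hypothetical summary: mean of the best-so-far curve over all evaluations.
    return float(np.mean(np.maximum.accumulate(y_history)))

def toy_objective(z):
    # Hypothetical stand-in for the binding score of a decoded peptide.
    return -np.sum((z - 0.5) ** 2, axis=1)

rng = np.random.default_rng(0)
z_obs = rng.uniform(-2, 2, size=(5, 2))   # initial latent points
y_obs = toy_objective(z_obs)
for _ in range(10):
    cand = rng.uniform(-2, 2, size=(500, 2))  # random latent candidates
    mu, sd = gp_posterior(z_obs, y_obs, cand, lengthscale=1.0)
    ei = expected_improvement(mu, sd, y_obs.max())
    z_next = cand[np.argmax(ei)][None, :]     # most promising candidate
    z_obs = np.vstack([z_obs, z_next])
    y_obs = np.append(y_obs, toy_objective(z_next))

trace = np.maximum.accumulate(y_obs)  # best-found value after each evaluation
```

In a real LSBO pipeline the selected latent point would be decoded into a peptide sequence and scored experimentally or by a predictor; an informative, pretrained latent space and GP prior (as the abstract describes) would replace the random initialization used here.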