PMGen: From Peptide-MHC Structure Prediction to Peptide Generation
Asgary, A. H.; Aleyasin, A.; Mehl, J. A.; Fallah, S.; Aintablian, H.; Ludewig, B.; Mishto, M.; Liepe, J.; Soeding, J.
Accurate structural modeling of peptide-MHC (pMHC) complexes is a prerequisite for understanding adaptive immunity and developing data-driven immunotherapies. However, current tools are often limited by narrow class coverage, restricted peptide lengths, or insufficient accuracy for downstream design tasks. Here, we introduce PMGen (Peptide MHC Generator), an integrated framework for structure prediction and structure-guided design of variable-length peptides across MHC class I and II. By introducing Initial Guess and Template Engineering as strategies to enforce anchor constraints in AlphaFold2, PMGen achieves median peptide core RMSDs of 0.54 Å for MHC-I and 0.33 Å for MHC-II, outperforming five state-of-the-art methods. We further demonstrate that PMGen captures the subtle structural impact of single-point neoantigen mutations and that model confidence (pLDDT) correlates with structural accuracy. We investigated two potential applications of our framework: structure-aware peptide design and data generation for machine learning (ML) models. For the first, we introduced a procedure to sample peptides that preserve the modeled structure while improving binding affinity. As an example of the second, we fine-tuned ProteinMPNN on PMGen-modeled structures, which raised sequence recovery from 0.19 (baseline) to 0.40. Ultimately, PMGen bridges the gap between high-fidelity structural prediction and downstream sequence design, offering a scalable way to generate the large-scale, high-quality structural datasets required to train advanced predictive models in immunology. Available at https://github.com/soedinglab/PMGen.
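The two headline metrics in the abstract, peptide core RMSD and sequence recovery, can be illustrated with a minimal Python sketch. This is not taken from the PMGen code base: the function names are hypothetical, the RMSD assumes the predicted and reference structures have already been superimposed and that the same atoms are listed in the same order, and the abstract does not state which atoms or residues define the peptide core.

```python
import math
from statistics import median


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both structures are already superimposed and the atoms
    are paired by index. The result is in the units of the input
    coordinates (Angstrom for PDB-style files).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy inputs, invented purely for illustration.
recovery = sequence_recovery("SIINFEKL", "SIINFAKL")  # 7 of 8 positions match
toy_rmsd = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
median_rmsd = median([0.4, 0.5, 0.9])  # median over per-complex RMSDs
```

A benchmark-level figure such as a median RMSD would be obtained by computing `rmsd` once per complex and taking the median over the resulting list, as in the last line.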