Assessment of Generative De Novo Peptide Design Methods for G Protein-Coupled Receptors
Junker, H.; Schoeder, C. T.
G protein-coupled receptors (GPCRs) play a ubiquitous role in the transduction of extracellular stimuli into intracellular responses and therefore represent a major target for the development of novel peptide-based therapeutics. In fact, approximately 30% of all non-sensory GPCRs are peptide-targeted, representing a blueprint for the design of de novo peptides, both as pharmacological tools and therapeutics. Recent advances in deep learning-based protein structure generation and structure prediction offer a multitude of peptide design strategies for GPCRs, yet confidence metrics rarely correlate with experimental success. For peptides, this problem is exacerbated by their lack of elaborate tertiary structure, raising the question of whether the poor correlation is due to inadequate sampling or insufficient scoring. In this two-part benchmark, we addressed this question by first simulating the validation process of 124 unique known GPCR-peptide complexes using AlphaFold2 Initial Guess, Boltz-2 and RoseTTAFold3. We then assessed the peptide sampling capabilities of the respective generative methods BindCraft, BoltzGen and RFdiffusion3. Our results indicate that current design pipelines primarily suffer from significant confidence overestimation for misplaced peptides in the validation phase across all three prediction methods. We further highlight occurrences of significant memorization in both the prediction and the generation of peptides. While all generative methods sample backbone space sufficiently, their simultaneous sequence generation remains subpar and can be partially recovered through the use of ProteinMPNN. Taken together, our benchmark offers guidance for the design of peptides specifically using deep learning-based pipelines.
Author summary
Deep learning-based protein design is revolutionizing computational biology, and the development of such tools is progressing rapidly with increasing attention from both academic and non-academic institutions. Their applicability and performance are often assessed from an all-purpose objective, with implicit bias towards larger protein-protein interactions. Due to their size, peptides therefore present an edge case where performance is known to decrease compared to larger, more structured proteins. Here, we present a benchmark specifically for the deep learning-based design of peptides targeting G protein-coupled receptors (GPCRs), a major therapeutic drug target family, assessing the generation of novel GPCR-targeting peptides and the validation of these designs separately. Our results show that generative methods sample potential peptide placements and orientations sufficiently but validation fails to differentiate valid from invalid designs, indicating that the so-called scoring problem remains unsolved. Although they focus on a specific use case, our results are generalizable to the broader field of protein design. Consequently, they can offer guidance for peptide-specific design applications and can contribute to the development and improvement of new methods.