
Assessment of Generative De Novo Peptide Design Methods for G Protein-Coupled Receptors

Junker, H.; Schoeder, C. T.

2026-03-02 bioinformatics
10.64898/2026.02.26.708415 bioRxiv

G protein-coupled receptors (GPCRs) play a ubiquitous role in the transduction of extracellular stimuli into intracellular responses and therefore represent a major target for the development of novel peptide-based therapeutics. In fact, approximately 30% of all non-sensory GPCRs are peptide-targeted, representing a blueprint for the design of de novo peptides, both as pharmacological tools and as therapeutics. Recent advances in deep learning-based protein structure generation and structure prediction offer a multitude of peptide design strategies for GPCRs, yet confidence metrics rarely correlate with experimental success. For peptides, this problem is exacerbated by their lack of elaborate tertiary structure, raising the question of whether the failure is due to inadequate sampling or insufficient scoring. In this two-part benchmark, we addressed this question by first simulating the validation process of 124 unique known GPCR-peptide complexes using AlphaFold2 Initial Guess, Boltz-2 and RosettaFold3. We then assessed the peptide sampling capabilities of the respective generative methods BindCraft, BoltzGen and RFdiffusion3. Our results indicate that current design pipelines primarily suffer from significant confidence overestimation for misplaced peptides in the validation phase across all three prediction methods. We further highlight occurrences of significant memorization in both the prediction and the generation of peptides. While all generative methods sample backbone space sufficiently, their simultaneous sequence generation remains subpar, though it can be partially recovered through the use of ProteinMPNN. Taken together, our benchmark offers guidance for the design of peptides specifically using deep learning-based pipelines.
Author summary

Deep learning-based protein design is revolutionizing computational biology, and the development of such tools is progressing rapidly with increasing attention from both academic and non-academic institutions. Their applicability and performance are often assessed from an all-purpose objective, with an implicit bias towards larger protein-protein interactions. Due to their size, peptides therefore present an edge case where performance is known to decrease compared to larger, more structured proteins. Here, we present a benchmark specifically for the deep learning-based design of peptides targeting G protein-coupled receptors (GPCRs), a major therapeutic drug target family, assessing the generation of novel GPCR-targeting peptides and the validation of these designs separately. Our results show that generative methods sample potential peptide placements and orientations sufficiently but validation fails to differentiate valid from invalid designs, indicating that the so-called scoring problem remains unsolved. Although they focus on a specific use case, our results are generalizable to the broader field of protein design. Consequently, the benchmark can offer guidance for peptide-specific design applications and can contribute to the development and improvement of new methods.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling: 207 papers in training set, Top 0.1%, 37.9%
2. Journal of Cheminformatics: 25 papers in training set, Top 0.1%, 9.2%
3. PLOS Computational Biology: 1633 papers in training set, Top 4%, 8.2%
(50% of probability mass above)
4. Bioinformatics: 1061 papers in training set, Top 4%, 6.4%
5. Bioinformatics Advances: 184 papers in training set, Top 0.7%, 4.9%
6. Journal of Chemical Theory and Computation: 126 papers in training set, Top 0.3%, 3.7%
7. Briefings in Bioinformatics: 326 papers in training set, Top 2%, 2.7%
8. BMC Bioinformatics: 383 papers in training set, Top 3%, 2.6%
9. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.5%
10. Proteins: Structure, Function, and Bioinformatics: 82 papers in training set, Top 0.5%, 1.7%
11. Scientific Reports: 3102 papers in training set, Top 62%, 1.5%
12. PLOS ONE: 4510 papers in training set, Top 57%, 1.5%
13. International Journal of Molecular Sciences: 453 papers in training set, Top 10%, 1.3%
14. Chemical Science: 71 papers in training set, Top 2%, 0.9%
15. NAR Genomics and Bioinformatics: 214 papers in training set, Top 3%, 0.9%
16. Molecules: 37 papers in training set, Top 2%, 0.9%
17. Artificial Intelligence in the Life Sciences: 11 papers in training set, Top 0.2%, 0.8%
18. Journal of Computational Chemistry: 11 papers in training set, Top 0.3%, 0.6%
19. Frontiers in Molecular Biosciences: 100 papers in training set, Top 6%, 0.6%
20. Frontiers in Bioinformatics: 45 papers in training set, Top 2%, 0.5%
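
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the page does not state how its cutoff is computed, so the rule used here (the smallest number of top-ranked journals whose cumulative probability reaches the threshold) is an assumption, and the function name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities in percent for the five top-ranked journals,
# copied from the ranking on this page.
probs = [37.9, 9.2, 8.2, 6.4, 4.9]


def journals_to_reach(probabilities, threshold):
    """Return the smallest number of top-ranked entries whose cumulative
    sum reaches the threshold, or None if the threshold is never reached.

    Assumed rule; the page does not document its own cutoff logic.
    """
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None


print(journals_to_reach(probs, 50.0))  # prints 3
```

With these values the first two journals sum to 47.1% and the first three to 55.3%, so the 50% line falls after the third entry, consistent with the ranking.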