High-PepBinder: A pLM-Guided Latent Diffusion Framework for Affinity-Aware Target-Specific Peptide Design

Qingyi, M.; Zhai, S.; Cao, S.; Zhu, R.; Xu, W.; Zhang, C.; Zhu, N.; Guo, J.; Duan, H.

2026-01-19 bioinformatics
10.64898/2026.01.12.698988 bioRxiv
Peptides, as therapeutic molecules, offer unique advantages in targeting complex protein surfaces, yet their rational design remains limited by the vastness of the sequence space and the constraints of traditional approaches. Here, we propose High-PepBinder, a sequence-only conditional diffusion framework for target-specific peptide generation. Guided by the target protein sequence, High-PepBinder adopts a dual encoder architecture that integrates protein language models (pLMs) with the diffusion model. This approach cascades the peptide generation model with an affinity classifier and enables the generation process to capture affinity-related features of the peptides through lightweight joint optimization. Due to the scarcity of protein-peptide affinity data, we constructed PepPBA, to our knowledge the most comprehensive dataset to date, and established a structure- and physics-based screening pipeline to prioritize top candidates. Results show that High-PepBinder demonstrates competitive performance across multiple peptide generation and affinity-related tasks. For representative targets, including KEAP1, XIAP, and EGFR, the generated peptides preserve key binding geometries and interface patterns of reference peptides in predicted complexes, while maintaining sequence diversity and favorable predicted properties. Overall, High-PepBinder contributes toward a general and sequence-only strategy for peptide design, offering a computational framework for expanding peptide discovery against challenging targets.

Matching journals

The top 5 journals together account for just over 50% of the predicted probability mass (52.4% when the listed values are summed).

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Advanced Science | 249 | Top 0.4% | 18.0%
2 | Journal of Chemical Information and Modeling | 207 | Top 0.3% | 17.2%
3 | Briefings in Bioinformatics | 326 | Top 0.7% | 6.7%
4 | Nature Communications | 4913 | Top 29% | 6.3%
5 | Bioinformatics | 1061 | Top 5% | 4.2%
(50% of probability mass above)
6 | Nature Machine Intelligence | 61 | Top 0.7% | 4.1%
7 | Computational and Structural Biotechnology Journal | 216 | Top 2% | 3.0%
8 | Communications Chemistry | 39 | Top 0.1% | 2.6%
9 | Cell Systems | 167 | Top 5% | 2.3%
10 | PLOS Computational Biology | 1633 | Top 14% | 2.0%
11 | Nucleic Acids Research | 1128 | Top 9% | 2.0%
12 | Journal of Cheminformatics | 25 | Top 0.3% | 1.9%
13 | Chemical Science | 71 | Top 1.0% | 1.7%
14 | Frontiers in Immunology | 586 | Top 5% | 1.5%
15 | Journal of Chemical Theory and Computation | 126 | Top 0.6% | 1.5%
16 | Proceedings of the National Academy of Sciences | 2130 | Top 38% | 1.2%
17 | Patterns | 70 | Top 2% | 1.1%
18 | National Science Review | 22 | Top 2% | 0.9%
19 | Cell Reports Methods | 141 | Top 4% | 0.9%
20 | Communications Biology | 886 | Top 22% | 0.8%
21 | iScience | 1063 | Top 33% | 0.7%
22 | Nature Biotechnology | 147 | Top 7% | 0.7%
23 | Nature Methods | 336 | Top 6% | 0.7%
24 | eLife | 5422 | Top 58% | 0.7%
25 | The Journal of Physical Chemistry Letters | 58 | Top 2% | 0.7%
26 | Journal of Medicinal Chemistry | 68 | Top 1% | 0.7%
27 | ACS Synthetic Biology | 256 | Top 4% | 0.6%
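
The 50% cutoff in the ranked list above can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch, not the site's actual method: the helper name `ranks_to_reach` is hypothetical, and the values are simply copied from the list.

```python
# Predicted probabilities (percent), copied from the ranked list in rank order.
probs = [18.0, 17.2, 6.7, 6.3, 4.2, 4.1, 3.0, 2.6, 2.3, 2.0,
         2.0, 1.9, 1.7, 1.5, 1.5, 1.2, 1.1, 0.9, 0.9, 0.8,
         0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6]


def ranks_to_reach(values, threshold):
    """Return (rank, cumulative mass) at the first rank whose running
    total reaches the threshold, or (None, total) if it never does.
    Hypothetical helper for illustration only."""
    total = 0.0
    for rank, value in enumerate(values, start=1):
        total += value
        if total >= threshold:
            return rank, total
    return None, total


rank, mass = ranks_to_reach(probs, 50.0)
listed_total = sum(probs)  # mass covered by the 27 listed journals

print(f"50% reached at rank {rank} (cumulative {mass:.1f}%)")
print(f"Listed journals cover {listed_total:.1f}% in total")
```

The listed values do not sum to 100%, so the remainder presumably belongs to journals that are not shown.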