
Cyclic peptides space: The methodology of sequence selection to cover the comprehensive physical properties

Tsuchihashi, R.; Kinoshita, M.

2026-03-12 bioinformatics
10.64898/2026.03.10.710724 bioRxiv

Cyclic peptides have emerged as a pivotal modality for next-generation therapeutics, due to their superior biocompatibility, high selectivity, and structural stability. While AI-driven peptide design has advanced rapidly, conventional optimization algorithms are often constrained by initialization biases, which impede the efficient exploration of the vast chemical space. Here, we propose a novel methodology that integrates the protein language model ESM-2 with cyclic permutation averaging of embeddings to resolve this bottleneck. This approach establishes a comprehensive "peptide space", a high-dimensional vector representation that encapsulates the physicochemical and structural attributes of cyclic peptides. Our analysis reveals that random sequence selection results in a heterogeneous distribution within this space, potentially underrepresenting specific functional regions. Conversely, navigating this defined peptide space enables the selection of libraries that uniformly span diverse molecular properties. In a proof-of-concept study designing binders for β2-microglobulin (β2m), we demonstrate that initial sequences uniformly sampled from our peptide space yield superior candidates more efficiently than those derived from random selection. Furthermore, this framework facilitates the quantitative assessment of mutational perturbations on global peptide properties, supporting rational decision-making for both broad exploration and local optimization. This "peptide space" concept provides a foundational framework for defining appropriate search boundaries and enhancing computational efficiency in AI-mediated drug discovery.

Graphic Abstract: Figure 1
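The two ideas named in the abstract, cyclic permutation averaging of embeddings and uniform selection from the resulting space, can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_embed` is a hypothetical, position-dependent stand-in for ESM-2, and greedy farthest-point selection is only one assumed way to pick sequences that spread evenly over the space; the abstract does not state which sampling rule was used.

```python
import hashlib
import math


def toy_embed(seq, dim=8):
    """Hypothetical stand-in for a language-model embedding (NOT ESM-2).

    It is deliberately position dependent, so rotating a sequence
    changes the vector, as it would for a linear-sequence model.
    """
    vec = [0.0] * dim
    for pos, aa in enumerate(seq):
        digest = hashlib.sha256(f"{pos}:{aa}".encode()).digest()
        for k in range(dim):
            vec[k] += digest[k] / 255.0 - 0.5
    return [v / len(seq) for v in vec]


def rotations(seq):
    """All cyclic permutations of a sequence, e.g. ABC -> ABC, BCA, CAB."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def cyclic_average_embedding(seq, embed=toy_embed):
    """Average the embeddings of every rotation of `seq`.

    The result does not depend on where the ring is "cut open",
    up to floating-point rounding.
    """
    embs = [embed(r) for r in rotations(seq)]
    dim = len(embs[0])
    return [sum(e[k] for e in embs) / len(embs) for k in range(dim)]


def farthest_point_selection(points, k):
    """Greedy max-min selection (an assumed sampling rule, see lead-in).

    Starts from index 0 and repeatedly adds the point whose distance
    to the already chosen set is largest. Returns chosen indices.
    """
    chosen = [0]
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        candidates = [i for i in range(len(points)) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [
            min(d, math.dist(p, points[nxt]))
            for d, p in zip(min_dist, points)
        ]
    return chosen
```

In practice `embed` would be replaced by a function returning a pooled ESM-2 representation, and `farthest_point_selection` would be run on the averaged vectors of a candidate pool to choose the initial library.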

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling: 207 papers in training set, Top 0.1%, 42.1%
2. Chemical Science: 71 papers in training set, Top 0.1%, 11.1%
(50% of probability mass above)
3. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 0.3%, 7.3%
4. Journal of Medicinal Chemistry: 68 papers in training set, Top 0.2%, 5.2%
5. Advanced Science: 249 papers in training set, Top 6%, 3.1%
6. JACS Au: 35 papers in training set, Top 0.2%, 2.6%
7. Journal of Chemical Theory and Computation: 126 papers in training set, Top 0.5%, 1.8%
8. The Journal of Physical Chemistry Letters: 58 papers in training set, Top 0.9%, 1.6%
9. The Journal of Physical Chemistry B: 158 papers in training set, Top 1%, 1.4%
10. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 39%, 1.0%
11. Communications Chemistry: 39 papers in training set, Top 0.6%, 1.0%
12. International Journal of Molecular Sciences: 453 papers in training set, Top 12%, 1.0%
13. Molecules: 37 papers in training set, Top 1%, 1.0%
14. PLOS ONE: 4510 papers in training set, Top 65%, 0.8%
15. PLOS Computational Biology: 1633 papers in training set, Top 23%, 0.8%
16. eLife: 5422 papers in training set, Top 55%, 0.8%
17. Advanced Therapeutics: 15 papers in training set, Top 0.4%, 0.8%
18. Pharmaceuticals: 33 papers in training set, Top 2%, 0.7%
19. ACS Omega: 90 papers in training set, Top 5%, 0.5%
20. iScience: 1063 papers in training set, Top 39%, 0.5%
21. Molecular Therapy Nucleic Acids: 32 papers in training set, Top 1.0%, 0.5%
22. Briefings in Bioinformatics: 326 papers in training set, Top 8%, 0.5%
23. Chemical Communications: 24 papers in training set, Top 1%, 0.5%