
PCPpred: Prediction of Chemically Modified Peptide Permeability Across Multiple Assays for Oral Delivery

Shendre, A.; Gahlot, P. S.; Raghava, G. P. S.

2026-01-22 bioinformatics
10.64898/2026.01.19.700485 bioRxiv

Chemically modified peptides, including cyclic peptides, have emerged as promising candidates for oral delivery, yet they face the challenge of low membrane permeability. In this study, datasets were sourced from CycPeptMPDB, a database of peptide membrane permeability measured with different assays. Our quantitative analysis showed a clear discordance between permeability measured using PAMPA and that measured using cell-based assays (Caco-2, MDCK, and RRCK), which explains the limits of PAMPA as a surrogate for cell-based assays. Therefore, we developed assay-specific predictive models to capture the permeability determinants of each system more accurately. We systematically computed diverse features of modified peptides using open-source software and used fine-tuned peptide embeddings generated with pretrained chemical language models. Baseline models were developed using the resulting multi-hierarchical molecular features. We also developed a stacked ensemble architecture that uses models built on multi-hierarchical features as base learners. The ensemble model achieved the best PAMPA test-set performance, with an MSE of 0.200, R2 of 0.685, and PCC of 0.830, and an R2 of 0.783 on the Caco-2 test set. A model trained on 2D Mordred descriptors attained the highest performance on the Caco-2 test set, with an MSE of 0.129, R2 of 0.793, and PCC of 0.892, surpassing state-of-the-art approaches such as CPMP. To support widespread adoption, we developed an open-access web server (https://webs.iiitd.edu.in/raghava/pcppred/) where users can design modified peptides in the human-comprehensible MAP (Modifications and Annotations of Proteins) format, convert MAP to SMILES format, and predict permeability across assays with result visualization. For reproducibility, we also provide a standalone version on GitHub (https://github.com/raghavagps/pcppred).
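The abstract reports three regression metrics (MSE, R2, and PCC). As a minimal sketch of how such metrics are commonly computed, the following pure-Python function uses the standard textbook definitions. The function name `regression_metrics` and the toy values are hypothetical and are not taken from the PCPpred code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MSE, R2, PCC) for paired observed and predicted values.

    Hypothetical helper using standard definitions; not the authors' code.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n

    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("R2 and PCC are undefined for constant inputs")

    mse = ss_res / n                 # mean squared error
    r2 = 1.0 - ss_res / ss_tot       # coefficient of determination
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    pcc = cov / math.sqrt(ss_tot * ss_pred)  # Pearson correlation coefficient
    return mse, r2, pcc


# Toy permeability-like values, invented purely for illustration.
observed = [-6.0, -5.5, -5.0, -4.5]
predicted = [-5.8, -5.6, -5.1, -4.4]
mse, r2, pcc = regression_metrics(observed, predicted)
print(f"MSE={mse:.4f}  R2={r2:.3f}  PCC={pcc:.3f}")
```

R2 penalizes any systematic offset or scaling error in the predictions, whereas PCC measures only linear association, which is one reason the two are often reported together.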

Matching journals

The top 12 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.8% | 6.9%
2. Pharmaceutics | 21 papers in training set | Top 0.1% | 6.5%
3. Advanced Science | 249 papers in training set | Top 2% | 6.5%
4. Communications Chemistry | 39 papers in training set | Top 0.1% | 4.4%
5. Advanced Therapeutics | 15 papers in training set | Top 0.1% | 3.6%
6. Nature Communications | 4913 papers in training set | Top 39% | 3.6%
7. Biomaterials | 78 papers in training set | Top 0.2% | 3.6%
8. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 3.6%
9. Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.1%
10. International Journal of Molecular Sciences | 453 papers in training set | Top 3% | 2.9%
11. PLOS Computational Biology | 1633 papers in training set | Top 12% | 2.6%
12. Journal of Medicinal Chemistry | 68 papers in training set | Top 0.4% | 2.6%

50% of probability mass above

13. Scientific Reports | 3102 papers in training set | Top 50% | 2.1%
14. PLOS ONE | 4510 papers in training set | Top 48% | 2.1%
15. Frontiers in Pharmacology | 100 papers in training set | Top 2% | 1.7%
16. ACS Chemical Biology | 150 papers in training set | Top 1.0% | 1.7%
17. Pharmaceuticals | 33 papers in training set | Top 0.6% | 1.7%
18. ACS Omega | 90 papers in training set | Top 2% | 1.7%
19. Molecules | 37 papers in training set | Top 1.0% | 1.5%
20. Small | 70 papers in training set | Top 0.6% | 1.3%
21. Journal of Controlled Release | 39 papers in training set | Top 0.6% | 1.3%
22. Chemical Science | 71 papers in training set | Top 1% | 1.2%
23. eLife | 5422 papers in training set | Top 49% | 1.2%
24. Computers in Biology and Medicine | 120 papers in training set | Top 3% | 1.0%
25. Molecular Therapy Nucleic Acids | 32 papers in training set | Top 0.6% | 1.0%
26. International Journal of Biological Macromolecules | 65 papers in training set | Top 3% | 1.0%
27. iScience | 1063 papers in training set | Top 26% | 0.9%
28. eBioMedicine | 130 papers in training set | Top 3% | 0.8%
29. Journal of the American Society for Mass Spectrometry | 33 papers in training set | Top 0.5% | 0.8%
30. Frontiers in Microbiology | 375 papers in training set | Top 9% | 0.8%
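
The statement that the top 12 journals account for 50% of the predicted probability mass can be roughly checked against the rounded percentages displayed for each journal. The sketch below sums the values shown for ranks 1 to 13; the helper name `cumulative_mass` is hypothetical, and because the displayed values are rounded to one decimal place, the total for the top 12 comes to 49.9% rather than exactly 50%.

```python
# Displayed percentages for ranks 1-13, stored in tenths of a percent
# (69 means 6.9%) so the sums are exact integers before conversion.
listed_tenths = [69, 65, 65, 44, 36, 36, 36, 36, 31, 29, 26, 26, 21]


def cumulative_mass(tenths, k):
    """Return the summed percentage of the first k entries (hypothetical helper)."""
    return sum(tenths[:k]) / 10


top12 = cumulative_mass(listed_tenths, 12)
top13 = cumulative_mass(listed_tenths, 13)
print(f"top 12: {top12}%   top 13: {top13}%")
```

The small gap between 49.9% and 50% is consistent with rounding in the displayed values; the unrounded probabilities are not available on the page.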