PCPpred: Prediction of Chemically Modified Peptide Permeability Across Multiple Assays for Oral Delivery
Shendre, A.; Gahlot, P. S.; Raghava, G. P. S.
Chemically modified peptides, including cyclic peptides, have emerged as promising candidates for oral delivery, yet they face the challenge of low membrane permeability. In this study, datasets were sourced from CycPeptMPDB, a database of peptide membrane permeability measured with different assays. Our quantitative analysis showed a clear discordance between permeability measured using PAMPA and cell-based assays (Caco-2, MDCK, and RRCK), highlighting the limits of PAMPA as a surrogate for cell-based assays. Therefore, we developed assay-specific predictive models to more accurately capture permeability determinants in each system. We systematically computed diverse features of modified peptides using open-source software and used fine-tuned peptide embeddings generated using pretrained chemical language models. Baseline models were developed using the generated multi-hierarchical molecular features. We also developed a stacked ensemble architecture in which models trained on the multi-hierarchical features serve as base learners. The ensemble model achieved the best PAMPA test-set performance, with an MSE of 0.200, R2 of 0.685, and PCC of 0.830, and an R2 of 0.783 on the Caco-2 test set. A model trained on 2D Mordred descriptors attained the highest performance on the Caco-2 test set, with an MSE of 0.129, R2 of 0.793, and PCC of 0.892, surpassing state-of-the-art approaches such as CPMP. To support widespread adoption, we developed an open-access web server (https://webs.iiitd.edu.in/raghava/pcppred/) where users can design modified peptides in the human-comprehensible MAP (Modifications and Annotations of Proteins) format, convert MAP to SMILES format, and predict permeability across assays with result visualization. To ensure reproducibility, we also provide a standalone version on GitHub (https://github.com/raghavagps/pcppred).
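The abstract reports three regression metrics (MSE, R2, and PCC). As a reading aid, the following is a minimal sketch of how these metrics are conventionally defined; it is not the authors' evaluation code. The function name `regression_metrics` and the sample permeability values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (mse, r2, pcc) for two equal-length sequences of floats.

    Conventional definitions, assumed here rather than taken from the paper:
      MSE = mean squared error
      R2  = 1 - SS_res / SS_tot (coefficient of determination)
      PCC = Pearson correlation coefficient
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)

    # Covariance and prediction variance terms (unnormalized).
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)

    if ss_tot == 0 or var_p == 0:
        raise ValueError("R2 and PCC are undefined for constant inputs")

    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    pcc = cov / sqrt(ss_tot * var_p)
    return mse, r2, pcc


# Hypothetical log-permeability values, for illustration only.
y_true = [-6.0, -5.5, -5.0, -4.5]
y_pred = [-5.8, -5.6, -5.1, -4.4]
mse, r2, pcc = regression_metrics(y_true, y_pred)

# A constant offset in the predictions keeps PCC at 1 but lowers R2.
shifted = [t + 1.0 for t in y_true]
mse_s, r2_s, pcc_s = regression_metrics(y_true, shifted)
```

R2 and PCC measure different things: PCC is insensitive to a constant offset or rescaling of the predictions, while R2 penalizes both. This is why the two are reported separately and why R2 need not equal the square of PCC on a held-out test set.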
Matching journals
The top 12 journals account for 50% of the predicted probability mass.