A Systematic Benchmark for Peptide Property Prediction
Dong, X.; Yang, K.; Wu, T.; Li, P.; Gao, L.
Accurate prediction of peptide physicochemical properties and biological activities is critical for rational peptide design and high-throughput screening. However, current research is often constrained by heterogeneous data sources and inconsistent evaluation standards, which hinder fair comparisons and reliable assessments of model generalization. In this work, we present PPB, a peptide property prediction benchmark designed to evaluate model performance with an emphasis on realistic generalization across both classification and regression tasks. By applying unified biological filtering criteria, we systematically curated and standardized 15 datasets comprising 161,571 unique sequences, spanning a wide range of physicochemical properties and functional activities. We benchmarked seven representative architectures--encompassing traditional machine learning, deep learning, and pre-trained language models--alongside diverse feature encoding schemes. Furthermore, we investigated the impact of random versus homology-based (sequence similarity) data splitting strategies on model robustness. To facilitate community access, we developed the PPB web server (http://ppb.molmatrix.com/index.html), which provides centralized resources for standardized dataset downloads, interactive visualization of benchmark results, and detailed evaluation protocols.

Author summary
Peptides are short amino acid chains essential for biological functions and drug discovery. While AI models have accelerated peptide property prediction, the field lacks a unified standard for fairly comparing these methods, often leading to inconsistent results and overoptimistic performance estimates. In this study, we introduce the Peptide Property Benchmark (PPB), a comprehensive framework featuring 15 standardized datasets and over 160,000 sequences. We systematically evaluated diverse AI paradigms, including traditional machine learning and advanced protein language models.
Our results demonstrate that large-scale pre-trained models--the biological equivalent of large language models--offer superior accuracy and stability, particularly for small or complex datasets. Crucially, our analysis reveals a "clustering bottleneck": standard tools used to group proteins based on similarity often fail when applied to short peptides, causing data to fragment excessively. This suggests that traditional strategies for testing model generalization may be less effective for peptides than previously assumed. To facilitate community progress, we provide an online platform for standardized data and evaluation. This work establishes a rigorous foundation for developing more reliable AI tools for the next generation of peptide-based therapeutics.
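The homology-based splitting strategy mentioned above can be illustrated with a minimal sketch: cluster sequences by similarity, then assign whole clusters to either the train or test partition so that no near-duplicate peptide appears on both sides. This is only an illustration of the idea, not the benchmark's actual pipeline; real workflows typically use dedicated tools such as CD-HIT or MMseqs2, and here `difflib.SequenceMatcher` merely stands in for a true sequence-identity computation, with the `0.4` threshold and function names chosen for the example.

```python
import random
from difflib import SequenceMatcher


def greedy_cluster(seqs, threshold=0.4):
    """Greedy representative-based clustering (CD-HIT-style sketch).

    difflib's ratio is a rough stand-in for alignment-based
    sequence identity; real benchmarks use CD-HIT/MMseqs2.
    """
    reps, clusters = [], []
    for s in seqs:
        for i, r in enumerate(reps):
            if SequenceMatcher(None, s, r).ratio() >= threshold:
                clusters[i].append(s)  # similar to an existing representative
                break
        else:
            reps.append(s)             # start a new cluster
            clusters.append([s])
    return clusters


def homology_split(seqs, test_frac=0.2, threshold=0.4, seed=0):
    """Assign whole clusters to train or test, so homologous
    sequences never straddle the split."""
    clusters = greedy_cluster(seqs, threshold)
    random.Random(seed).shuffle(clusters)
    n_test = int(test_frac * len(seqs))
    train, test = [], []
    for c in clusters:
        (test if len(test) < n_test else train).extend(c)
    return train, test
```

Because clusters are kept intact, two near-identical sequences always land in the same partition, which is what makes this split a harder (and more realistic) test of generalization than a random split.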