
Antimicrobial peptide databases and prediction tools: Toward a standard evaluation framework

Cisterna Garcia, A.; Gonzalez Lopez, A. M.; Vozi, A.; Esteban, M. A.; Egli, A.; Jutzeler, C.; Palma, J.; Sanchez-Ferrer, A.; Botia, J. A.

2026-05-21 bioinformatics
10.64898/2026.05.19.726290 bioRxiv

Antimicrobial resistance (AMR) has a profound impact on animal and human health and is associated with substantial morbidity, mortality, and public health costs. There is a clear need to develop novel, effective antibiotic agents that can overcome the current AMR crisis. Antimicrobial peptides (AMPs) may offer such a solution and have attracted growing attention for their potential to combat AMR. In parallel, the growing availability of peptide sequences in public databases has stimulated the development of numerous machine learning and deep learning tools to predict antimicrobial activity computationally. However, it remains unclear how reliably these tools can be compared, as existing studies often rely on heterogeneous datasets and inconsistent evaluation protocols that may lead to data leakage and inflated performance estimates. This raises a central question: what evaluation criteria and benchmark resources are needed to enable fair, reproducible, and biologically meaningful assessment of AMP prediction tools? We address this question by focusing specifically on antibacterial peptides (ABPs). We first provide an overview of AMP databases relevant to antibacterial activity and compare their content, redundancy, and experimental metadata. We then critically assess existing computational tools for ABP prediction, highlighting key limitations related to dataset construction, affinity to certain sequences, data leakage, and inconsistent performance reporting. Based on these limitations, we propose a reference evaluation framework designed to improve comparability, reproducibility, and practical utility in ABP prediction. Finally, we provide targeted recommendations for AMP databases and future tool development to support more robust progress in the computational discovery of ABPs.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.1% | 34.1%
2. Briefings in Bioinformatics | 326 papers in training set | Top 0.3% | 10.4%
3. PLOS Computational Biology | 1633 papers in training set | Top 5% | 7.0%
(50% of probability mass above)
4. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 3.2%
5. Journal of Cheminformatics | 25 papers in training set | Top 0.2% | 2.8%
6. Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.2% | 2.8%
7. Bioinformatics Advances | 184 papers in training set | Top 2% | 2.7%
8. Scientific Reports | 3102 papers in training set | Top 52% | 1.9%
9. PLOS ONE | 4510 papers in training set | Top 49% | 1.9%
10. International Journal of Molecular Sciences | 453 papers in training set | Top 6% | 1.8%
11. Bioinformatics | 1061 papers in training set | Top 7% | 1.7%
12. Molecules | 37 papers in training set | Top 0.9% | 1.5%
13. ACS Pharmacology & Translational Science | 40 papers in training set | Top 0.5% | 1.4%
14. Pharmaceuticals | 33 papers in training set | Top 0.9% | 1.3%
15. Nucleic Acids Research | 1128 papers in training set | Top 14% | 1.1%
16. Frontiers in Immunology | 586 papers in training set | Top 6% | 0.9%
17. BMC Bioinformatics | 383 papers in training set | Top 6% | 0.9%
18. Chemical Science | 71 papers in training set | Top 2% | 0.9%
19. Advanced Science | 249 papers in training set | Top 17% | 0.8%
20. Frontiers in Pharmacology | 100 papers in training set | Top 4% | 0.8%
21. Journal of Medicinal Chemistry | 68 papers in training set | Top 1% | 0.8%
22. eLife | 5422 papers in training set | Top 57% | 0.8%
23. Journal of Molecular Biology | 217 papers in training set | Top 4% | 0.7%
24. Journal of the American Society for Mass Spectrometry | 33 papers in training set | Top 0.5% | 0.7%
25. Antibiotics | 32 papers in training set | Top 1% | 0.7%
26. Protein Science | 221 papers in training set | Top 2% | 0.7%
27. Frontiers in Molecular Biosciences | 100 papers in training set | Top 6% | 0.7%
28. Frontiers in Microbiology | 375 papers in training set | Top 10% | 0.5%
29. ACS Omega | 90 papers in training set | Top 5% | 0.5%
30. Computers in Biology and Medicine | 120 papers in training set | Top 6% | 0.5%