A Global Discovery of Antimicrobial Peptides in Deep-Sea Microbiomes Driven by an ESM-2 and Transformer-based Dual-Engine Framework

Chen, B.; Mou, X.; Song, Z.; Lin, H.; Han, T.; Wang, R.; Ou, H.-Y.; Zhang, Y.; Li, J.

2026-03-16 bioinformatics
10.1101/2025.11.20.689422 bioRxiv

The global crisis of multidrug-resistant pathogens necessitates innovative antimicrobial peptide (AMP) discovery. Deep-sea microbiomes represent an underexplored resource for novel AMPs, but their mining is hindered by biases in current prediction methods, including sequence length imbalance, N-terminal methionine artifacts, and lack of microbial optimization. To overcome these biases, we developed XAMP, a dual-engine predictor integrating XAMP-E (based on ESM-2 for high-accuracy feature representation) and XAMP-T (a one-layer Transformer for accelerated screening). Trained on debiased datasets, XAMP achieved a median AUC of 0.972, an approximately 10% improvement over state-of-the-art tools, with XAMP-T operating 5 to 40 times faster. Applying this pipeline to deep-sea metagenomes, we identified 2,355 promising AMP candidates. Experimental validation of six synthesized peptides against ESKAPE pathogens demonstrated potent, broad-spectrum activity, particularly against Gram-negative bacteria, which dominate deep-sea ecosystems and represent a major challenge in nosocomial infections. This study establishes a robust computational-experimental framework for discovering therapeutic candidates from extreme environments to combat the antibiotic resistance crisis.
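
The abstract names two dataset biases, sequence length imbalance and N-terminal methionine artifacts, without describing how they were removed. The following is a minimal sketch of one way such debiasing could be done, not the authors' procedure: the function names, the bin width, and the choice to strip a leading methionine and length-match negatives to positives are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def strip_initial_met(seq: str) -> str:
    """Drop a leading methionine (assumed debiasing step, not from the paper).

    The idea is that a start-codon 'M' present mostly in one class could
    otherwise act as a label shortcut for a classifier.
    """
    return seq[1:] if seq.startswith("M") and len(seq) > 1 else seq


def length_match(positives, negatives, bin_width=5, seed=0):
    """Sample negatives so each length bin holds at most as many negatives
    as positives (hypothetical helper; bin_width is an arbitrary choice)."""
    rng = random.Random(seed)

    # Group negatives by length bin.
    neg_by_bin = defaultdict(list)
    for seq in negatives:
        neg_by_bin[len(seq) // bin_width].append(seq)

    # Count positives per length bin.
    pos_counts = defaultdict(int)
    for seq in positives:
        pos_counts[len(seq) // bin_width] += 1

    # Draw negatives only from bins that also contain positives.
    matched = []
    for length_bin, count in sorted(pos_counts.items()):
        pool = neg_by_bin.get(length_bin, [])
        matched.extend(rng.sample(pool, min(count, len(pool))))
    return matched


# Toy sequences, invented purely to exercise the functions.
toy_positives = ["KLLKK", "GIGKFLHSAK"]
toy_negatives = ["AAAAA", "CCCCCC", "DDDDDDDDDD", "E" * 30]
toy_matched = length_match(toy_positives, toy_negatives)
```

In this toy run the 30-residue negative is never sampled, because no positive falls in its length bin; a classifier trained on the matched set therefore cannot separate the classes by length alone.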

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Advanced Science | 249 papers in training set | Top 0.1% | 32.6%
2 | Nature Communications | 4913 papers in training set | Top 21% | 9.0%
3 | Nucleic Acids Research | 1128 papers in training set | Top 3% | 6.3%
4 | Nature Machine Intelligence | 61 papers in training set | Top 0.7% | 4.3%
50% of probability mass above
5 | Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.5%
6 | Cell Systems | 167 papers in training set | Top 4% | 3.5%
7 | Microbiome | 139 papers in training set | Top 1% | 3.5%
8 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.9%
9 | Cell Reports Medicine | 140 papers in training set | Top 4% | 1.7%
10 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 5% | 1.7%
11 | Communications Biology | 886 papers in training set | Top 9% | 1.7%
12 | Genome Medicine | 154 papers in training set | Top 5% | 1.7%
13 | Nature Biotechnology | 147 papers in training set | Top 5% | 1.7%
14 | eLife | 5422 papers in training set | Top 43% | 1.7%
15 | iScience | 1063 papers in training set | Top 18% | 1.5%
16 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 4% | 1.3%
17 | Cell Reports Methods | 141 papers in training set | Top 3% | 1.3%
18 | PLOS Computational Biology | 1633 papers in training set | Top 19% | 1.3%
19 | Cell Genomics | 162 papers in training set | Top 5% | 1.2%
20 | Nature Biomedical Engineering | 42 papers in training set | Top 2% | 0.8%
21 | Cell Reports Physical Science | 18 papers in training set | Top 0.8% | 0.7%
22 | Scientific Reports | 3102 papers in training set | Top 75% | 0.7%
23 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 48% | 0.6%
24 | npj Systems Biology and Applications | 99 papers in training set | Top 3% | 0.6%
25 | Cell Reports | 1338 papers in training set | Top 36% | 0.6%
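
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does that for the first seven entries of the list above; the function name `ranks_to_reach` is hypothetical, and the page does not say how its own cutoff is computed.

```python
def ranks_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running sum
    to reach the threshold, or None if it is never reached."""
    running = 0.0
    for rank, p in enumerate(probabilities, start=1):
        running += p
        if running >= threshold:
            return rank
    return None


# Predicted probabilities (in percent) for ranks 1 to 7, copied from the list.
listed = [32.6, 9.0, 6.3, 4.3, 3.5, 3.5, 3.5]
needed = ranks_to_reach(listed, 50.0)
```

The running sums are 32.6, 41.6, 47.9, and 52.2, so the threshold is first crossed at rank 4, which agrees with where the page draws its "50% of probability mass above" line.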