
sxRaep: A Rapid and Accurate Enzyme Predictor for high-throughput mining of enzymatic sequences

Duan, H.; Han, X.; Mo, Y.; Ren, B.; Xia, L. C.

2026-05-11 bioinformatics
10.64898/2026.05.06.723393 bioRxiv

Motivation: Metagenomic sequencing generates petabyte-scale sequence datasets that strain both deep-learning and alignment-based enzyme annotation tools. A lightweight, rapid, and accurate filter is needed to identify enzymatic sequences before resource-intensive functional prediction.

Results: We present sxRaep (Rapid and Accurate Enzyme Predictor), a resource-efficient framework using lightweight physicochemical features for enzyme pre-screening. sxRaep achieves 6,604-fold speedup over Diamond (0.002 seconds per inference) with 62.1% memory reduction relative to Diamond (372 MB peak), while maintaining 99.4% accuracy and the highest recall in remote homology detection. This lightweight approach identifies enzymatic candidates missed by alignment-based methods without sacrificing accuracy.

Availability and Implementation: sxRaep is available as a Python package at https://pypi.org/project/raep/, is maintained as an open-source software repository at https://github.com/labxscut/sxRaep, and can be deployed using the Docker image cirinmok/raep:python3.11 (https://hub.docker.com/r/cirinmok/raep/tags), which provides a reproducible Python 3.11 environment for enzyme prediction and model execution.

Contact: lcxia@scut.edu.cn
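The abstract does not say which physicochemical features sxRaep computes, so the following is only an illustrative sketch of what "lightweight physicochemical features" can look like for a protein sequence. It derives amino-acid composition and mean hydropathy on the Kyte-Doolittle scale. The function name `sequence_features` and the choice of features are assumptions made for illustration, not part of the sxRaep package or its API.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD_HYDROPATHY)  # fixed alphabetical feature order


def sequence_features(seq):
    """Hypothetical feature extractor (not sxRaep's actual feature set).

    Returns 21 values: the fraction of each standard amino acid
    (alphabetical order) followed by the mean Kyte-Doolittle hydropathy.
    Non-standard residue codes (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in KD_HYDROPATHY]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    counts = Counter(residues)
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KD_HYDROPATHY[aa] for aa in residues) / n
    return composition + [mean_hydropathy]


if __name__ == "__main__":
    features = sequence_features("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(features))
```

Features of this kind need only a single pass over the sequence and no database search, which is the general reason composition-based descriptors are cheap compared with alignment; whether sxRaep relies on these particular descriptors is not stated in the abstract.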

Matching journals

The top 2 journals together account for more than 50% of the predicted probability mass.

Rank. Journal | Papers in training set | Percentile | Predicted probability

1. Bioinformatics | 1061 papers in training set | Top 0.3% | 49.7%
2. Briefings in Bioinformatics | 326 papers in training set | Top 0.4% | 10.0%
(50% of probability mass above)
3. BMC Bioinformatics | 383 papers in training set | Top 2% | 6.3%
4. Genome Biology | 555 papers in training set | Top 2% | 3.6%
5. Nature Communications | 4913 papers in training set | Top 44% | 2.9%
6. Bioinformatics Advances | 184 papers in training set | Top 2% | 2.9%
7. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 4% | 1.9%
8. Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.7%
9. PLOS Computational Biology | 1633 papers in training set | Top 17% | 1.6%
10. Nucleic Acids Research | 1128 papers in training set | Top 12% | 1.5%
11. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 4% | 1.2%
12. NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 1.2%
13. GigaScience | 172 papers in training set | Top 2% | 0.9%
14. Genome Medicine | 154 papers in training set | Top 7% | 0.9%
15. Nature Machine Intelligence | 61 papers in training set | Top 3% | 0.8%
16. Advanced Science | 249 papers in training set | Top 18% | 0.8%
17. ACS Synthetic Biology | 256 papers in training set | Top 3% | 0.8%
18. Journal of Molecular Biology | 217 papers in training set | Top 4% | 0.6%
19. Chemical Science | 71 papers in training set | Top 2% | 0.6%
20. IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.7% | 0.6%
21. Metabolic Engineering | 68 papers in training set | Top 0.8% | 0.6%
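
The 50% cutoff stated at the top of the journal list can be checked with a running total over the listed probabilities. The short sketch below copies the 21 percentages from the list in rank order; the helper name `journals_to_reach` is hypothetical and is not part of the site that produced the ranking.

```python
# Predicted probabilities (percent) for ranks 1-21, copied from the list above.
PROBABILITIES = [
    49.7, 10.0, 6.3, 3.6, 2.9, 2.9, 1.9, 1.7, 1.6, 1.5, 1.2,
    1.2, 0.9, 0.9, 0.8, 0.8, 0.8, 0.6, 0.6, 0.6, 0.6,
]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count
    return None


if __name__ == "__main__":
    print(journals_to_reach(PROBABILITIES, 50.0))  # prints 2
```

The first entry alone (49.7%) falls just short of 50%, so the second entry is needed to cross the threshold, which matches the position of the "50% of probability mass above" marker.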