
Enzyme Classification via Semi-Supervised Functional Residue Learning

Gong, C.; Zhang, D.; Ouyang-Zhang, J.; Liu, Q.; Klivans, A.; Diaz, D.

2026-02-14 bioengineering
10.64898/2026.02.11.705200 bioRxiv

Predicting enzymatic function from a protein sequence is a fundamental task in protein discovery and engineering. In this paper, we present Semi-supervised Learning for Enzyme Classification (SLEEC): a semi-supervised learning framework that learns a function-aware protein representation for Enzyme Commission (EC) number prediction. SLEEC achieves state-of-the-art (SOTA) performance on standard benchmarks and provides interpretable, residue-level annotations. We further demonstrate that our framework is robust to benign sequence modifications routinely observed in protein engineering workflows, such as appending functional tags, a desirable property that current ML frameworks lack. Our main technical contribution is a multiple sequence alignment (MSA)-based data augmentation technique for discovering sparse residue activations within a given enzyme sequence.
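The robustness property mentioned in the abstract (predictions unchanged when a functional tag is appended) can be phrased as a simple check. The sketch below is illustrative only and is not the SLEEC implementation: `append_tag`, `is_tag_invariant`, and both toy predictors are hypothetical names introduced here, the example sequence is arbitrary, and the hexahistidine tag is used merely as a familiar example of a functional tag.

```python
HIS6_TAG = "HHHHHH"  # hexahistidine tag, a common purification tag


def append_tag(sequence, tag=HIS6_TAG, terminus="C"):
    """Return the sequence with a tag added at the N- or C-terminus."""
    if terminus == "C":
        return sequence + tag
    if terminus == "N":
        return tag + sequence
    raise ValueError("terminus must be 'N' or 'C'")


def is_tag_invariant(predict, sequence, tag=HIS6_TAG):
    """True if predict() gives the same label with and without the tag."""
    baseline = predict(sequence)
    return all(
        predict(append_tag(sequence, tag, terminus)) == baseline
        for terminus in ("N", "C")
    )


# Hypothetical stand-ins for a real classifier (assumptions, not SLEEC).
def motif_predict(sequence):
    # Depends only on a local motif, so a terminal tag does not change it.
    return "class-A" if "GDS" in sequence else "class-B"


def length_predict(sequence):
    # Depends on total length, so any appended tag can flip the label.
    return "short" if len(sequence) < 12 else "long"


example = "MKTGDSLAA"  # arbitrary toy sequence
print(is_tag_invariant(motif_predict, example))   # True
print(is_tag_invariant(length_predict, example))  # False
```

A predictor that keys on local residues passes the check, while one that depends on global properties such as length does not, which is the kind of failure the abstract attributes to existing frameworks.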

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Cell Systems | 167 | Top 0.5% | 14.7%
2 | Nature Communications | 4913 | Top 18% | 10.1%
3 | Nature Methods | 336 | Top 1.0% | 10.1%
4 | Bioinformatics | 1061 | Top 3% | 10.1%
5 | Proceedings of the National Academy of Sciences | 2130 | Top 11% | 6.3%
---- 50% of probability mass above ----
6 | Nucleic Acids Research | 1128 | Top 4% | 4.3%
7 | Protein Engineering, Design and Selection | 14 | Top 0.1% | 4.0%
8 | Scientific Reports | 3102 | Top 35% | 3.7%
9 | Frontiers in Molecular Biosciences | 100 | Top 0.9% | 2.4%
10 | PLOS Computational Biology | 1633 | Top 14% | 2.1%
11 | Nature Machine Intelligence | 61 | Top 1% | 2.1%
12 | PLOS ONE | 4510 | Top 50% | 1.9%
13 | Advanced Science | 249 | Top 9% | 1.9%
14 | BMC Bioinformatics | 383 | Top 4% | 1.9%
15 | Nature Biotechnology | 147 | Top 5% | 1.3%
16 | Bioinformatics Advances | 184 | Top 4% | 1.2%
17 | Journal of Cheminformatics | 25 | Top 0.4% | 1.2%
18 | Computational and Structural Biotechnology Journal | 216 | Top 6% | 1.2%
19 | Genome Biology | 555 | Top 6% | 0.9%
20 | Journal of Molecular Biology | 217 | Top 3% | 0.9%
21 | Communications Biology | 886 | Top 21% | 0.8%
22 | NAR Genomics and Bioinformatics | 214 | Top 4% | 0.7%
23 | Cell Reports Methods | 141 | Top 6% | 0.7%
24 | eLife | 5422 | Top 59% | 0.7%
25 | iScience | 1063 | Top 35% | 0.7%
26 | Cancer Research | 116 | Top 4% | 0.7%
27 | Genome Research | 409 | Top 5% | 0.6%
28 | Biophysical Journal | 545 | Top 6% | 0.6%
29 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.6%
30 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 1% | 0.6%
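
The 50% cutoff stated above can be checked with a running sum over the ranked probabilities. This is a minimal sketch using the first seven probabilities from the list; the function name `journals_to_reach` is introduced here for illustration and is not part of the site.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Count how many top-ranked entries are needed for the cumulative
    probability (in percent) to reach the threshold.

    Returns (count, cumulative_mass); count is None if the threshold
    is never reached.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, round(total, 1)
    return None, round(total, 1)


# Predicted probabilities (percent) for ranks 1 to 7, copied from the list.
top_probs = [14.7, 10.1, 10.1, 10.1, 6.3, 4.3, 4.0]

count, mass = journals_to_reach(top_probs)
print(count, mass)  # 5 51.3
```

The first four journals sum to 45.0%, so the fifth is the one that crosses the 50% line, at a cumulative 51.3%.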