Enzyme Classification via Semi-Supervised Functional Residue Learning
Gong, C.; Zhang, D.; Ouyang-Zhang, J.; Liu, Q.; Klivans, A.; Diaz, D.
Predicting enzymatic function from a protein sequence is a fundamental task in protein discovery and engineering. In this paper, we present Semi-supervised Learning for Enzyme Classification (SLEEC): a semi-supervised learning framework that learns a function-aware protein representation for Enzyme Commission (EC) number prediction. SLEEC achieves state-of-the-art performance on standard benchmarks and provides interpretable, residue-level annotations. We further demonstrate that our framework is robust to benign sequence modifications routinely observed in protein engineering workflows, such as appending functional tags, a desirable property that current ML frameworks lack. Our main technical contribution is a multiple sequence alignment (MSA)-based data augmentation technique for discovering sparse residue activations within a given enzyme sequence.
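The robustness property described above can be illustrated with a small check: append a tag to a sequence and confirm that the predicted label does not change. The sketch below is not the SLEEC implementation. The helper names (`append_tag`, `is_tag_robust`), the choice of a 6xHis tag, and the toy motif-based predictor are all assumptions made for illustration; any real classifier exposing a sequence-to-label function could be substituted.

```python
from typing import Callable, Hashable

# Assumption: a 6xHis purification tag stands in for "functional tags".
HIS_TAG = "HHHHHH"


def append_tag(sequence: str, tag: str = HIS_TAG, c_terminal: bool = True) -> str:
    """Return the sequence with a tag added at the C- or N-terminus."""
    return sequence + tag if c_terminal else tag + sequence


def is_tag_robust(
    predict: Callable[[str], Hashable],
    sequence: str,
    tag: str = HIS_TAG,
) -> bool:
    """True if the prediction is unchanged when the tag is added at either end."""
    baseline = predict(sequence)
    return all(
        predict(append_tag(sequence, tag, c_terminal=side)) == baseline
        for side in (True, False)
    )


def toy_motif_predictor(sequence: str) -> str:
    """Hypothetical stand-in for a model: the label depends only on a motif."""
    return "motif_present" if "GDSGG" in sequence else "motif_absent"


def toy_length_predictor(sequence: str) -> str:
    """Hypothetical fragile model: the label depends on sequence length."""
    return "long" if len(sequence) > 12 else "short"


if __name__ == "__main__":
    seq = "MKTGDSGGAL"
    print(is_tag_robust(toy_motif_predictor, seq))
    print(is_tag_robust(toy_length_predictor, seq))
```

A predictor whose output depends on a localized motif passes this check, while one that depends on a global property such as length does not, which is the kind of contrast the abstract draws between residue-level representations and frameworks that are sensitive to benign edits.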
Matching journals
The top 5 journals account for 50% of the predicted probability mass.