
Improve Protein Solubility and Activity based on Machine Learning Models

Han, X.; Ning, W.; Ma, X.; Wang, X.; Zhou, K.

2019-10-24 bioinformatics
10.1101/817890 bioRxiv

Improving the catalytic ability of protein biocatalysts reduces the production cost of biocatalytic manufacturing processes, but the space of possible proteins and mutants is too large to explore exhaustively through experiments. To some extent, highly soluble recombinant proteins tend to exhibit high activity. Here, we demonstrate that an optimization methodology based on a machine learning prediction model can effectively and quantitatively predict which peptide tags improve protein solubility. Using protein sequence information, a support vector machine (SVM) model we recently developed was used to evaluate the solubility of a target protein after randomly mutated tags were added to it. The optimization algorithm guided the tags to evolve toward variants that yield higher solubility. The optimization results were then validated by adding the tags designed by our optimization algorithm to a model protein, expressing it in vivo, and experimentally quantifying its solubility and activity. For example, the solubility of a tyrosine ammonia lyase was more than doubled by adding two tags to its N- and C-termini, and adding the tags also increased its activity nearly 3.5-fold. Additional experiments also supported that the designed tags improved the activity of multiple proteins and outperformed previously reported tags. The presented optimization methodology thus provides a valuable tool for understanding the correlation between amino acid sequence and protein solubility and for engineering protein biocatalysts.

Contact: kang.zhou@nus.edu.sg, chewxia@nus.edu.sg
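The tag-optimization idea in the abstract (score a tagged construct with a solubility predictor, mutate the tags at random, keep variants that score higher) can be illustrated with a minimal sketch. It assumes a simple elitist (1+λ) evolutionary loop over one N-terminal and one C-terminal tag; the abstract does not specify the authors' actual optimization algorithm, so this may differ from it. The scoring function `toy_solubility_score` is a hypothetical placeholder (fraction of charged residues), not the authors' SVM, and the names `mutate`, `evolve_tags`, `tag_len`, `generations`, and `offspring` are illustrative only.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_solubility_score(seq: str) -> float:
    """HYPOTHETICAL placeholder for the paper's SVM solubility model.

    Returns the fraction of charged residues (D, E, K, R) in `seq`.
    It only lets the loop run; it is not a validated solubility predictor.
    """
    return sum(seq.count(aa) for aa in "DEKR") / len(seq)


def mutate(tag: str, rng: random.Random, n_mut: int = 1) -> str:
    """Return a copy of `tag` with `n_mut` random point substitutions."""
    residues = list(tag)
    for _ in range(n_mut):
        pos = rng.randrange(len(residues))
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def evolve_tags(protein, tag_len=10, generations=200, offspring=20,
                seed=0, score=toy_solubility_score):
    """Elitist (1+lambda) evolution of an N-terminal and a C-terminal tag.

    Each generation mutates the current parent tag pair `offspring` times,
    scores every construct N-tag + protein + C-tag, and keeps the best child
    only if it scores at least as well as the parent. Returns the final tags
    and the parent's score after each generation (history[0] is the start).
    """
    rng = random.Random(seed)

    def random_tag():
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(tag_len))

    parent = (random_tag(), random_tag())
    parent_score = score(parent[0] + protein + parent[1])
    history = [parent_score]
    for _ in range(generations):
        children = [(mutate(parent[0], rng), mutate(parent[1], rng))
                    for _ in range(offspring)]
        scored = [(score(n + protein + c), (n, c)) for n, c in children]
        child_score, child = max(scored)
        if child_score >= parent_score:  # elitist: never accept a worse pair
            parent, parent_score = child, child_score
        history.append(parent_score)
    n_tag, c_tag = parent
    return n_tag, c_tag, history


if __name__ == "__main__":
    # Arbitrary example sequence, not the tyrosine ammonia lyase from the paper.
    target = "MSTLGAVLPWQFIGAYLSNPA"
    n_tag, c_tag, history = evolve_tags(target)
    print(n_tag, c_tag, f"score {history[0]:.3f} -> {history[-1]:.3f}")
```

Swapping in a real predictor only requires passing a different `score` callable; because the loop is elitist, the recorded score never decreases from one generation to the next.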

Matching journals

The top 13 journals account for 50% of the predicted probability mass.

1. ACS Synthetic Biology: 256 papers in training set; Top 0.5%; predicted probability 9.4%
2. Synthetic and Systems Biotechnology: 10 papers in training set; Top 0.1%; predicted probability 6.6%
3. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 0.7%; predicted probability 5.0%
4. Frontiers in Bioengineering and Biotechnology: 88 papers in training set; Top 0.3%; predicted probability 5.0%
5. Frontiers in Microbiology: 375 papers in training set; Top 1%; predicted probability 5.0%
6. Scientific Reports: 3102 papers in training set; Top 29%; predicted probability 4.1%
7. ACS Omega: 90 papers in training set; Top 0.4%; predicted probability 3.7%
8. Computational Biology and Chemistry: 23 papers in training set; Top 0.1%; predicted probability 2.1%
9. Bioinformatics: 1061 papers in training set; Top 7%; predicted probability 1.9%
10. PeerJ: 261 papers in training set; Top 5%; predicted probability 1.9%
11. Molecules: 37 papers in training set; Top 0.6%; predicted probability 1.9%
12. PLOS Computational Biology: 1633 papers in training set; Top 14%; predicted probability 1.9%
13. Journal of Bioinformatics and Systems Biology: 14 papers in training set; Top 0.1%; predicted probability 1.7%
50% of probability mass above
14. Nature Communications: 4913 papers in training set; Top 51%; predicted probability 1.7%
15. PLOS ONE: 4510 papers in training set; Top 52%; predicted probability 1.7%
16. Metabolic Engineering: 68 papers in training set; Top 0.4%; predicted probability 1.7%
17. RSC Advances: 18 papers in training set; Top 0.6%; predicted probability 1.7%
18. International Journal of Molecular Sciences: 453 papers in training set; Top 7%; predicted probability 1.7%
19. Chemical Communications: 24 papers in training set; Top 0.6%; predicted probability 1.5%
20. Frontiers in Plant Science: 240 papers in training set; Top 4%; predicted probability 1.5%
21. Protein Science: 221 papers in training set; Top 1%; predicted probability 1.4%
22. Frontiers in Molecular Biosciences: 100 papers in training set; Top 3%; predicted probability 1.3%
23. Pharmaceuticals: 33 papers in training set; Top 1%; predicted probability 1.0%
24. Cell Reports Physical Science: 18 papers in training set; Top 0.4%; predicted probability 1.0%
25. The Journal of Physical Chemistry B: 158 papers in training set; Top 2%; predicted probability 0.8%
26. Journal of Chemical Information and Modeling: 207 papers in training set; Top 3%; predicted probability 0.8%
27. eLife: 5422 papers in training set; Top 57%; predicted probability 0.8%
28. Biotechnology and Bioengineering: 49 papers in training set; Top 0.8%; predicted probability 0.8%
29. Food Chemistry: 12 papers in training set; Top 0.5%; predicted probability 0.7%
30. Analytical Chemistry: 205 papers in training set; Top 3%; predicted probability 0.7%
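As a quick check of the statement that the top 13 journals account for 50% of the predicted probability mass, the short sketch below accumulates the per-journal percentages listed above and finds the smallest number of ranks whose total reaches 50%. The helper name `ranks_to_reach` is hypothetical. Because the listed values are rounded to one decimal place, the cumulative total at rank 13 comes to 50.2% rather than exactly 50%, while the first 12 ranks sum to only 48.5%.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-30, copied from the list above
# (rounded to one decimal place on the source page).
PROBS = [9.4, 6.6, 5.0, 5.0, 5.0, 4.1, 3.7, 2.1, 1.9, 1.9,
         1.9, 1.9, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.5, 1.5,
         1.4, 1.3, 1.0, 1.0, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7]


def ranks_to_reach(probs, target=50.0):
    """Hypothetical helper: smallest k whose top-k probabilities sum to >= target."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return k
    return None  # target not reached within the listed ranks


if __name__ == "__main__":
    k = ranks_to_reach(PROBS)
    print(k, round(sum(PROBS[:k]), 1))  # prints: 13 50.2
```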