Reimplementing Unirep in JAX

Ma, E.; Kummer, A.

2020-05-13 bioengineering
10.1101/2020.05.11.088344 bioRxiv
UniRep is a recurrent neural network model trained on 24 million protein sequences, and has shown utility in protein engineering. The original model, however, has rough spots in its implementation, and a convenient API is not available for certain tasks. To rectify this, we reimplemented the model in JAX/NumPy, achieving near-100X speedups in forward pass performance, and implemented a convenient API for specialized tasks. In this article, we wish to document our model reimplementation process with the goal of educating others interested in learning how to dissect a deep learning model, and engineer it for robustness and ease of use.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.1% | 10.1%
2. Bioinformatics | 1061 papers in training set | Top 3% | 8.5%
3. PLOS Computational Biology | 1633 papers in training set | Top 5% | 7.2%
4. Protein Science | 221 papers in training set | Top 0.1% | 6.9%
5. PLOS ONE | 4510 papers in training set | Top 27% | 6.4%
6. Protein Engineering, Design and Selection | 14 papers in training set | Top 0.1% | 4.9%
7. PeerJ | 261 papers in training set | Top 2% | 3.6%
8. Frontiers in Bioinformatics | 45 papers in training set | Top 0.1% | 3.3%
50% of probability mass above
9. Nature Machine Intelligence | 61 papers in training set | Top 1% | 2.9%
10. Journal of Open Source Software | 22 papers in training set | Top 0.1% | 2.6%
11. Journal of Molecular Biology | 217 papers in training set | Top 0.9% | 2.6%
12. Scientific Reports | 3102 papers in training set | Top 50% | 2.1%
13. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 4% | 1.9%
14. Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.8%
15. ACS Omega | 90 papers in training set | Top 2% | 1.7%
16. Frontiers in Molecular Biosciences | 100 papers in training set | Top 2% | 1.5%
17. Nature Communications | 4913 papers in training set | Top 53% | 1.5%
18. BMC Bioinformatics | 383 papers in training set | Top 5% | 1.5%
19. SoftwareX | 15 papers in training set | Top 0.2% | 1.3%
20. Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.3%
21. Structure | 175 papers in training set | Top 2% | 1.2%
22. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 38% | 1.2%
23. GigaScience | 172 papers in training set | Top 2% | 1.2%
24. BMC Genomics | 328 papers in training set | Top 4% | 1.1%
25. eLife | 5422 papers in training set | Top 51% | 1.0%
26. F1000Research | 79 papers in training set | Top 3% | 1.0%
27. Neuroinformatics | 40 papers in training set | Top 0.8% | 0.9%
28. Bioinformatics Advances | 184 papers in training set | Top 4% | 0.9%
29. Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.9%
30. Nucleic Acids Research | 1128 papers in training set | Top 15% | 0.9%