
Prediction of 8-state protein secondary structures by 1D-Inception and BD-LSTM

Ratul, M. A. R.; Turcotte, M.; Mozaffari, M. H.; Lee, W.

2019-12-11 bioinformatics
10.1101/871921 bioRxiv

Protein secondary structure is crucial for creating an information bridge between the primary structure and the tertiary (3D) structure. Precise prediction of 8-state protein secondary structure (PSS) is widely used in the structural and functional analysis of proteins in bioinformatics. Recently, deep learning techniques have been applied in this research area and have raised Q8 accuracy remarkably. Nevertheless, from a theoretical standpoint, there is still considerable room for improvement, specifically in 8-state (Q8) protein secondary structure prediction. In this paper, we present two deep learning architectures, namely 1D-Inception and BD-LSTM, to improve the performance of 8-class PSS prediction. The input to both architectures is a carefully constructed feature matrix built from the sequence features and profile features of the proteins. First, 1D-Inception is a deep convolutional neural network-based approach that is inspired by the InceptionV3 model and contains three inception modules. Second, BD-LSTM is a recurrent neural network model that includes bidirectional LSTM layers. Our proposed 1D-Inception method achieved 76.65%, 71.18%, 76.86%, and 74.07% Q8 accuracy on the benchmark CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Moreover, BD-LSTM achieved 74.71%, 69.49%, 74.07%, and 72.37% Q8 accuracy when evaluated on the CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Both architectures enable efficient processing of local and global interdependencies between amino acids, which is beneficial for making an accurate prediction of each class in a deep neural network. To the best of our knowledge, the experimental results show that the 1D-Inception model outperformed all state-of-the-art methods on the benchmark CullPdb6133, CB513, and CASP10 datasets.
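The Q8 accuracy figures quoted in the abstract are per-residue accuracies over eight secondary structure classes. As a rough illustration only, the metric could be computed as in the sketch below. The label alphabet `Q8_STATES`, the function name `q8_accuracy`, and the example strings are assumptions made for this sketch (a DSSP-style alphabet is assumed); the abstract does not specify the labels or the authors' evaluation code.

```python
# Assumed DSSP-style 8-state alphabet; the abstract does not list the labels.
Q8_STATES = set("HGIEBTSL")


def q8_accuracy(true_labels: str, predicted_labels: str) -> float:
    """Return the percentage of residues whose predicted 8-state label
    matches the true label (a per-residue accuracy)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    unknown = (set(true_labels) | set(predicted_labels)) - Q8_STATES
    if unknown:
        raise ValueError(f"labels outside the assumed alphabet: {sorted(unknown)}")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical 16-residue example with one mislabeled residue (H predicted as G).
true_ss = "LLHHHHHHTTEEEELL"
pred_ss = "LLHHHHHGTTEEEELL"
print(q8_accuracy(true_ss, pred_ss))  # 15 of 16 correct -> 93.75
```

Benchmark results are typically aggregated over all residues in a dataset rather than averaged per protein; which convention the paper uses is not stated in the abstract.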

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Briefings in Bioinformatics: 326 papers in training set, Top 0.1%, 22.9%
2. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set, Top 0.1%, 6.5%
3. Journal of Chemical Information and Modeling: 207 papers in training set, Top 1.0%, 4.9%
4. Bioinformatics: 1061 papers in training set, Top 5%, 4.4%
5. BMC Bioinformatics: 383 papers in training set, Top 2%, 4.0%
6. Computers in Biology and Medicine: 120 papers in training set, Top 0.8%, 3.6%
7. Computational Biology and Chemistry: 23 papers in training set, Top 0.1%, 3.6%
8. Scientific Reports: 3102 papers in training set, Top 42%, 2.9%
50% of probability mass above
9. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 2%, 2.8%
10. PLOS Computational Biology: 1633 papers in training set, Top 13%, 2.1%
11. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 3%, 1.9%
12. Molecules: 37 papers in training set, Top 0.7%, 1.8%
13. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 1.0%, 1.7%
14. PLOS ONE: 4510 papers in training set, Top 53%, 1.7%
15. Quantitative Biology: 11 papers in training set, Top 0.3%, 1.4%
16. Proteins: Structure, Function, and Bioinformatics: 82 papers in training set, Top 0.6%, 1.2%
17. Frontiers in Bioinformatics: 45 papers in training set, Top 0.5%, 1.0%
18. Journal of Computational Biology: 37 papers in training set, Top 0.4%, 1.0%
19. Informatics in Medicine Unlocked: 21 papers in training set, Top 0.8%, 1.0%
20. ACS Omega: 90 papers in training set, Top 3%, 0.9%
21. Biomolecules: 95 papers in training set, Top 2%, 0.8%
22. BioMed Research International: 25 papers in training set, Top 3%, 0.8%
23. Frontiers in Molecular Biosciences: 100 papers in training set, Top 4%, 0.8%
24. International Journal of Biological Macromolecules: 65 papers in training set, Top 3%, 0.8%
25. Journal of Structural Biology: 58 papers in training set, Top 2%, 0.8%
26. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set, Top 0.6%, 0.8%
27. International Journal of Molecular Sciences: 453 papers in training set, Top 15%, 0.8%
28. Frontiers in Genetics: 197 papers in training set, Top 9%, 0.8%
29. Nature Machine Intelligence: 61 papers in training set, Top 4%, 0.7%
30. Communications Biology: 886 papers in training set, Top 25%, 0.7%
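
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a running sum. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages shown for ranks 1 to 10, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1-10, copied from the list above.
# These are rounded display values, so sums are approximate.
probabilities = [22.9, 6.5, 4.9, 4.4, 4.0, 3.6, 3.6, 2.9, 2.8, 2.1]


def journals_to_reach(probs, threshold=50.0):
    """Return (rank, cumulative_total) for the first rank at which the
    running sum of probabilities reaches the threshold, or None if the
    threshold is never reached."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None


rank, total = journals_to_reach(probabilities)
print(rank, round(total, 1))  # the running sum first reaches 50% at rank 8
```

With these values the running sum is about 49.9% after seven journals and about 52.8% after eight, which agrees with the position of the 50% marker in the list.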