RFRSN: Improving protein fold recognition by siamese network

Ke, H.; Liu, Y.; Jun, Y. D.

2021-04-28 bioinformatics
10.1101/2021.04.27.441698 bioRxiv

Protein fold recognition is key to studying protein structure and function. As a representative pattern recognition task, it can be improved through two main categories of approaches: 1) extracting more discriminative descriptors, and 2) designing more effective distance metrics. Existing protein fold recognition approaches focus on the first category, seeking a robust and discriminative descriptor that represents each protein sequence as a compact feature vector, so that different protein sequences are separated as much as possible in the fold space. These methods have brought huge improvements to the task of protein fold recognition. However, little attention has so far been paid to the second category. In this paper, we address not only the first category but also the second, namely how to measure the similarity between two proteins more effectively. First, we employ deep convolutional neural network techniques to extract discriminative fold-specific features from the potential protein residue-residue relationships; we name this method SSAfold. Second, because different feature representations are usually subject to different distributions, the similarity measure needs to vary according to the feature distribution. Previously, almost all protein fold recognition methods applied the same metric strategy to all protein features, ignoring differences in feature distribution. This paper presents a new protein fold recognition method based on a siamese network, which we name PFRSN. The objective of PFRSN is to learn a set of hierarchical nonlinear transformations that project protein pairs into the same fold feature subspace, so that the distance between positive protein pairs is reduced and that between negative protein pairs is enlarged as much as possible. The experimental results show that SSAfold and PFRSN are highly competitive.
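
The pairwise objective described in the abstract (pull positive pairs together, push negative pairs apart) can be illustrated with a minimal sketch. The abstract does not state the exact loss PFRSN uses, so the standard margin-based contrastive loss is assumed here purely for illustration. The function names, the `margin` parameter, and the toy embedding vectors are hypothetical and stand in for the outputs of the two branches of a siamese network.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_fold, margin=1.0):
    """Margin-based contrastive loss for one protein pair (assumed form).

    Positive pairs (same fold) are penalised by their squared distance,
    so training pulls them together. Negative pairs are penalised only
    while they are closer than `margin`, so training pushes them apart.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_fold:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Toy embeddings (hypothetical values, not taken from the paper).
positive_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_fold=True)
far_negative_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_fold=False)
close_negative_loss = contrastive_loss([0.0, 0.0], [0.0, 0.0], same_fold=False)
```

In this sketch a distant positive pair incurs a large loss, a negative pair already beyond the margin incurs none, and a negative pair that collapses onto the same point incurs the maximum negative-pair penalty.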

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Bioinformatics | 1061 | Top 1% | 18.8% |
| 2 | Briefings in Bioinformatics | 326 | Top 0.1% | 17.6% |
| 3 | BMC Bioinformatics | 383 | Top 1% | 6.9% |
| 4 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 | Top 0.1% | 6.9% |
| 5 | PLOS Computational Biology | 1633 | Top 9% | 3.6% |
| 6 | Journal of Computational Biology | 37 | Top 0.1% | 3.3% |
| 7 | PLOS ONE | 4510 | Top 48% | 2.1% |
| 8 | Scientific Reports | 3102 | Top 50% | 2.1% |
| 9 | Computational Biology and Chemistry | 23 | Top 0.1% | 1.9% |
| 10 | Genomics, Proteomics & Bioinformatics | 171 | Top 3% | 1.9% |
| 11 | Journal of Molecular Biology | 217 | Top 1% | 1.8% |
| 12 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.7% |
| 13 | Frontiers in Genetics | 197 | Top 6% | 1.3% |
| 14 | Frontiers in Molecular Biosciences | 100 | Top 2% | 1.3% |
| 15 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.4% | 1.2% |
| 16 | Computers in Biology and Medicine | 120 | Top 3% | 1.2% |
| 17 | Molecules | 37 | Top 1% | 1.1% |
| 18 | Frontiers in Bioinformatics | 45 | Top 0.6% | 0.9% |
| 19 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.9% |
| 20 | Computational and Structural Biotechnology Journal | 216 | Top 8% | 0.8% |
| 21 | BioMed Research International | 25 | Top 3% | 0.8% |
| 22 | BioData Mining | 15 | Top 0.8% | 0.8% |
| 23 | Bioengineering | 24 | Top 2% | 0.7% |
| 24 | Neurocomputing | 13 | Top 0.7% | 0.6% |
| 25 | Bioinformatics Advances | 184 | Top 5% | 0.6% |
| 26 | Expert Systems with Applications | 11 | Top 0.7% | 0.5% |
| 27 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 1% | 0.5% |
| 28 | BMC Medical Informatics and Decision Making | 39 | Top 3% | 0.5% |
| 29 | Communications Biology | 886 | Top 32% | 0.5% |
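
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. This is a minimal sketch: the helper name `journals_covering` and the 50% threshold parameter are illustrative assumptions, and the inputs are the rounded percentages listed for the first six journals, so the total is approximate.

```python
def journals_covering(probabilities, threshold=50.0):
    """Return (count, cumulative_total) for the smallest ranked prefix
    whose summed probability reaches `threshold` percent, or None if
    the threshold is never reached."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None


# Rounded predicted probabilities (percent) for ranks 1 to 6 as listed.
listed = [18.8, 17.6, 6.9, 6.9, 3.6, 3.3]
result = journals_covering(listed)
```

With these values the threshold is first reached at rank 4, consistent with the listing.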