
A method for predicting evolved fold switchers exclusively from their sequences

Kim, A. K.; Looger, L. L.; Porter, L.

2020-02-20 bioinformatics
10.1101/2020.02.19.956805 bioRxiv

Although most proteins with known structures conform to the longstanding rule of thumb that high levels of aligned sequence identity tend to indicate similar folds and functions, an increasing number of exceptions are emerging. Despite having highly similar sequences, these "evolved fold switchers" (1) can adopt radically different folds with disparate biological functions. Predictive methods for identifying evolved fold switchers are desirable because some of them are associated with disease and/or can perform different functions in cells. Previously, we showed that inconsistencies between predicted and experimentally determined secondary structures can be used to predict fold-switching proteins (2). The usefulness of this approach is limited, however, because it requires experimentally determined protein structures, whose number is dwarfed by the number of genomic proteins. Here, we use secondary structure predictions to identify evolved fold switchers from their amino acid sequences alone. To do this, we looked for inconsistencies between the secondary structure predictions of the alternative conformations of evolved fold switchers. We used three different predictors in this study: JPred4, PSIPRED, and SPIDER3. We find that overall inconsistencies are not a significant predictor of evolved fold switchers for any of the three predictors. Inconsistencies between α-helix and β-strand predictions made by JPred4, however, can discriminate between the different conformations of evolved fold switchers with statistical significance (p < 1.7 × 10⁻¹³). In light of this observation, we used these inconsistencies as a classifier and found that it could robustly discriminate between evolved fold switchers and evolved non-fold-switchers, as evidenced by a Matthews correlation coefficient of 0.90. These results indicate that inconsistencies between secondary structure predictions can indeed be used to identify evolved fold switchers from their genomic sequences alone. Our findings have implications for genomics, structural biology, and human health.
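As a rough illustration of the two quantities the abstract relies on, the sketch below counts α-helix/β-strand disagreements between two secondary structure predictions and computes a Matthews correlation coefficient from a confusion matrix. It is not the authors' pipeline. It assumes the two predictions are already aligned, equal-length strings in a three-state alphabet (H for helix, E for strand, C for coil), and the function names `helix_strand_inconsistency` and `matthews_corrcoef` are hypothetical helpers introduced here for illustration.

```python
from math import sqrt


def helix_strand_inconsistency(pred_a: str, pred_b: str) -> float:
    """Fraction of aligned positions where one prediction is helix (H)
    and the other is strand (E). Assumes equal-length, pre-aligned
    three-state strings (an assumption, not stated in the abstract)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must be aligned to the same length")
    if not pred_a:
        return 0.0
    # Count only H<->E disagreements; H<->C and E<->C are ignored here.
    swaps = sum(1 for a, b in zip(pred_a, pred_b) if {a, b} == {"H", "E"})
    return swaps / len(pred_a)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard Matthews correlation coefficient from confusion counts.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example with made-up prediction strings (not data from the study).
fraction = helix_strand_inconsistency("HHHHCCEE", "EEHHCCEE")
print(f"helix/strand inconsistency: {fraction:.2f}")
print(f"MCC for a perfect classifier: {matthews_corrcoef(5, 5, 0, 0):.2f}")
```

A classifier in this spirit would call a sequence pair fold switching when the inconsistency exceeds some threshold. The abstract does not state the threshold or the exact scoring used, so none is assumed here.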

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 · Protein Science · 221 papers in training set · Top 0.1% · 23.4%
2 · NAR Genomics and Bioinformatics · 214 papers in training set · Top 0.1% · 7.1%
3 · Scientific Reports · 3102 papers in training set · Top 15% · 6.6%
4 · Bioinformatics · 1061 papers in training set · Top 4% · 5.0%
5 · Biophysical Journal · 545 papers in training set · Top 2% · 3.2%
6 · PLOS Computational Biology · 1633 papers in training set · Top 12% · 2.7%
7 · International Journal of Molecular Sciences · 453 papers in training set · Top 4% · 2.5%
50% of probability mass above
8 · Bioinformatics Advances · 184 papers in training set · Top 2% · 2.5%
9 · Journal of Biological Chemistry · 641 papers in training set · Top 1% · 2.0%
10 · Computational and Structural Biotechnology Journal · 216 papers in training set · Top 3% · 2.0%
11 · Proteins: Structure, Function, and Bioinformatics · 82 papers in training set · Top 0.4% · 1.9%
12 · Biomolecules · 95 papers in training set · Top 0.4% · 1.8%
13 · Journal of Bioinformatics and Systems Biology · 14 papers in training set · Top 0.1% · 1.8%
14 · PLOS ONE · 4510 papers in training set · Top 53% · 1.7%
15 · PeerJ · 261 papers in training set · Top 8% · 1.5%
16 · Nature Communications · 4913 papers in training set · Top 53% · 1.5%
17 · Frontiers in Bioinformatics · 45 papers in training set · Top 0.3% · 1.5%
18 · BMC Bioinformatics · 383 papers in training set · Top 5% · 1.4%
19 · Physical Biology · 43 papers in training set · Top 1% · 1.4%
20 · Structure · 175 papers in training set · Top 2% · 1.3%
21 · Frontiers in Molecular Biosciences · 100 papers in training set · Top 3% · 1.3%
22 · F1000Research · 79 papers in training set · Top 3% · 1.2%
23 · mSphere · 281 papers in training set · Top 5% · 1.0%
24 · Biochemistry and Biophysics Reports · 28 papers in training set · Top 1% · 0.9%
25 · Journal of Molecular Biology · 217 papers in training set · Top 3% · 0.8%
26 · Biochemical and Biophysical Research Communications · 78 papers in training set · Top 1% · 0.8%
27 · Acta Crystallographica Section D Structural Biology · 54 papers in training set · Top 0.4% · 0.8%
28 · Nucleic Acids Research · 1128 papers in training set · Top 17% · 0.8%
29 · G3: Genes, Genomes, Genetics · 222 papers in training set · Top 1% · 0.7%
30 · ACS Omega · 90 papers in training set · Top 5% · 0.7%
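
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the percentages listed above with a running sum. This is a minimal sketch: the page does not say how it computes the cutoff, the displayed values are rounded, and the function name `journals_to_reach` is a hypothetical helper.

```python
def journals_to_reach(probabilities, target=50.0):
    """Return how many top-ranked entries are needed before the
    cumulative percentage reaches `target`, or None if it never does."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return count
    return None


# Displayed (rounded) percentages for the ten highest-ranked journals.
top_probabilities = [23.4, 7.1, 6.6, 5.0, 3.2, 2.7, 2.5, 2.5, 2.0, 2.0]
print(journals_to_reach(top_probabilities))
```

With the listed values, the first six journals sum to about 48.0% and the seventh brings the total to about 50.5%, which matches the stated cutoff.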