
Prediction of liquid-liquid phase separation proteins using machine learning

Sun, T.; Li, Q.; Xu, Y.; Zhang, Z.; Lai, L.; Pei, J.

2019-11-15 bioinformatics
10.1101/842336 bioRxiv

The liquid-liquid phase separation (LLPS) of biomolecules in cells underpins the formation of membraneless organelles, which are condensates of protein, nucleic acid, or both, and play critical roles in cellular functions. The dysregulation of LLPS might be implicated in a number of diseases. Although the LLPS of biomolecules has been investigated intensively in recent years, knowledge of the prevalence and distribution of phase separation proteins (PSPs) still lags behind. The development of computational methods to predict PSPs is therefore of great importance for a comprehensive understanding of the biological function of LLPS. Here, a sequence-based prediction tool using machine learning for LLPS proteins (PSPredictor) was developed. Our model achieves a maximum 10-fold cross-validation (10-CV) accuracy of 96.03% and performs much better in identifying new PSPs than previously reported PSP prediction tools. As far as we know, this is the first attempt to make a direct and more general prediction of LLPS proteins based only on sequence information.

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

1. Advanced Science (249 papers in training set): Top 2%, 8.5%
2. Journal of Chemical Information and Modeling (207 papers in training set): Top 0.8%, 6.9%
3. Briefings in Bioinformatics (326 papers in training set): Top 1%, 4.9%
4. PLOS Computational Biology (1633 papers in training set): Top 8%, 4.4%
5. Computational Biology and Chemistry (23 papers in training set): Top 0.1%, 4.4%
6. Scientific Reports (3102 papers in training set): Top 27%, 4.4%
7. Computational and Structural Biotechnology Journal (216 papers in training set): Top 1.0%, 4.4%
8. Genomics, Proteomics & Bioinformatics (171 papers in training set): Top 2%, 3.6%
9. Communications Chemistry (39 papers in training set): Top 0.1%, 3.6%
10. ACS Omega (90 papers in training set): Top 0.7%, 3.1%
11. PLOS ONE (4510 papers in training set): Top 43%, 2.9%

50% of probability mass above

12. eLife (5422 papers in training set): Top 32%, 2.6%
13. International Journal of Biological Macromolecules (65 papers in training set): Top 0.9%, 2.6%
14. Computers in Biology and Medicine (120 papers in training set): Top 1%, 2.1%
15. Frontiers in Molecular Biosciences (100 papers in training set): Top 1%, 2.1%
16. International Journal of Molecular Sciences (453 papers in training set): Top 5%, 2.1%
17. Communications Biology (886 papers in training set): Top 8%, 1.7%
18. The Journal of Physical Chemistry B (158 papers in training set): Top 1%, 1.7%
19. Quantitative Biology (11 papers in training set): Top 0.4%, 1.1%
20. Frontiers in Cell and Developmental Biology (218 papers in training set): Top 7%, 0.9%
21. Journal of Proteome Research (215 papers in training set): Top 2%, 0.9%
22. Acta Pharmaceutica Sinica B (11 papers in training set): Top 0.8%, 0.8%
23. Cell Reports Physical Science (18 papers in training set): Top 0.6%, 0.8%
24. Journal of Molecular Biology (217 papers in training set): Top 3%, 0.8%
25. Nature Communications (4913 papers in training set): Top 62%, 0.8%
26. National Science Review (22 papers in training set): Top 2%, 0.8%
27. Journal of Structural Biology (58 papers in training set): Top 2%, 0.8%
28. The Journal of Physical Chemistry Letters (58 papers in training set): Top 2%, 0.8%
29. iScience (1063 papers in training set): Top 31%, 0.8%
30. Science of The Total Environment (179 papers in training set): Top 5%, 0.8%
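
The statement that the top 11 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The following is a minimal sketch, assuming the final percentage in each entry is that journal's predicted probability; the function name `journals_needed` is hypothetical and only the first 12 values from the list are used.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1-12, copied from the list above.
probabilities = [8.5, 6.9, 4.9, 4.4, 4.4, 4.4, 4.4, 3.6, 3.6, 3.1, 2.9, 2.6]


def journals_needed(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_needed(probabilities))  # 11
```

The first ten entries sum to about 48.2%, and adding the eleventh brings the total to about 51.1%, which matches the position of the 50% marker in the list.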