
SSPSPredictor: A Sequence and Structure based Deep Learning Model for Predicting Phase-Separating Proteins

Wang, T.; Liao, S.; Qi, Y.; Zhang, Z.

2026-04-01 bioinformatics
10.64898/2026.03.30.715224 bioRxiv

Liquid-liquid phase separation (LLPS) underlies the formation of biomolecular liquid condensates (also referred to as membraneless organelles, MLOs), which are essential for spatially organizing various biochemical processes within cells. Proteins that play a key role in driving condensate formation are termed phase-separating proteins (PSPs). Because experimental identification of PSPs remains labor-intensive and time-consuming, multiple computational tools have been developed based on empirical features or deep learning. In this study, we propose SSPSPredictor, a novel multimodal predictive model for PSPs with folded or intrinsically disordered structures, which fuses sequence information from the protein language model ESM-2 with structural insights from the graph neural network GVP. Compared with existing tools, SSPSPredictor achieves balanced performance in identifying endogenous PSPs, predicting relative LLPS propensities, and recognizing key regions that drive LLPS. Moreover, SSPSPredictor exhibits good interpretability in identifying driving regions along protein sequences, although no relevant supervision was provided during training. Further predictive analysis of the human proteome using SSPSPredictor reveals that the proportion of intrinsically disordered proteins (IDPs) undergoing LLPS is significantly higher than that of folded proteins. In addition, pathogenic variants, especially those located in disordered regions, exhibit higher LLPS propensity than other mutations, uncovering a link between LLPS and disease at the amino acid level.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Advanced Science | 249 | Top 0.1% | 23.6%
2 | Briefings in Bioinformatics | 326 | Top 0.3% | 10.6%
3 | Nature Communications | 4913 | Top 24% | 7.5%
4 | Genomics, Proteomics & Bioinformatics | 171 | Top 1% | 5.1%
5 | Bioinformatics | 1061 | Top 5% | 3.8%
50% of probability mass above
6 | Computational and Structural Biotechnology Journal | 216 | Top 1% | 3.8%
7 | Nature Machine Intelligence | 61 | Top 1% | 3.2%
8 | Scientific Reports | 3102 | Top 49% | 2.2%
9 | Proceedings of the National Academy of Sciences | 2130 | Top 27% | 2.2%
10 | Cell Systems | 167 | Top 6% | 2.0%
11 | iScience | 1063 | Top 11% | 2.0%
12 | Journal of Molecular Biology | 217 | Top 1% | 2.0%
13 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 0.9% | 1.8%
14 | Communications Biology | 886 | Top 7% | 1.8%
15 | Communications Chemistry | 39 | Top 0.3% | 1.6%
16 | Patterns | 70 | Top 1% | 1.3%
17 | eLife | 5422 | Top 50% | 1.2%
18 | Science Bulletin | 22 | Top 0.6% | 0.9%
19 | Journal of Proteome Research | 215 | Top 2% | 0.8%
20 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.8%
21 | Nucleic Acids Research | 1128 | Top 17% | 0.8%
22 | Genome Research | 409 | Top 5% | 0.7%
23 | National Science Review | 22 | Top 3% | 0.7%
24 | PLOS Computational Biology | 1633 | Top 27% | 0.7%
25 | Computers in Biology and Medicine | 120 | Top 5% | 0.7%
26 | Nature Computational Science | 50 | Top 2% | 0.7%
27 | PLOS ONE | 4510 | Top 71% | 0.7%
28 | Quantitative Biology | 11 | Top 1.0% | 0.5%
29 | Cell Research | 49 | Top 3% | 0.5%
30 | Cell Reports Physical Science | 18 | Top 1% | 0.5%
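
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by taking a running total of the listed probabilities. The following is a minimal sketch in Python. The function name `journals_to_reach` is illustrative and not part of the site. The values are the rounded percentages shown in the listing for the ten highest-ranked journals, so the totals are approximate.

```python
from itertools import accumulate

# Rounded probabilities (percent) for ranks 1-10, copied from the listing.
probs = [23.6, 10.6, 7.5, 5.1, 3.8, 3.8, 3.2, 2.2, 2.2, 2.0]


def journals_to_reach(probs, threshold):
    """Return (count, total): how many top-ranked entries are needed
    for the running total to reach `threshold`, and that total."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count, total
    raise ValueError("threshold not reached by the given probabilities")


count, total = journals_to_reach(probs, 50.0)
print(count, round(total, 1))
```

With these values the running total is about 46.8% after four journals and about 50.6% after five, which agrees with the position of the 50% marker in the ranking.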