
BIAPSS - BioInformatic Analysis of liquid-liquid Phase-Separating protein Sequences

Badaczewska-Dawid, A.; Potoyan, D. A.

2021-02-23 bioinformatics
10.1101/2021.02.11.430806 bioRxiv

Liquid-liquid phase separation (LLPS) has recently emerged as a cornerstone mechanism underlying the biogenesis of membraneless organelles (MLOs). However, a quantitative molecular grammar of protein sequences that controls LLPS remains poorly understood. Progress in this field is hampered by the lack of comprehensive databases and associated computational infrastructure for targeted biophysical and statistical analysis of phase-separating biopolymers. Therefore, we have created a novel open-source web platform named BIAPSS (BioInformatic Analysis of liquid-liquid Phase-Separating protein Sequences), which combines interactive data-analytic tools with a comprehensive repository of bioinformatic data for on-the-fly exploration of sequence-dependent properties of proteins with known LLPS behavior. BIAPSS includes a residue-resolution biophysical analyzer for interrogating individual protein sequences (SingleSEQ tab). This analyzer allows users to correlate regions prone to phase separation with a large array of physicochemical attributes and various short linear motifs. BIAPSS also includes global statistics derived across most of the known LLPS-driver protein sequences (MultiSEQ tab) for revealing the regularities and sequence-specific signals driving phase separation. Finally, BIAPSS incorporates an extensive cross-reference section that links all entries to primary LLPS databases and other external resources, thereby serving as a central navigation hub for the phase separation community. All of the data used by BIAPSS is freely available for download as well-formatted, pre-processed data with detailed descriptions, facilitating rapid implementation in user-defined computational protocols.
[Figure 1: TOC - graphical abstract]

Author summary

Proteins, especially those with low complexity and intrinsically disordered regions, have recently come into the limelight because of mounting evidence showing that these regions can drive the formation of membraneless organelles (MLOs) in cells. The underlying physical mechanism for forming MLOs is liquid-liquid phase separation (LLPS), a thermodynamically driven process whereby a cellular milieu with a relatively well-mixed distribution of biomolecules is decomposed into liquid droplets where the concentration of selected biomolecules is higher. Deciphering the molecular sequence grammar of phase separation has turned out to be challenging because of the complexity of this process in cells and the vastness of the sequence space of LLPS-driver proteins. While the field is still in its infancy, the growth of experimental data has already spurred the creation of several major databases which collect and annotate biomolecular systems with confirmed LLPS behavior. What is currently missing is a framework that would leverage the existing databases by integrating them with deep biophysical and bioinformatic analysis for identifying statistically significant features of protein sequences implicated in LLPS. In this work, we have addressed this challenge by creating an open-source web platform named BIAPSS (BioInformatic Analysis of liquid-liquid Phase-Separating protein Sequences), which integrates a comprehensive repository of pre-processed bioinformatic data for LLPS-driver protein sequences with interactive analytic applications for on-the-fly analysis of biophysical features relevant for LLPS behavior.
BIAPSS empowers users with novel and effective tools for exploring LLPS-related sequence signals for individual proteins (SingleSEQ tab) and globally by integrating common regularities across subgroups or the entire LLPS sequence superset (MultiSEQ tab). The long-term plan for BIAPSS is to serve as a unifying hub for the experimental and computational community, with a comprehensive set of analytic tools, biophysically featured data, and standardized protocols facilitating the identification of sequence hot spots driving LLPS, all of which can support applications for designing new sequences of biomedical interest.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.1% | 12.6%
2. Protein Science | 221 papers in training set | Top 0.1% | 9.2%
3. Journal of Molecular Biology | 217 papers in training set | Top 0.3% | 4.9%
4. Communications Chemistry | 39 papers in training set | Top 0.1% | 4.9%
5. Journal of Chemical Information and Modeling | 207 papers in training set | Top 1.0% | 4.9%
6. Nature Communications | 4913 papers in training set | Top 36% | 4.2%
7. International Journal of Molecular Sciences | 453 papers in training set | Top 3% | 3.6%
8. Journal of Proteome Research | 215 papers in training set | Top 0.8% | 3.6%
9. PLOS ONE | 4510 papers in training set | Top 45% | 2.6%
50% of probability mass above
10. Bioinformatics | 1061 papers in training set | Top 6% | 2.6%
11. Advanced Science | 249 papers in training set | Top 8% | 2.1%
12. ACS Omega | 90 papers in training set | Top 1% | 1.9%
13. Nucleic Acids Research | 1128 papers in training set | Top 9% | 1.9%
14. eLife | 5422 papers in training set | Top 38% | 1.9%
15. Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.8%
16. The Journal of Physical Chemistry Letters | 58 papers in training set | Top 0.8% | 1.7%
17. Chemical Science | 71 papers in training set | Top 0.9% | 1.7%
18. Cell Reports Methods | 141 papers in training set | Top 2% | 1.7%
19. Journal of Chemical Theory and Computation | 126 papers in training set | Top 0.6% | 1.5%
20. iScience | 1063 papers in training set | Top 17% | 1.5%
21. Frontiers in Molecular Biosciences | 100 papers in training set | Top 2% | 1.5%
22. JACS Au | 35 papers in training set | Top 0.6% | 1.2%
23. Patterns | 70 papers in training set | Top 1% | 1.2%
24. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 39% | 1.1%
25. Frontiers in Cell and Developmental Biology | 218 papers in training set | Top 7% | 1.0%
26. Scientific Reports | 3102 papers in training set | Top 71% | 0.9%
27. SoftwareX | 15 papers in training set | Top 0.3% | 0.9%
28. PLOS Computational Biology | 1633 papers in training set | Top 23% | 0.8%
29. Biophysical Journal | 545 papers in training set | Top 5% | 0.8%
30. PROTEOMICS | 35 papers in training set | Top 0.7% | 0.8%
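
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, the values are the rounded percentages of the first twelve entries in the list above, and the page's own computation presumably uses unrounded probabilities.

```python
from itertools import accumulate

# Rounded predicted probabilities (percent) for ranks 1-12, copied from the list above.
probs = [12.6, 9.2, 4.9, 4.9, 4.9, 4.2, 3.6, 3.6, 2.6, 2.6, 2.1, 1.9]


def journals_to_reach(probabilities, threshold=50.0):
    """Return (count, cumulative_sum) for the smallest top-ranked prefix
    whose cumulative probability reaches the threshold."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= threshold - 1e-9:
            return count, total
    raise ValueError("threshold not reached by the supplied probabilities")


count, total = journals_to_reach(probs)
print(count, round(total, 1))  # 9 journals, cumulative sum of about 50.5
```

With the rounded values, the first eight entries sum to 47.9% and the ninth brings the total to about 50.5%, which agrees with the position of the "50% of probability mass above" marker.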