
Machine Learning Models Reveal the Role of Ionization-Dependent Partitioning in Condensate Formation

Ozmaian, M.; Vaezzadeh, S. S.

2026-04-10 biochemistry
10.64898/2026.04.07.717090 bioRxiv

Biomolecular condensates form through phase separation driven by multivalent interactions in eukaryotic cells, yet the factors that control small-molecule partitioning remain incompletely understood. Building on previous evidence linking hydrophobicity and solubility to condensate affinity, we applied machine learning models to evaluate the role of ionization in this process. Using RDKit molecular descriptors, we trained regularized XGBoost regressors and classifiers across four representative condensates: cGAS-DNA, SUMO-SIM, SH3-PRM, and DHH1. Inclusion of logD, a pH-dependent distribution coefficient that reflects effective lipophilicity, consistently improved predictive performance compared to models using only logP or logS. SHAP analysis identified logD as the dominant contributor to model predictions, suggesting that ionization-coupled partitioning governs molecular localization within condensates. The addition of three-dimensional descriptors provided no further benefit, indicating that two-dimensional physicochemical features and logD are sufficient to capture the main determinants of phase separation behavior. These findings establish logD as a mechanistic link connecting ionization, hydrophobicity, and small-molecule partitioning in condensates, and offer a predictive framework for understanding small-molecule behavior in these dynamic environments.
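To make the distinction between logP and logD concrete, the sketch below uses the textbook approximation for a monoprotic acid or base, which assumes that only the neutral species partitions into the nonpolar phase. This is an illustration only: the abstract does not say how the authors obtained their logD values, and the function name `log_d` and the example compound are hypothetical.

```python
import math


def log_d(log_p: float, pka: float, ph: float, kind: str) -> float:
    """Approximate logD for a monoprotic compound.

    Assumption (not from the abstract): only the neutral form partitions,
    so logD = logP - log10(1 + 10**(pH - pKa)) for an acid and
    logD = logP - log10(1 + 10**(pKa - pH)) for a base.
    """
    if kind == "acid":
        exponent = ph - pka  # acids ionize as pH rises above pKa
    elif kind == "base":
        exponent = pka - ph  # bases ionize as pH falls below pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical carboxylic acid: logP = 3.0, pKa = 4.4, evaluated at pH 7.4.
acid_log_d = log_d(log_p=3.0, pka=4.4, ph=7.4, kind="acid")

# The same compound at pH 2.0, where it is almost entirely neutral.
acid_log_d_low_ph = log_d(log_p=3.0, pka=4.4, ph=2.0, kind="acid")
```

Under this approximation the hypothetical acid loses roughly three log units of effective lipophilicity at pH 7.4, while at pH 2.0 its logD stays close to its logP. A model that sees only logP cannot tell these two situations apart.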

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 1% | 18.9%
2. Nature Communications | 4913 papers in training set | Top 8% | 17.7%
3. Journal of the American Chemical Society | 199 papers in training set | Top 1% | 4.9%
4. Advanced Science | 249 papers in training set | Top 3% | 4.9%
5. eLife | 5422 papers in training set | Top 17% | 4.9%
50% of probability mass above
6. Science Advances | 1098 papers in training set | Top 8% | 3.3%
7. Nucleic Acids Research | 1128 papers in training set | Top 7% | 2.8%
8. JACS Au | 35 papers in training set | Top 0.2% | 2.4%
9. PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.1%
10. Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.9%
11. Scientific Reports | 3102 papers in training set | Top 57% | 1.7%
12. Communications Chemistry | 39 papers in training set | Top 0.3% | 1.7%
13. iScience | 1063 papers in training set | Top 17% | 1.5%
14. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 5% | 1.5%
15. Environmental Science & Technology | 64 papers in training set | Top 2% | 1.5%
16. Biophysical Journal | 545 papers in training set | Top 3% | 1.5%
17. PLOS ONE | 4510 papers in training set | Top 56% | 1.5%
18. PNAS Nexus | 147 papers in training set | Top 0.5% | 1.3%
19. Cell Reports | 1338 papers in training set | Top 27% | 1.3%
20. Cell Systems | 167 papers in training set | Top 8% | 1.3%
21. Chemical Science | 71 papers in training set | Top 1% | 1.3%
22. Biochemistry | 130 papers in training set | Top 1% | 1.3%
23. Journal of Molecular Biology | 217 papers in training set | Top 2% | 1.2%
24. Communications Biology | 886 papers in training set | Top 14% | 1.2%
25. The Journal of Physical Chemistry B | 158 papers in training set | Top 1% | 1.1%
26. Protein Science | 221 papers in training set | Top 1% | 0.9%
27. Journal of Biological Chemistry | 641 papers in training set | Top 4% | 0.8%
28. Nature Chemistry | 34 papers in training set | Top 0.9% | 0.8%
29. The EMBO Journal | 267 papers in training set | Top 5% | 0.8%
30. The Journal of Physical Chemistry Letters | 58 papers in training set | Top 2% | 0.7%
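
The 50% cut-off reported for this list can be checked with a running sum over the ranked probabilities. The sketch below is a minimal illustration using the rounded percentages displayed for the first seven journals. The helper name `journals_to_reach` is hypothetical, and the page's own procedure, which presumably works on unrounded values, is not documented here.

```python
def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the cumulative
    probability (in percent) to reach the threshold, or None if the
    supplied entries never reach it."""
    cumulative = 0.0
    for count, probability in enumerate(probabilities, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return count
    return None


# Rounded percentages as displayed for ranks 1 to 7.
top_probabilities = [18.9, 17.7, 4.9, 4.9, 4.9, 3.3, 2.8]

# Number of journals needed to cover at least 50% of the probability mass.
cutoff_rank = journals_to_reach(top_probabilities, 50.0)
```

With these rounded values the first four journals sum to about 46.4% and the first five to about 51.3%, which is consistent with the cut-off being placed after the fifth entry.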