Machine Learning Models Reveal the Role of Ionization-Dependent Partitioning in Condensate Formation
Ozmaian, M.; Vaezzadeh, S. S.
Biomolecular condensates form in eukaryotic cells through phase separation driven by multivalent interactions, yet the factors that control small-molecule partitioning into them remain incompletely understood. Building on previous evidence linking hydrophobicity and solubility to condensate affinity, we applied machine learning models to evaluate the role of ionization in this process. Using RDKit molecular descriptors, we trained regularized XGBoost regressors and classifiers across four representative condensates: cGAS-DNA, SUMO-SIM, SH3-PRM, and DHH1. Inclusion of logD, a pH-dependent distribution coefficient that reflects effective lipophilicity, consistently improved predictive performance relative to models using only logP or logS. SHAP analysis identified logD as the dominant contributor to model predictions, suggesting that ionization-coupled partitioning governs molecular localization within condensates. The addition of three-dimensional descriptors provided no further benefit, indicating that two-dimensional physicochemical features and logD are sufficient to capture the main determinants of phase separation behavior. These findings establish logD as a mechanistic link connecting ionization, hydrophobicity, and small-molecule partitioning in condensates, and offer a predictive framework for understanding small-molecule behavior in these dynamic environments.
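To make the distinction between logP and logD concrete, the sketch below uses the standard textbook approximation for a monoprotic compound, in which only the neutral species is assumed to partition into the organic phase. The abstract does not state how logD was obtained, so the function name `logd_monoprotic`, the default pH of 7.4, and the input values are illustrative assumptions, not the authors' procedure.

```python
import math


def logd_monoprotic(logp: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Estimate logD from logP and pKa for a monoprotic compound.

    Assumption: only the neutral form partitions into the organic phase,
    so logD = logP - log10(1 + 10**delta), where delta is (pH - pKa) for
    an acid and (pKa - pH) for a base.
    """
    delta = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** delta)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the trend, not measured data.
    logp = 3.0
    for pka in (4.4, 7.4, 10.4):
        print(f"acid pKa={pka}: logD(7.4) = {logd_monoprotic(logp, pka):.3f}")
```

Under this approximation, an acid with a pKa well below the working pH is mostly ionized and its logD falls far below its logP, while a compound that stays neutral has logD close to logP. This is the sense in which logD carries ionization information that logP alone does not.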