Machine learning uncovers circulating biomarkers and molecular heterogeneity in obesity and type 2 diabetes
Nokhoijav, E.; Kaplar, M.; Aranyi, S. C.; Berzi, A.; Bergström, G.; Antonopoulos, K.; Edfors, F.; Emri, M.; Csosz, E.
Obesity and Type 2 Diabetes (T2D) are heterogeneous metabolic disorders whose molecular diversity is incompletely defined. We analyzed circulating proteomic profiles from 129 individuals in Control, Obesity, and T2D groups and applied complementary machine-learning approaches (random forest, multinomial logistic regression with LASSO regularization, support vector machines, and ensemble voting) to identify proteins that distinguish the clinical groups. Convergent model outputs revealed a partially overlapping panel of discriminative proteins. Model performance was evaluated in an independent dataset from the Human Protein Atlas (n=834) comprising healthy individuals and patients with Obesity, T2D, or other metabolic diseases. Unsupervised clustering further identified multiple proteomic subgroups within each clinical category, indicating substantial intragroup heterogeneity. Bootstrap random forest with null-model benchmarking highlighted stable cluster-discriminative proteins. These findings demonstrate that integrating circulating proteomics with machine learning can resolve molecular heterogeneity in Obesity and T2D and nominate candidate biomarkers for metabolic disease stratification.
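To make the ensemble-voting and "partially overlapping panel" ideas concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not say whether voting was hard or soft, so hard (majority) voting is assumed, and the tie-breaking rule, the function names `majority_vote` and `selection_overlap`, the example predictions, and the placeholder feature names `protein_A` to `protein_D` are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample (hard voting).

    Assumption: ties are resolved in favor of the earliest-listed model
    whose label is among the tied ones.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for labels in zip(*predictions_per_model):
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label (in model order) that reaches the top count.
        winner = next(label for label in labels if counts[label] == top)
        combined.append(winner)
    return combined


def selection_overlap(selected_per_model):
    """Count how many models selected each feature (e.g. each protein)."""
    return Counter(
        feature
        for features in selected_per_model.values()
        for feature in set(features)
    )


# Hypothetical predictions from three classifiers on four samples.
rf_pred = ["Control", "Obesity", "T2D", "T2D"]
lasso_pred = ["Control", "Obesity", "Obesity", "T2D"]
svm_pred = ["Obesity", "Obesity", "T2D", "Control"]
ensemble_pred = majority_vote([rf_pred, lasso_pred, svm_pred])

# Hypothetical per-model feature selections with placeholder names.
overlap = selection_overlap({
    "random_forest": ["protein_A", "protein_B", "protein_C"],
    "lasso": ["protein_A", "protein_C"],
    "svm": ["protein_A", "protein_D"],
})
# Features chosen by at least two of the three models.
shared_panel = sorted(f for f, n in overlap.items() if n >= 2)
```

Features counted in `shared_panel` correspond to the overlapping part of the discriminative panel; features selected by only one model remain model-specific candidates.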
Matching journals
The top 7 journals account for 50% of the predicted probability mass.