Machine learning uncovers circulating biomarkers and molecular heterogeneity in obesity and type 2 diabetes

Nokhoijav, E.; Kaplar, M.; Aranyi, S. C.; Berzi, A.; Bergström, G.; Antonopoulos, K.; Edfors, F.; Emri, M.; Csosz, E.

2026-04-20 biochemistry
10.64898/2026.04.16.718836 bioRxiv

Obesity and Type 2 Diabetes (T2D) are heterogeneous metabolic disorders whose molecular diversity is incompletely defined. We analyzed circulating proteomic profiles from 129 individuals belonging to Control, Obesity, and T2D groups and applied complementary machine-learning approaches, including random forest, multinomial logistic regression with LASSO regularization, support vector machines, and ensemble voting, to identify proteins distinguishing the clinical groups. Convergent model outputs revealed a partially overlapping panel of discriminative proteins. Model performance was evaluated in an independent dataset from the Human Protein Atlas (n=834) comprising healthy individuals and patients with Obesity, T2D, or other metabolic diseases. Unsupervised clustering further identified multiple proteomic subgroups within each clinical category, indicating substantial intragroup heterogeneity. Bootstrap random forest with null-model benchmarking highlighted stable cluster-discriminative proteins. These findings demonstrate that integrating circulating proteomics with machine learning can resolve molecular heterogeneity in Obesity and T2D and nominate candidate biomarkers for metabolic disease stratification.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 1 | Nature Communications | 4913 | Top 2% | 23.2% |
| 2 | Molecular & Cellular Proteomics | 158 | Top 0.3% | 8.7% |
| 3 | eLife | 5422 | Top 8% | 8.7% |
| 4 | Journal of Proteome Research | 215 | Top 0.9% | 2.7% |
| 5 | Cell Metabolism | 49 | Top 0.6% | 2.7% |
| 6 | PLOS Computational Biology | 1633 | Top 13% | 2.1% |
| 7 | Cell | 370 | Top 9% | 2.1% |
| | 50% of probability mass above | | | |
| 8 | Cell Reports Medicine | 140 | Top 2% | 2.1% |
| 9 | Communications Biology | 886 | Top 5% | 2.1% |
| 10 | Scientific Reports | 3102 | Top 52% | 1.9% |
| 11 | PLOS ONE | 4510 | Top 51% | 1.8% |
| 12 | mSystems | 361 | Top 4% | 1.8% |
| 13 | Molecular Systems Biology | 142 | Top 0.6% | 1.7% |
| 14 | Cell Reports | 1338 | Top 23% | 1.7% |
| 15 | Proceedings of the National Academy of Sciences | 2130 | Top 31% | 1.7% |
| 16 | Science Translational Medicine | 111 | Top 3% | 1.5% |
| 17 | EMBO Molecular Medicine | 85 | Top 2% | 1.4% |
| 18 | Molecular Cell | 308 | Top 8% | 1.3% |
| 19 | Nature Metabolism | 56 | Top 2% | 1.3% |
| 20 | Molecular Metabolism | 105 | Top 1% | 1.3% |
| 21 | Journal of Advanced Research | 15 | Top 0.4% | 1.0% |
| 22 | Cell Systems | 167 | Top 10% | 1.0% |
| 23 | Science Advances | 1098 | Top 25% | 1.0% |
| 24 | PLOS Biology | 408 | Top 16% | 0.9% |
| 25 | Genome Medicine | 154 | Top 7% | 0.9% |
| 26 | Cell Genomics | 162 | Top 6% | 0.8% |
| 27 | iScience | 1063 | Top 28% | 0.8% |
| 28 | Nature | 575 | Top 15% | 0.8% |
| 29 | Frontiers in Molecular Biosciences | 100 | Top 5% | 0.8% |
| 30 | PROTEOMICS | 35 | Top 0.8% | 0.7% |
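
The 50% cutoff can be checked against the listed probabilities. The page does not say how the cutoff is computed, so the sketch below assumes it is the smallest rank at which the running total of the listed probabilities reaches 50%. The function name `rank_reaching_mass` is hypothetical, and the values are the rounded percentages for ranks 1 to 10 as shown in the list.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the list above.
# These are rounded display values, so the running total is approximate.
probabilities = [23.2, 8.7, 8.7, 2.7, 2.7, 2.1, 2.1, 2.1, 2.1, 1.9]


def rank_reaching_mass(probs, threshold=50.0):
    """Return the 1-based rank where the running total first reaches
    `threshold`, or None if it never does. (Hypothetical helper.)"""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


cutoff = rank_reaching_mass(probabilities)
print(cutoff)  # prints 7
```

Under that assumption, the running total is about 48.1% after rank 6 and about 50.2% after rank 7, which agrees with the statement that the top 7 journals account for 50% of the predicted probability mass.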