
Decoding heterogeneous aging clocks and disease risk stratification using a metabolomic foundation model

Xu, Y.; Zou, B.; Xie, G.; Jia, W.; Zhang, L.

2026-05-20 bioinformatics
10.64898/2026.05.18.725977 bioRxiv

Abstract

Metabolomic aging clocks estimate biological age by modeling metabolite concentrations, thereby capturing aging signals from healthspan and adverse outcomes. However, existing clocks generally assume homogeneous aging trajectories and yield only a single age acceleration metric, limiting their capacity to capture inter-individual metabolic heterogeneity and characterize nuanced individual-level representations. To address these limitations, we proposed MetFoundation, a metabolomic foundation model pre-trained on nuclear magnetic resonance (NMR) metabolomic profiles from over 430,000 participants in UK Biobank via self-supervised learning. This large-scale pre-training enables MetFoundation to learn a metabolomic representation space that captures the complex, non-linear structure of systemic metabolism as reflected in NMR data. Building on MetFoundation, we developed a mortality-informed metabolomic aging clock by fine-tuning an attached survival module, deriving age acceleration that demonstrates significant associations with multiple age-related diseases and factors. More importantly, we utilized embeddings generated by MetFoundation to identify metabolic subtypes, resulting in 13 distinct subtypes with differential susceptibility profiles for major age-related diseases, particularly dementia and diabetes. This finding empirically demonstrated profound metabolic heterogeneity across populations, persisting even at comparable levels of age acceleration. To enhance clinical applicability, we further employed contrastive learning to distill a lightweight model that approximates the learned metabolomic representation space using only routine blood test measurements as inputs. Both hold-out testing within UK Biobank and the external validation in China Health and Retirement Longitudinal Study replicated similar disease onset patterns across the identified subtypes, underscoring the robust generalizability of MetFoundation and the translational potential of the discovered metabolic subtypes.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|------|---------|------------------------|------------|-----------------------|
| 1 | Nature Communications | 4913 | Top 6% | 18.4% |
| 2 | Nature Aging | 51 | Top 0.1% | 12.2% |
| 3 | Advanced Science | 249 | Top 2% | 8.3% |
| 4 | Aging Cell | 144 | Top 1.0% | 6.3% |
| 5 | Nature Metabolism | 56 | Top 0.4% | 4.8% |
|  | (50% of probability mass above) |  |  |  |
| 6 | Nature Medicine | 117 | Top 0.6% | 4.3% |
| 7 | Nature | 575 | Top 8% | 3.2% |
| 8 | Nature Machine Intelligence | 61 | Top 1% | 3.0% |
| 9 | Science Advances | 1098 | Top 10% | 2.7% |
| 10 | Cell Systems | 167 | Top 5% | 2.4% |
| 11 | Cell Genomics | 162 | Top 3% | 2.1% |
| 12 | Proceedings of the National Academy of Sciences | 2130 | Top 28% | 2.1% |
| 13 | npj Aging | 15 | Top 0.5% | 1.7% |
| 14 | eLife | 5422 | Top 43% | 1.7% |
| 15 | PLOS Computational Biology | 1633 | Top 17% | 1.6% |
| 16 | Cell Reports | 1338 | Top 25% | 1.6% |
| 17 | Briefings in Bioinformatics | 326 | Top 5% | 1.3% |
| 18 | Communications Biology | 886 | Top 15% | 1.2% |
| 19 | Scientific Reports | 3102 | Top 69% | 0.9% |
| 20 | Nucleic Acids Research | 1128 | Top 16% | 0.9% |
| 21 | Nature Biomedical Engineering | 42 | Top 2% | 0.8% |
| 22 | Nature Biotechnology | 147 | Top 7% | 0.8% |
| 23 | Bioinformatics | 1061 | Top 10% | 0.7% |
| 24 | Patterns | 70 | Top 3% | 0.7% |
| 25 | Genome Medicine | 154 | Top 9% | 0.7% |
| 26 | Computational and Structural Biotechnology Journal | 216 | Top 11% | 0.6% |
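
The "top 5 journals account for 50%" statement can be checked from the listed probabilities with a short cumulative sum. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and only the first ten probabilities from the ranking above are included, stored as integer tenths of a percent so the sum is exact.

```python
# Predicted probabilities for ranks 1-10 from the ranking above,
# in tenths of a percent (integers avoid floating-point rounding).
probs_tenths = [184, 122, 83, 63, 48, 43, 32, 30, 27, 24]


def journals_to_reach(threshold_tenths, probs):
    """Return how many top-ranked entries are needed to reach the threshold.

    Hypothetical helper for illustration; returns None if the listed
    entries never reach the threshold.
    """
    total = 0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold_tenths:
            return count
    return None


top_k = journals_to_reach(500, probs_tenths)  # 500 tenths = 50.0%
print(top_k)  # 5, since 18.4 + 12.2 + 8.3 + 6.3 + 4.8 = 50.0
```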