
Cross-platform metabolomics imputation using importance-weighted autoencoders

Smith, A.; Pinto, R.; Zagkos, L.; Tzoulaki, I.; Elliott, P.; Dehghan, A.

2025-03-06 epidemiology
10.1101/2025.03.06.25323475 medRxiv

Background: Metabolomics data are often generated through different analytical platforms and different methods of identification and quantification, which makes their synthesis and large-scale replication challenging. To address this, we applied generative deep learning to impute metabolites assayed by Metabolon, a commonly used commercial platform, using metabolomic features acquired by an untargeted liquid chromatography-mass spectrometry (LC-MS) platform.

Methods: We utilised a subset of 979 samples from the Airwave Health Monitoring Study that were assayed by both the Metabolon and National Phenome Centre at Imperial College (NPC) LC-MS assays to develop an ensemble of importance-weighted autoencoders (IWAEs) that can perform cross-platform metabolomics imputation between the two assays. Using the ensemble, we generated a Metabolon-equivalent dataset in 2,971 additional Airwave samples that lacked prior Metabolon measurements. We estimated observational associations with two clinical outcomes, body mass index (BMI) and C-reactive protein (CRP). We validated the ensemble and the imputed data by investigating the concordance of these observational associations, using the imputed Metabolon dataset alongside metabolite levels measured by Metabolon and NPC in the Airwave study and by the Nightingale platform in the UK Biobank.

Results: Our imputation ensemble generated samples highly correlated with their real values across all Metabolon metabolites within a held-out test set, with a mean sample correlation of 0.61 (IQR 0.55-0.67). The well-imputed subset comprised 199 (22%) of the metabolites present in the real Metabolon dataset, for which the imputed values accounted for at least 55% of the original variance (R² ≥ 0.55) with minimal uncertainty (R² variance ≤ 0.025). The subset included 43 metabolites not previously identified within our LC-MS platform.
When comparing the associations of the real and imputed Metabolon metabolites with BMI and CRP, the standardised beta-coefficients were highly correlated (ρ = 0.93 for BMI and 0.89 for CRP) with minimal mean difference (0.005 (0.04) for BMI, 0.005 (0.04) for CRP). Similar concordance occurred between the imputed Metabolon metabolites and their equivalents in UK Biobank (mean difference -0.007 (0.05) for BMI, 0.01 (0.04) for CRP) and on our LC-MS platform (mean difference -0.013 (0.04) for BMI, -0.019 (0.04) for CRP).

Conclusion: This methodological innovation offers a scalable and accurate method for cross-platform imputation, which could allow researchers to aggregate individual-level metabolomics data from different epidemiological studies, replicate findings, or conduct meta-analyses.
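
The well-imputed criterion reported in the Results (R² ≥ 0.55 with R² variance ≤ 0.025) can be illustrated with a minimal sketch. This is not the authors' code. It assumes, hypothetically, that each metabolite has one held-out R² value per ensemble member and that "R² variance" means the variance of those values across members; the abstract does not state how the variance was computed. The function name `well_imputed`, the metabolite names, and the example values are all made up for illustration.

```python
from statistics import mean, pvariance

# Thresholds quoted in the abstract.
R2_MIN = 0.55
R2_VAR_MAX = 0.025


def well_imputed(r2_by_metabolite, r2_min=R2_MIN, r2_var_max=R2_VAR_MAX):
    """Return names of metabolites that pass both thresholds.

    r2_by_metabolite maps a metabolite name to a list of R^2 values,
    one per ensemble member (an assumption, see the lead-in).
    """
    selected = []
    for name, r2_values in r2_by_metabolite.items():
        # Mean R^2 across members stands in for the reported R^2.
        mean_r2 = mean(r2_values)
        # Population variance across members stands in for "R^2 variance".
        var_r2 = pvariance(r2_values)
        if mean_r2 >= r2_min and var_r2 <= r2_var_max:
            selected.append(name)
    return selected


# Hypothetical values, not taken from the study.
example = {
    "met_a": [0.70, 0.72, 0.68],  # high and stable R^2
    "met_b": [0.30, 0.35, 0.40],  # stable but low R^2
    "met_c": [0.20, 0.90, 0.90],  # high mean R^2 but unstable
}

print(well_imputed(example))
```

With these made-up inputs only `met_a` passes: `met_b` fails the R² threshold and `met_c` fails the variance threshold.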

Matching journals

The top 4 journals account for 50% of the predicted probability mass.