Metabolites — Latest Matching Preprints

1

Quantitative biomarker profiling of serum samples in the By-Band-Sleeve trial

Smith, M. L.; Goudswaard, L. J.; Hughes, D. A.; Blazeby, J. M.; Rogers, C. A.; Mazza, G.; Gidman, E. A.; Fitzgibbon, S.; Groom, A.; Ring, S. M.; Timpson, N. J.; Corbin, L. J.

2025-11-13 endocrinology 10.1101/2025.11.11.25339993 medRxiv

Top 0.1%

68.6%

Metabolomics data has been generated via proton nuclear magnetic resonance (NMR) spectroscopy in samples collected within By-Band-Sleeve. Two sample collection efforts were made - firstly, from a randomised controlled trial (RCT) comparing the effectiveness of three types of bariatric surgery: the Roux-en-Y gastric bypass ("bypass"), laparoscopic adjustable gastric band ("band") and the sleeve gastrectomy ("sleeve"), and secondly from a non-randomised (observational) study of bariatric surgery. In both instances, samples were collected from patients before and after surgery. Data underwent quality control (QC) using a standard pipeline via the R package metaboprep. This package extracts data from preformed worksheets, provides summary statistics and enables the user to select samples and metabolites for their analysis based on a set of quality metrics. Post-filtering, the dataset consists of data from 1410 samples (999 pre-surgery, 411 post-surgery) from 1045 unique individuals (1000 from the RCT and 45 from the non-randomised study), each with 250 measured metabolic traits. Comparison of NMR measures to clinical chemistry data showed good agreement for the metabolites in common across both datasets. Concordance with previous NMR data generated for a subset of the same samples was largely good. Overall, this data note describes the data, explains the pre-processing and quality control procedures applied to the data, and provides some data validation analyses.

2

Predicting the Pathway Involvement of Metabolites in Both Pathway Categories and Individual Pathways

Huckvale, E. D.; Moseley, H. N. B.

2024-08-09 systems biology 10.1101/2024.08.07.607025 medRxiv

Top 0.1%

52.2%

Metabolism is the network of chemical reactions that sustain cellular life. Parts of this metabolic network are defined as metabolic pathways containing specific biochemical reactions. Products and reactants of these reactions are called metabolites, which are associated with certain human-defined metabolic pathways. Metabolic knowledgebases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), contain metabolites, reactions, and pathway annotations; however, such resources are incomplete due to current limits of metabolic knowledge. To fill in missing metabolite pathway annotations, past machine learning models showed some success at predicting KEGG Level 2 pathway category involvement of metabolites based on their chemical structure. Here, we present the first machine learning model to predict metabolite association to more granular KEGG Level 3 metabolic pathways. We used a feature and dataset engineering approach to generate over one million metabolite-pathway entries in the dataset used to train a single binary classifier. This approach produced a mean Matthews correlation coefficient (MCC) of 0.806 ± 0.017 SD across 100 cross-validation iterations. The 172 Level 3 pathways were predicted with an overall MCC of 0.726. Moreover, metabolite association with the 12 Level 2 pathway categories was predicted with an overall MCC of 0.891, representing significant transfer learning from the Level 3 pathway entries. These are the best metabolite-pathway prediction results published so far in the field.
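
For readers unfamiliar with the Matthews correlation coefficient reported in this abstract, the following is a minimal sketch of how the metric is computed from a binary confusion matrix. The counts are hypothetical and are not taken from the preprint; the function name is an illustrative choice, and returning 0.0 when the denominator vanishes is a common convention assumed here.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the metric is undefined
    there; 0.0 is an assumed convention for this sketch).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical metabolite-pathway predictions: most pairs are negatives,
# which is why MCC is preferred over plain accuracy for such datasets.
example_mcc = matthews_corrcoef_from_counts(tp=90, tn=880, fp=10, fn=20)
print(f"MCC = {example_mcc:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), so it stays informative even when positive metabolite-pathway pairs are rare.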

3

An ensemble method associates prepregnancy BMI and maternal ethnicity with key cord blood metabolomic changes in a multi-ethnic cohort from Hawaii

Tao, L.; Li, B.; Du, Y.; Hung, S.; Garmire, L.

2025-08-16 endocrinology 10.1101/2025.08.14.25333702 medRxiv

Top 0.1%

40.0%

Maternal obesity poses significant risks to fetal health, influencing metabolomic profiles in newborn cord blood. Despite the growing application of metabolomics, limited research has explored how BMI-associated metabolite alterations may vary across different ethnic groups. We analyzed metabolomic data from a multi-ethnic cohort of 87 participants, including Native Hawaiian and Pacific Islander (NHPI) individuals. We used an ensemble machine learning model with a meta-learner to predict cord blood metabolomic changes associated with maternal BMI, the continuous obesity metric. The meta-learner integrated linear and nonlinear approaches and achieved significantly enhanced performance compared to the baseline linear regression model. In cord blood samples, glycine, serine, and threonine metabolism are activated by maternal obesity, while fatty acid biosynthesis and biosynthesis of unsaturated fatty acids are repressed. Some metabolites associated with these pathways show ethnicity-specific patterns. Compared to Asians and Caucasians, 1,5-anhydrosorbitol, glycine, and L-threonine show a unique increase from normal to obese maternally associated groups in NHPI, while PC(O-44:6) is significantly decreased in NHPI. The finding reveals the impact of maternal obesity on offspring health, and calls for future research to investigate maternal and newborn health in underrepresented populations, such as NHPI.

4

Untargeted metabolome- and transcriptome-wide association study identifies causal genes modulating metabolite concentrations in urine

Sonmez Flitman, R.; Khalili, B.; Rueedi, R.; Kutalik, Z.; Bergmann, S.

2020-05-25 genomics 10.1101/2020.05.22.110197 medRxiv

Top 0.1%

39.9%

In this study we investigate the results of a metabolome- and transcriptome-wide association study to identify genes influencing the human metabolome. We used RNAseq data from lymphoblastoid cell lines (LCLs) derived from 555 Caucasian individuals to characterize their transcriptome. As for the metabolome we took an untargeted approach using binned features from 1H nuclear magnetic resonance spectroscopy (NMR) of urine samples from the same subjects allowing for data-driven discovery of associated compounds (rather than working with a limited set of quantified metabolites). Using pairwise linear regression we identified 21 study-wide significant associations between metabolome features and gene expression levels. We observed the most significant association between the gene ALMS1 and two adjacent metabolome features at 2.0325 and 2.0375 ppm. By using our previously developed metabomatching methodology, we found N-Acetylaspartate (NAA) as the potential underlying metabolite whose urine concentration is correlated with ALMS1 expression. Indeed, a number of metabolome- and genome-wide association studies (mGWAS) had already suggested the locus of this gene to be involved in regulation of N-acetylated compounds, yet were not able to identify unambiguously the exact metabolite, nor to disambiguate between ALMS1 and NAT8, another gene found in the same locus as the mediator gene. The second highest significant association was observed between HPS1 and two metabolome features at 2.8575 and 2.8725 ppm. Metabomatching of the association profile of HPS1 with all metabolite features pointed at trimethylamine (TMA) as the most likely underlying metabolite. mGWAS had previously implicated a locus containing HPS1 to be associated with TMA concentrations in urine but could not disambiguate this association signal from PYROXD2, a gene in the same locus. 
We used Mendelian randomization to show for both ALMS1 and HPS1 that their expression is causally linked to the respective metabolite concentrations. Our study provides evidence that the integration of metabolomics with gene expression data can support mQTL analysis, helping to identify the most likely gene involved in the modulation of the metabolite concentration.
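
The abstract does not say which Mendelian randomization estimator was used. As a hedged illustration of the general idea only, the sketch below shows the simplest single-instrument form, the Wald ratio, with a first-order standard error that ignores uncertainty in the variant-exposure effect. All numbers and names are hypothetical.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Mendelian randomization estimate (Wald ratio).

    beta_exposure: effect of the genetic variant on the exposure
                   (e.g. gene expression).
    beta_outcome:  effect of the same variant on the outcome
                   (e.g. a metabolite feature).
    se_outcome:    standard error of beta_outcome.

    The standard error is a first-order approximation that treats
    beta_exposure as known without error (an assumption of this sketch).
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    std_error = se_outcome / abs(beta_exposure)
    return estimate, std_error


# Hypothetical variant: shifts expression by 0.5 SD and the metabolite by 0.1 SD.
causal_estimate, causal_se = wald_ratio(0.5, 0.1, 0.02)
print(f"causal effect = {causal_estimate:.2f} (SE {causal_se:.2f})")
```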

5

Predicting the Pathway Involvement of Compounds Annotated in the Reactome Knowledgebase

Huckvale, E. D.; Moseley, H. N. B.

2024-11-11 systems biology 10.1101/2024.11.07.622563 medRxiv

Top 0.1%

38.3%

Motivation: Pathway annotations of non-macromolecular (relatively small) biomolecules facilitate biological and biomedical interpretation of metabolomics datasets. However, low pathway annotation levels of detected biomolecules hinder this type of interpretation. Thus, predicting the pathway involvement of detected but unannotated biomolecules has high potential to improve metabolomics data analysis and omics integration. Past publications have only made use of the Kyoto Encyclopedia of Genes and Genomes derived datasets to develop machine learning models to predict pathway involvement. However, to our knowledge, the Reactome knowledgebase has not been utilized to develop these types of predictive models. Results: We created a dataset ready for machine learning using chemical representations of all pathway-annotated compounds available from the Reactome knowledgebase. Next, we trained and evaluated a single multilayer perceptron binary classifier using combined metabolite-pathway paired feature vectors engineered from this new dataset. While models trained on a prior corresponding KEGG dataset with 502 pathways scored a mean Matthews correlation coefficient (MCC) of 0.847 and 0.0098 standard deviation, the models trained on the Reactome dataset with 3,985 pathways demonstrated improved performance with a mean MCC of 0.916, but with a higher 0.0149 standard deviation. These results indicate that the pathways in Reactome can also be effectively predicted, greatly increasing the number of human-defined pathways available for prediction. Availability: Code and data for fully reproducing the results in this work are available at https://doi.org/10.6084/m9.figshare.27478065. Contact: hunter.moseley@uky.edu.

6

Measurement of 24-hour Continuous Human CH4 Release in a Whole Room Indirect Calorimeter

Alvarez Carnero, E.; Bock, C. P.; Liu, Y.; Corbin, K.; Wohlers-Kariesch, E.; Ruud, K.; Moon, J.; Marcus, A.; Rosa, K.-B.; Muraviev, A.; Vodopyanov, K. L.; Smith, S. R.

2022-11-30 endocrinology 10.1101/2022.11.04.22281777 medRxiv

Top 0.1%

34.8%

We describe the technology and validation of a new whole room indirect calorimeter (WRIC) methodology to quantify methane (CH4) released from the human body over 24h concurrently with the assessment of energy expenditure and substrate utilization. The new system extends the assessment of energy metabolism by adding CH4, a downstream product of microbiome fermentation that could contribute to energy balance. Methods: Our new system consists of an established WRIC combined with the addition of off-axis integrated-cavity output spectroscopy (OA-ICOS) to measure CH4 concentrations ([CH4]). The volume of CH4 released (VCH4) was calculated after measuring air flow rates. Development and validation included environmental experiments to measure the stability of the atmospheric [CH4], infusing CH4 into the WRIC and cross-validation studies comparing [CH4] quantified by OA-ICOS and mid-infrared dual-comb spectroscopy (MIR DCS). Reliability of the whole system is reported between years, weeks, days, and validated CH4 infusions. The cross-validation and reliability of VCH4 released from the human body was determined in 19 participants on consecutive days. In addition, we describe a postprocessing analytical method to differentiate CH4 released from breath versus intestine by matching times of stool production and contemporaneous VCH4 release. Results: Our infusion data indicated that the system measured 24h [CH4] and VCH4 with high sensitivity, reliability and validity. Cross-validation studies showed good agreement between OA-ICOS and MIR DCS technologies (r = 0.979, P < 0.0001). Initial human data revealed 24h VCH4 was highly variable between subjects and within / between days; this highlights the importance of a 24-h continuous assessment to have a complete picture of VCH4 release. Finally, our method to quantify VCH4 released by breath or colon suggested that over 50% of the CH4 was eliminated through the breath.
Conclusions: The method allows, for the first time, measurement of 24h VCH4 (in kcal) and therefore the measurement of the proportion of human energy intake fermented to CH4 by the gut microbiome and released via breath or directly from the intestine. Our method is accurate and valid, and will provide meaningful data not only to understand interindividual variation but also to track the effects of dietary, probiotic, bacterial and fecal microbiota transplantation on VCH4.

7

Gazing into the Metaboverse: Automated exploration and contextualization of metabolic data

Berg, J. A.; Zhou, Y.; Waller, T. C.; Ouyang, Y.; Nowinski, S. M.; Van Ry, T.; George, I.; Cox, J. E.; Wang, B.; Rutter, J.

2020-09-28 biochemistry 10.1101/2020.06.25.171850 medRxiv

Top 0.1%

32.8%

Metabolism forms a complex, interdependent network, and perturbations can have indirect effects that are pervasive. Identifying these patterns and their consequences is difficult, particularly when the effects occur across canonical pathways, and these difficulties have long acted as a bottleneck in metabolic data analysis. This challenge is compounded by technical limitations in metabolomics approaches that garner incomplete datasets. Current network-based tools generally utilize pathway-level analysis lacking the granular resolution required to provide context into the effects of all perturbations, regardless of magnitude, across the metabolic network. To address these shortcomings, we introduce algorithms that allow for the real-time extraction of regulatory patterns and trends from user data. To minimize the impact of missing measurements within the metabolic network, we introduce methods that enable complex pattern recognition across multiple reactions. These tools are available interactively within the user-friendly Metaboverse app (https://github.com/Metaboverse) to facilitate exploration and hypothesis generation. We demonstrate that expected signatures are accurately captured by Metaboverse. Using public lung adenocarcinoma data, we identify a previously undescribed multi-dimensional signature that correlated with survival outcomes in lung adenocarcinoma patients. Using a model of respiratory deficiency, we identify relevant and previously unreported regulatory patterns that suggest an important compensatory role for citrate during mitochondrial dysfunction. This body of work thus demonstrates that Metaboverse can identify and decipher complex signals from data that have been otherwise difficult to identify with previous approaches.

8

Bayesian statistics improves biological interpretability of metabolomics data from human cohorts

Brydges, C.; Che, X.; Lipkin, W. I.; Fiehn, O.

2022-05-19 bioinformatics 10.1101/2022.05.17.492312 medRxiv

Top 0.1%

29.5%

Background: Univariate analyses of metabolomics data currently follow a frequentist approach, using p-values to reject a null-hypothesis. However, the usability of p-values is plagued by many misconceptions and inherent pitfalls. We here propose the use of Bayesian statistics to quantify evidence supporting different hypotheses and discriminate between the null hypothesis versus lack of statistical power. Methods: We use metabolomics data from three independent human cohorts that studied plasma signatures of subjects with myalgic encephalomyelitis / chronic fatigue syndrome (ME/CFS). Data are publicly available, covering 84-197 subjects in each study with 562-888 identified metabolites of which 777 were common between two studies, and 93 compounds reported in all three studies. By comparing results from classic multiple regression against Bayesian multiple regression we show how Bayesian statistics incorporates results from one study as prior information into the next study, thereby improving the overall assessment of the likelihood of finding specific differences between plasma metabolite levels and disease outcomes in ME/CFS. Results: Whereas using classic statistics and Benjamini-Hochberg FDR-corrections, study 1 detected 18 metabolic differences, study 2 detected no differences. Using Bayesian statistics on the same data, we found a high likelihood that 97 compounds were altered in concentration in study 2, after using the results of study 1 as prior distributions. These findings included lower levels of peroxisome-produced ether-lipids, higher levels of long chain, unsaturated triacylglycerides, and the presence of exposome compounds that are explained by difference in diet and medication between healthy subjects and ME/CFS patients. Although study 3 reported only 92 compounds in common with the other two studies, these major differences were confirmed.
We also found that prostaglandin F2alpha, a lipid mediator of physiological relevance, was significantly reduced in ME/CFS patients across all three studies. Conclusions: The use of Bayesian statistics led to biological conclusions from metabolomic data that were not found through the frequentist analytical approaches more commonly employed. We propose that Bayesian statistics is highly useful for studies with similar research designs if similar metabolomic assays are used.
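
To illustrate how results from one study can act as prior information for the next, here is a minimal sketch of a conjugate normal-normal update for a single effect estimate. This is a simplification assumed for illustration, not the Bayesian multiple regression used in the preprint, and all numbers are hypothetical.

```python
def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Combine a normal prior with a normally distributed estimate.

    The posterior mean is a precision-weighted average of the prior mean
    and the new estimate; the posterior precision is the sum of the two
    precisions. Returns (posterior_mean, posterior_sd).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    posterior_variance = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_variance * (
        prior_precision * prior_mean + data_precision * estimate
    )
    return posterior_mean, posterior_variance ** 0.5


# Hypothetical metabolite effect: the first study supplies the prior,
# the second study supplies the new estimate.
post_mean, post_sd = normal_posterior(
    prior_mean=0.4, prior_sd=0.2, estimate=0.2, std_error=0.2
)
print(f"posterior mean = {post_mean:.2f}, posterior SD = {post_sd:.3f}")
```

Because the posterior standard deviation is smaller than either input, an effect that is inconclusive in the second study alone can become well supported once the first study's evidence is carried forward.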

9

Characterisation of fasting and postprandial NMR metabolites: insights from the ZOE PREDICT 1 Study

Bermingham, K. M.; Mazidi, M.; Franks, P. W.; Maher, T.; Valdes, A. M.; Linenberg, I.; Wolf, J.; Hadjigeorgiou, G.; Spector, T. D.; Menni, C.; Ordovas, J. M.; Berry, S. E.; Hall, W. L.

2022-11-15 biochemistry 10.1101/2022.11.14.516406 medRxiv

Top 0.1%

29.0%

Background: Postprandial metabolomic profiles and their inter-individual variability are not well characterised. Here we describe postprandial metabolite changes, their correlations with fasting values and their inter- and intra-individual variability following a standardised meal in the ZOE PREDICT 1 cohort. Methods: In the ZOE PREDICT 1 study (n = 1,002 (NCT03479866)), 250 metabolites, mainly lipids, were measured by Nightingale NMR panel in fasting and postprandial (4 and 6 h after a 3.7 MJ mixed nutrient meal, with a second 2.2 MJ mixed nutrient meal at 4 h) serum samples. For each metabolite, inter- and intra-individual variability over time was evaluated using linear mixed modelling and intraclass-correlation coefficients (ICC) calculated. Results: Postprandially, 85% (of 250 metabolites) significantly changed from fasting at 6h (47% increased, 53% decreased; Kruskal-Wallis), with 37 measures increasing by >25%, and 14 increasing by >50%. The largest changes were observed in very large lipoprotein particles and ketone bodies. Seventy-one percent of circulating metabolites were strongly correlated (Spearman's rho >0.80) between fasting and postprandial timepoints, and 5% were weakly correlated (rho <0.50). The median ICC of the 250 metabolites was 0.91 (range 0.08-0.99). The lowest ICCs (ICC<0.40, 4% of measures) were found for glucose, pyruvate, ketone bodies (β-hydroxybutyrate, acetoacetate, acetate) and lactate. Conclusions: In this large-scale postprandial metabolomic study, circulating metabolites were highly variable between individuals following a mixed challenge meal. Findings suggest that a meal challenge may yield postprandial responses divergent from fasting measures, specifically for glycolysis, essential amino acid, ketone body and lipoprotein size metabolites.
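
The intraclass correlation coefficient summarises how much of a metabolite's total variance lies between individuals rather than within them. The sketch below uses the one-way ANOVA estimator for balanced data as an assumed stand-in; the study itself used linear mixed modelling, and the measurements shown are hypothetical.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC for balanced repeated measures.

    measurements: one list per individual, each holding the same number
    of repeated values (e.g. fasting, 4 h and 6 h).
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 individuals with the same >= 2 repeats")
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    # Between-individual and within-individual mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical metabolite that tracks the individual across timepoints.
stable_metabolite = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 3.1, 2.9]]
# Hypothetical metabolite dominated by meal-driven, within-person change.
labile_metabolite = [[1.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 2.0, 1.0]]
print(f"stable ICC = {icc_oneway(stable_metabolite):.2f}")
print(f"labile ICC = {icc_oneway(labile_metabolite):.2f}")
```

With this estimator, values near 1 indicate that individuals keep their rank across timepoints, while values near or below 0 indicate that within-person change dominates.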

10

LC-MS system for automatically collecting time-resolved metabolomics data of cultured cells

Chan, C. C. Y.; Groves, R. A.; Lewis, I. A.

2024-10-12 microbiology 10.1101/2024.10.11.617934 medRxiv

Top 0.1%

28.8%

Temporal metabolic dynamics are a critical but difficult-to-study aspect of metabolism. To address this, we developed a liquid chromatography-mass spectrometry (LC-MS) system, temporal uptake and nutritional analysis (TUNA), to automatically collect time-resolved metabolomics data of cultured cells. TUNA enables sub-minute sequential sampling, has broad metabolite coverage, supports robust metabolite identification, can monitor over 72 conditions in parallel, and can be implemented in most LC-MS laboratories. We used TUNA to monitor temporal dynamics of uropathogens (Escherichia coli and Proteus mirabilis) and identify novel metabolic phenotypes that cannot be captured from a single time point.

11

Comprehensive metabolite ratio QTL mapping reveals disease relevant enzyme biology

Rizi, S.; Goss, N.; Kutalik, Z.; van der Graaf, A.

2025-12-05 epidemiology 10.64898/2025.12.04.25341616 medRxiv

Top 0.1%

26.9%

Metabolite ratios are valuable proxies for enzyme and pathway activity, which are otherwise hard to capture. The genetic basis of metabolite ratios has not been systematically studied as it implies expanding the search space quadratically, leading to increased computational burden. Here, we present efficient statistical methodology to identify ratio quantitative trait loci (rQTLs), requiring only association summary statistics of metabolite measurements. In validations the methodology shows strong correlation with classically estimated ratios (median R² = 0.94). Across all pairwise metabolite comparisons, 5,095 metabolite pairs contain one or more significant rQTL that exhibit far stronger associations than their constituents. The genes to which these rQTLs map are strongly enriched for enzymes (odds ratio ranges between 4.3 and 20, depending on gene mapping strategy). Furthermore, metabolites whose ratios have rQTLs have shorter reaction distance (median = 4) compared to random pairs of metabolites (median = 6) (P = 8.0 × 10⁻¹³). We identified many otherwise missed loci: 1,249 rQTLs across 53 independent loci were novel, meaning that the individual metabolites did not pass significance in the source study, highlighting that our methodology increases the number of QTLs by 21%. rQTLs often reveal enzyme activity: capturing long-chain polyunsaturated fatty acid desaturation at the FADS2 locus (177 metabolites across 1,258 rQTLs), as well as elongation through ELOVL2 and ELOVL5 (23 metabolites across 27 rQTLs). Importantly, 72% of genes mapped to rQTLs were not available in pQTL studies. Furthermore, tissue-specific eQTLs confirmed that some blood rQTL associations (e.g. ones mapped to ETFDH in muscle tissue and SCD in adipose tissue) can capture processes taking place in other tissues. We further identified metabolite ratios that are likely causal biomarkers for malignant bladder neoplasms and ischaemic heart disease.
Finally, we identified a novel rQTL for the cAMP-to-PFOS ratio, mapping to a missense variant in ABCG2, suggesting that ABCG2 is involved in the excretion of PFOS in humans. In summary, our method is able to systematically map rQTLs, which can serve as key disease biomarkers, act as proxies for unmeasured proteins and identify novel biology. We offer an interactive browser to explore the rQTL and metabolite ratios identified in this study: metabolite-ratio-app.athirtyone.com/
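
The abstract does not spell out the estimator, so the following is only a sketch of one standard way a ratio association can be derived from summary statistics. It assumes both metabolites were analysed on the log scale in the same samples, so that the effect on the log ratio is the difference of the two effects, and that the correlation between the two effect estimates is known (it is often approximated by the phenotypic correlation). Function names and numbers are hypothetical.

```python
import math


def log_ratio_association(beta_a, se_a, beta_b, se_b, estimate_corr):
    """Variant effect on log(A/B) from per-metabolite summary statistics.

    beta_a, beta_b: effects on log-transformed metabolites A and B.
    se_a, se_b:     their standard errors.
    estimate_corr:  assumed correlation between the two effect estimates.
    Returns (beta_ratio, se_ratio, z_score).
    """
    beta_ratio = beta_a - beta_b
    variance = se_a ** 2 + se_b ** 2 - 2.0 * estimate_corr * se_a * se_b
    if variance <= 0:
        raise ValueError("non-positive variance; check estimate_corr")
    se_ratio = math.sqrt(variance)
    return beta_ratio, se_ratio, beta_ratio / se_ratio


# Hypothetical variant with opposite effects on a substrate and a product,
# as expected if it alters the enzyme converting one into the other.
ratio_beta, ratio_se, ratio_z = log_ratio_association(
    beta_a=0.05, se_a=0.02, beta_b=-0.05, se_b=0.02, estimate_corr=0.6
)
single_z = 0.05 / 0.02
print(f"single-metabolite z = {single_z:.2f}, ratio z = {ratio_z:.2f}")
```

Under these assumptions, opposite-signed effects on positively correlated metabolites give the ratio a larger effect and a smaller standard error than either metabolite alone, which is one way a ratio can show a stronger association than its constituents.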

12

Phenyllactic Acid is Physiologically Released from Skeletal Muscle and Contributes to the Beneficial Effects of Physical Exercise in Humans

Hoene, M.; Zhao, X.; Hu, C.; Birkenfeld, A. L.; Peter, A.; Niess, A.; Moller, A.; Li, Q.; Lehmann, R.; Plomgaard, P.; Xu, G.; Weigert, C.

2024-03-30 endocrinology 10.1101/2024.03.29.24305064 medRxiv

Top 0.1%

26.3%

Aims/hypothesis: While physical activity is clearly beneficial in combating type 2 diabetes, the underlying molecular mechanisms are incompletely understood. Moreover, there is a considerable degree of variability in the individual response to exercise-based lifestyle interventions that remains to be explained. We aimed to identify novel exercise-induced metabolites that could mediate the improvement in glycemic control and reduction of obesity and contribute to individual differences in the response to exercise interventions. Methods: We studied acute exercise- and training-induced changes in plasma metabolites in sedentary subjects with overweight (8 male, 14 female) participating in an eight-week supervised training program flanked by two acute endurance exercise sessions. Plasma metabolites were quantified using LC- and CE-MS. In a separate study (n=9 lean males), we assessed metabolite fluxes over the leg using arterial and venous catheters. Functional analyses were performed in primary blood mononuclear cells (PBMCs) stimulated with lipopolysaccharide (LPS) or the saturated fatty acid palmitate. Results: The amino acid breakdown products 3-phenyllactic acid (PLA), 4-hydroxyphenyllactic acid and indolelactic acid were increased after both acute exercise and training. All three aromatic lactic acids, which so far mainly received attention as bacterial metabolites, exhibited an efflux from the leg. PLA showed the largest increase after both acute exercise and training, of 57% and 20% respectively. The magnitude of the acute exercise-induced increase in PLA correlated with a decrease in subcutaneous adipose tissue volume and an improvement in insulin sensitivity over the course of the intervention. Furthermore, both isomers, D- and L-PLA, counteracted inflammatory cytokine production in PBMCs.
Conclusions/interpretation: Our findings indicate that PLA is physiologically released from skeletal muscle and can contribute to the anti-inflammatory effects of exercise as well as to individual differences in the response to lifestyle interventions in humans. PLA and, potentially, aromatic lactic acids in general may be particularly relevant metabolic regulators because they can be produced both endogenously and by the microbiome. Trial registration: ClinicalTrials.gov NCT03151590

13

A new workflow combining R packages for statistical analysis of metabolites

Ferrario, P. G.

2019-11-28 bioinformatics 10.1101/848812 medRxiv

Top 0.1%

26.1%

In metabolomics, the investigation of an association between many metabolites and one trait (such as age in humans or cultivar in foods) is a central research question. On this topic, we present a complete statistical analysis, combining selected R packages in a new workflow, which we are sharing completely, according to modern standards and research reproducibility requirements. We demonstrate the workflow using a large-scale study with public data, available on repositories. Hence, the workflow can directly be re-used on quite different metabolomics data, when searching for association with one covariate of interest.

14

Cross-platform metabolomics imputation using importance-weighted autoencoders

Smith, A.; Pinto, R.; Zagkos, L.; Tzoulaki, I.; Elliott, P.; Dehghan, A.

2025-03-06 epidemiology 10.1101/2025.03.06.25323475 medRxiv

Top 0.1%

25.9%

Background: Metabolomics data are often generated through different analytical platforms and different methods of identification and quantification, which makes their synthesis and large-scale replication challenging. To address this, we applied generative deep learning to impute metabolites assayed by Metabolon, a commonly used commercial platform, using metabolomic features acquired by an untargeted liquid chromatography-mass spectrometry (LC-MS) platform. Methods: We utilised a subset of 979 samples from the Airwave Health Monitoring Study which were assayed by both Metabolon and National Phenome Centre at Imperial College (NPC) LC-MS assays to develop an ensemble of importance-weighted autoencoders (IWAEs) which can perform cross-platform metabolomics imputation between the two assays. Using the ensemble, we generated a Metabolon equivalent dataset in 2,971 additional Airwave samples that lacked prior Metabolon measurements. We conducted observational associations with two clinical outcomes, body mass index (BMI) and C-reactive protein (CRP). We validated the ensemble and imputed data by investigating the concordance of the observational associations. This was done using both the imputed Metabolon dataset and the measured metabolite levels by Metabolon, and NPC in the Airwave study and Nightingale platform in the UK Biobank. Results: Our imputation ensemble generated samples highly correlated with their real values across all Metabolon metabolites within a held-out test set with a mean sample correlation of 0.61 (IQR 0.55-0.67). The well-imputed subset included 199 (22%) of the metabolites present in the real Metabolon dataset where the imputed values accounted for at least 55% of the original variance (R² ≥ 0.55) and a minimal uncertainty (R² variance ≤ 0.025). The subset included 43 metabolites not previously identified within our LC-MS platform.
When comparing the associations of the real and imputed Metabolon metabolites with BMI and CRP, the standardised beta-coefficients were highly correlated (ρ = 0.93 for BMI and 0.89 for CRP) with minimal mean difference (0.005 (0.04) for BMI, 0.005 (0.04) for CRP). Similar concordance occurred between the imputed Metabolon metabolites and equivalent UK Biobank (mean difference -0.007 (0.05) for BMI, 0.01 (0.04) for CRP) and our LC-MS platform (mean difference -0.013 (0.04) for BMI, -0.019 (0.04) for CRP). Conclusion: This methodological innovation offers a scalable and accurate method for cross-platform imputation which could allow researchers to aggregate individual-level metabolomics data from different epidemiological studies, replicate findings or conduct meta-analyses.

15

Gene expression is a poor predictor of the metabolite abundance in cancer cells

Li, H.; Barbour, J. A.; Zhu, X.; Wong, J. W. H.

2021-12-20 bioinformatics 10.1101/2021.12.19.473333 medRxiv

Top 0.1%

23.7%

Metabolic reprogramming is a hallmark of cancer characterized by global changes in metabolite levels. However, compared with the study of gene expression, profiling of metabolites in cancer samples remains relatively understudied. We obtained metabolomic profiling and gene expression data from 454 human solid cancer cell lines across 24 cancer types from the Cancer Cell Line Encyclopedia (CCLE) database, to evaluate the feasibility of inferring metabolite levels from gene expression data. For each metabolite, we trained multivariable LASSO regression models to identify gene sets that are most predictive of the level of each metabolite profiled. Even when accounting for cell culture conditions or cell lineage in the model, few metabolites could be accurately predicted. In some cases, the inclusion of the upstream and downstream metabolites improved prediction accuracy, suggesting that gene expression is a poor predictor of steady-state metabolite levels. Our analysis uncovered a single robust relationship between the expression of nicotinamide N-methyltransferase (NNMT) and 1-methylnicotinamide (MNA); however, this relationship could only be validated in cancer samples with high purity, as NNMT is not expressed in immune cells. Together, our findings reveal the challenge of inferring metabolite levels from metabolic enzyme levels and suggest that direct metabolomic profiling is necessary to study metabolism in cancer.

16

The CcpNmr Analysis Simulated Metabolomics Database (CASMDB): An Open-Source Collection of Metabolite Annotation Data for 1D 1H NMR-Based Metabolomics

Hayward, M. W.; Mureddu, L. G.; Thompson, G.; Phelan, M.; Brooksbank, E. J.; Vuister, G. W.

2024-05-05 biochemistry 10.1101/2024.05.05.592402 medRxiv

Top 0.1%

23.1%

Databases are invaluable for the identification of individual metabolites in untargeted metabolomics analyses, providing annotated pure metabolite references that allow for comparisons with experimentally collected mixture samples. Despite the value of an extensive reference database, publicly available databases for NMR-based metabolomics are often incomplete with respect to experimental conditions and derived NMR annotation parameters, such as peak positions. Hence, they are not designed for visualising the reference spectra alongside an experimental sample spectrum of interest, thus limiting the usefulness of the database. As a consequence, researchers have resorted to their own user- or application-based database implementations. In this paper we describe the collection, remediation and integration of annotation data from the publicly available HMDB, BMRB and GISSMO NMR metabolomics databases to build the CcpNmr Analysis Simulated Metabolomics Database (CASMDB) that contains 1932 unique metabolite entries. This database, in concert with the AnalysisMetabolomics programme, also allows for accurate simulation of spectra at arbitrary field strengths. Together, these tools underpin the visualising of experimental and simulated metabolite references and their usage in 1D 1H NMR-based metabolomics studies.

17

Comparison of extraction methods for intracellular metabolomics

Andresen, C.; Boch, T.; Gegner, H. M.; Mechtel, N.; Narr, A.; Birgin, E.; Rasbach, E.; Rahbari, N. N.; Trumpp, A.; Poschet, G.; Huebschmann, D.

2021-12-15 biochemistry 10.1101/2021.12.15.470649 medRxiv

Top 0.1%

22.9%

Measurements of metabolic compounds inside cells or tissues are of high informative potential since they represent the endpoint of biological information flow and a snapshot of the integration of many regulatory processes. However, quantifying their abundance requires careful extraction. Here we present a comprehensive study using ten extraction protocols on four human sample types (liver tissue, bone marrow, HL60 and HEK cells) targeting 630 metabolites of different chemical classes. Using quality metrics that include the limit of detection, the variability between replicates, and the sum of concentrations as a global estimate of extraction stability, we show that extraction efficiency and stability are highly variable across protocols and tissues. The profile of extracted metabolites depends on the solvents used, an observation which has implications for measurements of different sample types and metabolic compounds of interest. To identify the optimal extraction method for future metabolomics studies, the benchmark dataset was implemented in an easy-to-use, interactive and flexible online resource (R/shiny app MetaboExtract).
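The three quality metrics named in the abstract can be sketched as follows. This is an assumed formulation rather than the MetaboExtract implementation: the function name, the rule that a metabolite counts only if every replicate exceeds its limit of detection, and the example concentrations are all hypothetical.

```python
from statistics import mean, stdev

def protocol_quality(replicates, lod):
    """Summarise one extraction protocol on one sample type.

    replicates: list (length >= 2) of dicts, metabolite name -> concentration.
    lod: dict, metabolite name -> limit of detection in the same units.
    """
    metabolites = sorted(lod)
    # Keep a metabolite only if every replicate is above its limit of detection.
    above_lod = [m for m in metabolites
                 if all(rep.get(m, 0.0) > lod[m] for rep in replicates)]
    # Coefficient of variation across replicates (lower means more reproducible).
    cvs = {}
    for m in above_lod:
        values = [rep[m] for rep in replicates]
        cvs[m] = stdev(values) / mean(values)
    # Sum of concentrations per replicate as a global stability estimate.
    totals = [sum(rep[m] for m in above_lod) for rep in replicates]
    return {"n_above_lod": len(above_lod), "cv": cvs, "sum_concentration": totals}

# Hypothetical triplicate measurement with two metabolites.
replicates = [
    {"alanine": 10.0, "citrate": 0.1},
    {"alanine": 12.0, "citrate": 0.3},
    {"alanine": 11.0, "citrate": 0.2},
]
lod = {"alanine": 1.0, "citrate": 0.25}
result = protocol_quality(replicates, lod)
print(result)
```

Comparing these summaries across protocols and sample types gives a compact view of which solvent system recovers the most metabolites reproducibly.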

18

A Chemical Reaction Similarity-Based Prediction Algorithm Identifies the Multiple Taxa Required to Catalyze an Entire Metabolic Pathway of Dietary Flavonoids

Gulsan, E. E.; Nowshad, F.; Yamaguchi, P.; Dong, X.; Jayaraman, A.; Lee, K.

2023-05-05 microbiology 10.1101/2023.05.05.539480 medRxiv

Top 0.1%

22.7%

Flavonoids are polyphenolic phytochemicals abundant in plant-based, health-promoting foods. They are only partially absorbed in the small intestine, and gut microbiota plays a significant role in their metabolism. As flavonoids are not natural substrates of gut bacterial enzymes, reactions of flavonoid metabolism have been attributed to the ability of general classes of enzymes to metabolize non-natural substrates. To systematically characterize this promiscuous enzyme activity, we developed a prediction tool that is based on chemical reaction similarity. The tool takes a list of enzymes or organisms to match microbial enzymes with their non-native flavonoid substrates and orphan reactions. We successfully predicted the promiscuous activity of known flavonoid-metabolizing bacterial and plant enzymes. Next, we used this tool to identify the multiple taxa required to catalyze an entire metabolic pathway of dietary flavonoids. Tilianin is a flavonoid-O-glycoside having biological and pharmacological activities, including neuroprotection. Using our prediction tool, we defined a novel bacterial pathway of tilianin metabolism that includes O-deglycosylation to acacetin, demethylation of acacetin to apigenin, and hydrogenation of apigenin to naringenin. We predicted and confirmed using in vitro experiments and LC-MS techniques that Bifidobacterium longum subsp. animalis, Blautia coccoides and Flavonifractor plautii can catalyze this pathway. Prospectively, the prediction-validation methodology developed in this work could be used to systematically characterize gut microbial metabolism of dietary flavonoids and other phytochemicals. The bioactivities of flavonoids and their metabolic products can vary widely. We used an in vitro rat neuronal model to show that tilianin metabolites exhibit a protective effect against H2O2 through reactive oxygen species (ROS) scavenging activity and thus improve cell viability, while the parent compound, tilianin, was ineffective. These results are important to understand the gut microbiota-dependent physiological effects of dietary flavonoids.
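The abstract does not state which similarity measure the tool uses, so the following is only a generic sketch of the idea of matching enzymes to non-native reactions by reaction similarity. It assumes each reaction is described by a set of structural-change features and scores pairs with the Tanimoto (Jaccard) coefficient; the feature strings, enzyme names, and threshold are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidate_enzymes(query_features, native_reactions, threshold=0.5):
    """Rank enzymes whose native reaction resembles the query reaction.

    native_reactions: dict, enzyme name -> feature set of its native reaction.
    Returns (score, enzyme) pairs with score >= threshold, best first.
    """
    scored = [(tanimoto(query_features, feats), name)
              for name, feats in native_reactions.items()]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)

# Hypothetical query: O-deglycosylation of a flavonoid glycoside.
query = {"O-glycosidic bond cleaved", "hexose released", "phenol formed"}
native = {
    "enzyme_A": {"O-glycosidic bond cleaved", "hexose released", "alcohol formed"},
    "enzyme_B": {"C=C reduced", "NADH oxidised"},
    "enzyme_C": {"O-glycosidic bond cleaved", "hexose released", "phenol formed"},
}
ranking = rank_candidate_enzymes(query, native)
print(ranking)
```

Enzymes that pass the threshold become candidates for promiscuous activity on the flavonoid substrate, which is the kind of prediction the abstract reports validating in vitro.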

19

Metabolite discovery through global annotation of untargeted metabolomics data

Chen, L.; Lu, W.; Wang, L.; Xing, X.; Teng, X.; Zeng, X.; Muscarella, A. D.; Shen, Y.; Cowan, A. J.; McReynolds, M. R.; Kennedy, B.; Lato, A. M.; Campagna, S. R.; Singh, M.; Rabinowitz, J. D.

2021-01-06 bioinformatics 10.1101/2021.01.06.425569 medRxiv

Top 0.1%

22.5%

Liquid chromatography-high resolution mass spectrometry (LC-MS)-based metabolomics aims to identify and quantitate all metabolites, but most LC-MS peaks remain unidentified. Here, we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. The approach aims to generate, for all experimentally observed ion peaks, annotations that match the measured masses, retention times, and (when available) MS/MS fragmentation patterns. Peaks are connected based on mass differences reflecting adduction, fragmentation, isotopes, or feasible biochemical transformations. Global optimization generates a single network linking most observed ion peaks, enhances peak assignment accuracy, and produces chemically informative peak-peak relationships, including for peaks lacking MS/MS spectra. Applying this approach to yeast and mouse data, we identified five novel metabolites (thiamine derivatives and N-glucosyl-taurine). Isotope tracer studies indicate active flux through these metabolites. Thus, NetID applies existing metabolomic knowledge and global optimization to annotate untargeted metabolomics data, revealing novel metabolites.
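The peak-connection step described above can be illustrated with a small sketch that links peaks whose m/z difference matches a known mass difference within a ppm tolerance. It covers only candidate-edge generation, not NetID's global optimization, and it ignores retention time and MS/MS evidence. The function name, the three-entry rule table, the tolerance, and the peak list are assumptions for illustration.

```python
# Illustrative monoisotopic mass differences in Da; a real rule table is far larger.
TRANSFORMATIONS = {
    "13C isotope": 1.003355,
    "Na-H adduct exchange": 21.981944,
    "methylation (+CH2)": 14.015650,
}

def connect_peaks(mz_values, transformations=TRANSFORMATIONS, tol_ppm=5.0):
    """Return edges (i, j, label) where mz[j] - mz[i] matches a known difference.

    The tolerance is tol_ppm parts per million of the heavier peak's m/z.
    """
    edges = []
    for i, mz_i in enumerate(mz_values):
        for j, mz_j in enumerate(mz_values):
            if mz_j <= mz_i:
                continue  # only link lighter -> heavier, once per pair
            tolerance = mz_j * tol_ppm * 1e-6
            for label, delta in transformations.items():
                if abs((mz_j - mz_i) - delta) <= tolerance:
                    edges.append((i, j, label))
    return edges

# Hypothetical positive-mode peak list (m/z).
peaks = [181.07066, 182.07402, 203.05261, 195.08631, 250.00000]
edges = connect_peaks(peaks)
for i, j, label in edges:
    print(f"{peaks[i]:.5f} -> {peaks[j]:.5f}: {label}")
```

In a full workflow these candidate edges would then be scored and pruned jointly, so that each peak receives one consistent annotation across the whole network.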

20

Analyzing postprandial metabolomics data using multiway models: A simulation study

Li, L.; Yan, S.; Bakker, B. M.; Hoefsloot, H.; Chawes, B.; Horner, D.; Rasmussen, M. A.; Smilde, A. K.; Acar, E.

2022-12-20 systems biology 10.1101/2022.12.19.521154 medRxiv

Top 0.1%

19.8%

Background: Analysis of time-resolved postprandial metabolomics data can improve the understanding of metabolic mechanisms, potentially revealing biomarkers for early diagnosis of metabolic diseases and advancing precision nutrition and medicine. Postprandial metabolomics measurements at several time points from multiple subjects can be arranged as a subjects by metabolites by time points array. Traditional analysis methods are limited in terms of revealing subject groups, related metabolites, and temporal patterns simultaneously from such three-way data. Results: We introduce an unsupervised multiway analysis approach based on the CANDECOMP/PARAFAC (CP) model for improved analysis of postprandial metabolomics data guided by a simulation study. Because of the lack of ground truth in real data, we generate simulated data using a comprehensive human metabolic model. This allows us to assess the performance of CP models in terms of revealing subject groups and underlying metabolic processes. We study three analysis approaches: analysis of fasting-state data using Principal Component Analysis, T0-corrected data (i.e., data corrected by subtracting fasting-state data) using a CP model and full-dynamic (i.e., full postprandial) data using CP. Through extensive simulations, we demonstrate that CP models capture meaningful and stable patterns from simulated meal challenge data, revealing underlying mechanisms and differences between diseased vs. healthy groups. Conclusions: Our experiments show that it is crucial to analyze both fasting-state and T0-corrected data for understanding metabolic differences among subject groups. Depending on the nature of the subject group structure, the best group separation may be achieved by CP models of T0-corrected or full-dynamic data. This study introduces an improved analysis approach for postprandial metabolomics data while also shedding light on the debate about correcting baseline values in longitudinal data analysis.
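The T0 correction mentioned above can be sketched on a small three-way array. The function name, the nested-list layout, and the choice to drop the (now all-zero) fasting slice from the corrected array are assumptions made for illustration; the numbers are invented.

```python
def t0_correct(data):
    """Subtract each subject's fasting value from every later time point.

    data[i][j][k]: subject i, metabolite j, time point k (k = 0 is fasting).
    Returns (fasting, corrected):
      fasting   - subjects x metabolites matrix of T0 values
      corrected - subjects x metabolites x (time points - 1) array of
                  postprandial values relative to fasting
    """
    fasting = [[series[0] for series in subject] for subject in data]
    corrected = [[[value - series[0] for value in series[1:]]
                  for series in subject]
                 for subject in data]
    return fasting, corrected

# Hypothetical data: 2 subjects, 2 metabolites, 3 time points.
data = [
    [[5.0, 7.0, 6.0], [1.0, 1.5, 1.2]],
    [[4.0, 9.0, 5.0], [2.0, 2.0, 2.5]],
]
fasting, corrected = t0_correct(data)
print("fasting:", fasting)
print("T0-corrected:", corrected)
```

The fasting matrix is what a two-way method such as PCA would analyse, while the corrected three-way array is the input to a CP model, which approximates each entry x_ijk by a sum over R components of a_ir * b_jr * c_kr (subject, metabolite, and time loadings).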