Metabolites
MDPI AG
Preprints posted in the last 90 days, ranked by how well they match Metabolites's content profile, based on 50 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.
Herrera, L.; Meneses, M. J.; Ribeiro, R. T.; Gardete-Correia, L.; Raposo, J. F.; Boavida, J. M.; Penha-Goncalves, C.; Macedo, M. P.
Background & Aims: Metabolic disorders such as dyslipidemia, metabolic dysfunction-associated steatotic liver disease (MASLD), and diabetes are promoted by chronic pro-inflammatory and pro-oxidative states. Paraoxonase 1 (PON1), a liver-derived HDL-associated enzyme, plays an important antioxidant role by hydrolyzing oxidized lipids and protecting against oxidative stress-induced damage. Genetic variation in PON1, particularly in promoter and coding regions, modulates enzyme expression and activity, thereby influencing susceptibility to metabolic and cardiovascular diseases. This study investigated the genetic determinants of serum paraoxonase (PONase) activity and their relationship with dysmetabolic phenotypes. Methods: A genome-wide association study was conducted in 922 Portuguese individuals from the PREVADIAB2 cohort. Genetic variants and haplotypes related to PONase activity were analyzed, and associations with dysglycemia and liver fibrosis were evaluated in individuals aged over 55 years. Results: We identified two key PON1 variants as determinants of PONase activity: rs2057681 (in strong linkage disequilibrium with the non-synonymous Q192R variant) and rs854572 (located in the promoter region). Analysis of rs854572-rs2057681 haplotypes revealed that specific combinations differentially modulate PONase activity and confer risk or protection for dysglycemia and liver fibrosis, depending on the rs2057681 genotype context. Notably, although PONase activity was strongly associated with PON1 variants, it did not directly correlate with dysmetabolic phenotypes, suggesting that genetic context and haplotype structure, rather than enzyme activity alone, shape disease susceptibility. Conclusions: These findings highlight the complex genetic architecture of PON1 and its role in metabolic disease risk, supporting the use of PON1 genetic information to uncover predisposition to dysmetabolic conditions.
Our results provide insights into the interplay between PON1 genetics, enzyme function, and dysmetabolism, with implications for risk stratification in metabolic liver disease. Lay Summary: PON1 encodes a liver-derived enzyme involved in protection against oxidative stress, a key contributor to metabolic liver disease and diabetes. In this study, we found that specific combinations of PON1 genetic variants are associated with abnormalities in blood glucose regulation and with markers of liver fibrosis. These associations depended on genetic configuration rather than enzyme activity alone, suggesting that PON1 genetic information may help identify individuals at higher risk of metabolic liver disease.
Lin, H.; Zhang, L.; Lotfi, A.; Jarmusch, A.; Lee, I.; Kim, A.; Morton, J.; Aksenov, A. A.
This protocol describes a computational approach for constructing correlation-based molecular networks from untargeted metabolomics data using MetVAE, a variational autoencoder-based framework. Complementing spectral similarity networks, it captures functional relationships reflected in cross-sample correlations. The workflow imports metabolomics features and sample metadata, adjusts for compositionality, missingness, confounding, and high-dimensionality, estimates sparse metabolite correlations, and exports GraphML files for network visualization. In a hepatocellular carcinoma mouse model, it links lipid classes in high-fat-diet animals, suggesting an endogenous "auto-brewery" route to lipotoxic metabolites.
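As a rough illustration of the correlation-network idea described in this abstract, the sketch below applies a centered log-ratio (CLR) transform, keeps pairwise Pearson correlations above a cutoff, and writes a GraphML string. It is not MetVAE: the variational autoencoder, missingness handling, and confounder adjustment are all omitted, and the function names, pseudocount, and threshold are assumptions made for illustration only.

```python
import math
import xml.etree.ElementTree as ET


def clr(sample, pseudocount=1.0):
    """Centered log-ratio transform of one sample's feature intensities."""
    logs = [math.log(x + pseudocount) for x in sample]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(samples, names, threshold=0.8):
    """samples: rows are samples, columns are features named by `names`."""
    columns = list(zip(*[clr(s) for s in samples]))
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:  # hypothetical hard cutoff, not a sparse estimator
                edges.append((names[i], names[j], r))
    return edges


def to_graphml(names, edges):
    """Serialize nodes and correlation-weighted edges as a GraphML string."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    ET.SubElement(root, "key", {"id": "r", "for": "edge",
                                "attr.name": "correlation", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", edgedefault="undirected")
    for name in names:
        ET.SubElement(graph, "node", id=name)
    for source, target, r in edges:
        edge = ET.SubElement(graph, "edge", source=source, target=target)
        ET.SubElement(edge, "data", key="r").text = f"{r:.4f}"
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be saved with a .graphml extension and opened in a network viewer such as Cytoscape.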
Taylor, A. L.; Snyder, N. W.; Bartman, C. R.
Coenzyme A is an essential cofactor synthesized from pantothenate, cysteine, and ATP, and is involved in numerous processes of cellular metabolism through its ability to carry activated acyl groups. Coenzyme A participates in catabolism of carbohydrate, fat and amino acids; biosynthesis of fatty acids, cholesterol and heme; and protein modification including acetylation and 4-phosphopantetheinylation. Despite CoA's critical functions, the regulation of CoA levels and the rate of CoA synthesis in different cell types and disease states are not well understood. One reason for this gap is that many acyl-CoA species are analytically challenging to measure due to factors including instability, poor ionization, and the wide range of biochemical properties conferred by different acyl chain lengths. In addition, most current methods do not support analysis of CoA isotopic labeling, which is required to quantify CoA synthesis rate or to measure absolute concentration using isotope-labeled internal standards. Here, we describe a method to quantify the concentration and isotopic labeling of total CoA, defined as the sum of CoASH plus all acyl-CoA species. Acyl-CoA species are hydrolyzed using sodium hydroxide to remove acyl chains, then CoA is derivatized on the thiol with N-ethylmaleimide (NEM). Following protein precipitation and solid phase extraction, samples are analyzed by liquid chromatography-mass spectrometry. This method is linear over a wide range that captures mouse tissue CoA levels, with accuracy within 15% error and precision below 15% relative standard deviation for both pure standards and tissue samples. We applied this method to measure total CoA concentration in five tissues from male and female mice, and total CoA synthesis rate in mouse liver via infusion of 13C-15N-pantothenate.
Overall, this method offers a tractable approach to measure total CoA concentration and isotopic labeling to enable study of total CoA synthesis rates and concentrations in health and disease.
O'Loughlin, J.; Moses, T.
Metabolomics offers a sophisticated analytical framework for characterising the molecular phenotype of biological organisms and complex living systems at a high resolution. As the functional endpoint of the omics cascade, the metabolome serves as a close reflection of cellular activity. It integrates genetic, transcriptomic and proteomic variations with external environmental influences. However, the inherent complexity of metabolomic datasets, characterised by high-dimensional chemical diversity, wide dynamic ranges, and significant matrix effects, necessitates a rigorous suite of chemometric and bioinformatic workflows. For researchers uninitiated in computational biology, the multi-stage requirement for raw data pre-processing, signal deconvolution, and multivariate statistical modelling (such as PCA or PLS-DA) presents a substantial barrier to entry. Navigating these convoluted data architectures remains a primary challenge in deriving biological meaning from the global metabolic profile. Here, we present a workflow to use Python Dash Apps to create a user-friendly interface for simplifying data processing and statistical calculations. Users can select their desired samples to initiate calculations for various statistical tests, generating interactive and publication-quality figures to explore their results. These apps were deployed on an Apache server via cPanel, allowing individuals to share their findings with collaborators and for research facilities to share metabolomics results with their users.
Hauguel, P.; Anctil, N.; Noel, L.-P.
Background. Plasma and serum metabolomic studies of myalgic encephalomyelitis / chronic fatigue syndrome (ME/CFS) have repeatedly implicated hypometabolic, lipid, mitochondrial, redox and tryptophan-kynurenine pathways, but prior cohorts have been modest in size and have used heterogeneous case definitions. Whether similar pathway-level signals are detectable at scale in dried blood spots (DBS), across questionnaire-derived fatigue constructs and across orthogonal LC gradients in the same individuals remains unresolved. Methods. We profiled DBS extracts from 1,784 community-cohort adults by reverse-phase LC-MS using paired 5 min and 15 min gradients. Six questionnaire-derived endpoints captured a pragmatic self-reported PEM-like phenotype, a DSQ-derived PEM-like construct, high or review clinical status, temporal fatigue state, comorbid fatigue and self-reported chronic fatigue. The locked primary endpoint for Phase 1 was pragmatic_fatigue_pem with 226 cases and 914 controls after excluding major metabolic comorbidity. We tested a biology-first panel comprising 22 literature-curated metabolites represented by four participant-level descriptors each, and evaluated three discovery extensions: a targeted m/z search of additional literature candidates, a hypothesis-free univariate screen across 4,553 5 min and 5,625 15 min consensus features, and pairwise z-difference ratios. Endpoint-specific Ridge classifiers were evaluated by five-fold out-of-fold AUC with bootstrap stability filtering. Cross-gradient agreement was assessed by per-metabolite AUC concordance between paired 5 min and 15 min profiles. Severity was modelled as an ordinal grade derived from the number of fatigue criteria met and chronic-fatigue-form status. Results. The biology-first DBS panel achieved out-of-fold AUC 0.81 for the pragmatic self-reported PEM-like endpoint (226 cases / 914 controls). 
The DSQ-derived PEM-like construct reached AUC 0.60 (57 cases / 201 controls) on the un-filtered set and AUC 0.778 (SD 0.013, twenty seeds) in a post-hoc signature-decomposition follow-up restricted to participants without a self-declared major-metabolic-history tag (29 cases / 230 controls); both are treated as construct-validity anchors rather than as provoked or clinically adjudicated PEM. An optimised operationalisation of the same construct (panel-self normalisation, restriction to non-comorbid participants and demographic covariates) reached AUC 0.71 (95 % CI 0.55 to 0.76), and an exploratory age-stratified signature decomposition suggested age-dependent pathway composition that requires confirmation given small per-stratum case counts. Stable contributors mapped to carnitine-shuttle, TCA-cycle, redox-thiol and tryptophan-kynurenine pathways. Cross-gradient analysis of 22 matched metabolites yielded Pearson r = 0.62 for signed univariate effects (p = 0.002; 68 % directional agreement). The metabolomic score increased with severity grade (Spearman rho = 0.45, p = 4 x 10^-91; median scores 0.24, 0.51 and 0.75 across grades 0, 1 and 2). Sensitivity analyses on the covariate-complete subset (n = 565; 138 cases / 427 controls) showed that the DBS signal was robust to adjustment for age, sex, BMI and medication burden (DBS-only AUC 0.76, DBS plus covariates 0.78, covariates only 0.64), and produced a metabolomic-specific lift of approximately 0.13 AUC over the strongest anti-leak declarative cross-form questionnaire baseline (AUC 0.63). DBS-only AUC was stable across sex, age and BMI subgroups, and a 1:4 nearest-neighbour matched analysis on age, sex and BMI yielded AUC 0.72 (95 % CI 0.67 to 0.77). The observed pattern supported pathway-level convergence with prior ME/CFS metabolomics literature, including carnitine shuttle, fatty-acid beta-oxidation, TCA cycle, redox-thiol, urea cycle, glycerophospholipid and tryptophan-kynurenine axes. 
In contrast, the hypothesis-free 15 min screen produced high-AUC features that mapped predominantly to environmental or technical signals, including pesticide, industrial-amine and mobile-phase artifact annotations; only one of eight top leads, a truncated oxidised phospholipid, was biologically plausible, and none had tandem-MS support. Conclusions. In this large community cohort, a literature-curated DBS metabolomic panel captured pathway-level biology associated with a questionnaire-derived PEM-like fatigue phenotype, showed directional concordance across LC gradients, scaled with symptom severity and remained robust to key demographic, anthropometric and anti-leak questionnaire baselines. The findings converge with several metabolic axes previously reported in ME/CFS plasma and serum studies, including carnitine-shuttle, TCA-cycle, redox-thiol, urea-cycle, glycerophospholipid and tryptophan-kynurenine pathways. They should not be interpreted as clinical validation of a diagnostic test, screening tool or objective provoked-PEM biomarker. Rather, they support at-home-compatible DBS metabolomics as a biologically grounded platform for future clinically adjudicated validation, decision-support development and longitudinal monitoring in fatigue and PEM-like syndromes. Because DBS contains cellular and plasma-derived components, matrix effects must be considered when comparing individual metabolites with venous plasma or serum studies, and hypothesis-free screening at this scale can preferentially surface exposome or technical variance unless molecular identification is enforced before biological interpretation.
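To make the "out-of-fold AUC" figures in this abstract concrete, here is a minimal sketch of how such a number can be computed from held-out scores. It assumes the per-fold held-out predictions are pooled before scoring, which the abstract does not state; the function names are hypothetical and the Ridge classifier itself is not reproduced.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def pooled_out_of_fold_auc(folds):
    """folds: list of (held_out_scores, held_out_labels), one pair per fold.

    Assumption: held-out scores from all folds are concatenated and scored once.
    """
    scores, labels = [], []
    for fold_scores, fold_labels in folds:
        scores.extend(fold_scores)
        labels.extend(fold_labels)
    return roc_auc(scores, labels)
```

Because every score comes from a model that never saw that participant during fitting, the pooled value is less optimistic than an AUC computed on the training data.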
Berna, A. Z.; Panganiban, J.; Liu, Y.; Logan, J.; Russo, P.; Aryal, A.; Hafertepe, K.; Abu-Alreesh, S.; DeBosch, B.; Stoll, J.; John, A. R. O.
Background & Aims: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) is the leading cause of chronic liver disease in children. However, accurate, noninvasive diagnostic tools remain limited. Current screening methods are invasive or lack sensitivity. Breath-based volatile organic compound (VOC) analysis offers a simple approach with potential for point-of-care screening. This study aimed to identify and validate breath VOC signatures of pediatric MASLD. Approach & Results: We conducted a prospective, IRB-approved cohort study at the Children's Hospital of Philadelphia (CHOP). Children aged between 7 and 20 years with MASLD (n=22), as defined by hepatic steatosis either by liver biopsy or imaging and 1 cardiometabolic risk factor, and a control group without MASLD (n=20) were enrolled. Breath samples were collected using a standardized protocol and analyzed by untargeted comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS). Machine learning and unsupervised clustering were applied to identify discriminatory VOCs and assess heterogeneity. Untargeted GC×GC-MS analysis identified a distinct breath VOC signature in children with MASLD compared with non-MASLD controls. A Random Forest model achieved a sensitivity of 73% and specificity of 65%, with an AUC of 0.84. The VOC 2,4-dimethyl-1-heptene demonstrated strong diagnostic performance in the discovery cohort with a sensitivity of 85%, specificity of 77% and an AUC of 0.81. Unsupervised clustering revealed four MASLD subgroups with distinct volatile phenotypes associated with differences in liver enzymes and metabolic parameters. External validation in a second pediatric cohort confirmed reproducible reductions in o/p-xylene in subjects with MASLD. Conclusions: Pediatric MASLD is associated with a reproducible breath VOC signature identified by untargeted GC×GC-MS.
These findings support breath analysis as a scalable, noninvasive screening and stratification tool for pediatric MASLD and warrant validation in larger, longitudinal studies.
Anctil, N.; Hauguel, P.; Noel, L.-P.
Background: Breast cancer (BC) remains the most commonly diagnosed malignancy and the leading cancer-related cause of mortality in women worldwide. Although blood-based untargeted metabolomics has emerged as a promising modality for detecting early-stage BC, the clinical translation of this approach has been bottlenecked by two unresolved issues: (i) the field has almost exclusively relied on serum or plasma, which require venipuncture and cold-chain logistics, and (ii) machine-learning models reported on such data are frequently validated with protocols that are blind to analytical batch structure, producing optimistically biased performance estimates. Methods: We present a breast cancer detection study based on dried blood spots (DBS), an analytical matrix that enables self-collection and ambient-temperature shipping. A cohort of 2,734 participants (114 biopsy-confirmed BC cases; 2,620 non-cancer controls) was profiled by untargeted LC-MS/MS on a Thermo Scientific Orbitrap IQ-X coupled to a Vanquish UHPLC. A 39-metabolite panel meeting MSI Level 1 identification criteria [1] was pre-specified from the published breast-cancer metabolomics literature, frozen prior to LC-MS acquisition, and applied to the present cohort without any feature selection on the data. Six standard supervised-learning architectures (LASSO, Elastic Net, Linear SVM, PLS-DA, OPLS-DA, XGBoost) were evaluated on this pre-specified panel; OPLS-DA, whose pyopls implementation does not integrate cleanly into the repeated multi-seed batch-aware protocol, is reported only in the sex-matched subgroup analysis where a single-seed 5-fold stratified protocol permits a directly comparable fit. Per-batch control-median normalization is applied upstream, following the protocol of the companion same-lab study [2], which removes batch-specific intensity shifts at the data-preparation stage; kNN imputation, log transform, and robust scaling are then fit within each training fold.
The evaluation battery comprises batch-aware StratifiedGroupKFold CV reported at single-seed (seed=42) with inter-seed SD quantified across 10 independent seeds, batch-aware nested CV, a 100-seed held-out 20%-batch validation with disjoint-batch isotonic probability calibration (30% calibration partition), PPV/NPV reporting at multiple operating points and three deployment prevalences, subgroup analyses by TNM stage and tumor grade, pathway-ablation sensitivity analysis, and a 1,000-iteration permutation test. Results: Under batch-aware evaluation (StratifiedGroupKFold, single-seed=42), AUC ranged from 0.914 to 0.949 across classifiers, with LASSO achieving 0.928 and XGBoost 0.949; inter-seed SD across 10 seeds was 0.002-0.006. At 95% specificity, LASSO reached 75.4% sensitivity and XGBoost 81.6%. Held-out batch validation (100 seeds) yielded mean AUC 0.912 for Elastic Net and 0.935 for XGBoost, confirming robust generalization. All 39 panel features showed high coefficient stability, and permutation testing on representative classifiers (LASSO, Linear SVM, PLS-DA) yielded p ≤ 0.001. Subgroup analyses showed weaker detection of stage IIA tumors (AUC 0.87, n=40) compared with stage IIB/IIIA (AUC 0.95), consistent with stronger metabolic signatures in more advanced disease. Bootstrap coefficient consistency of the Elastic Net classifier confirmed that all 39 panel features received a non-zero multivariate weight in ≥80% of 100 stratified bootstraps. Permutation testing on the three representative classifiers subjected to this analysis (LASSO, Linear SVM, PLS-DA) confirmed significance at p ≤ 0.001 in all three cases. Conclusions: On this cohort of diagnosed, pre-treatment breast-cancer cases, DBS LC-MS metabolomic profiling delivers classification performance (AUC 0.928 for LASSO and 0.949 for XGBoost under batch-aware GroupKFold CV at single-seed=42; held-out AUC 0.912-0.935) that is robust across classifier families and biological pathways.
The DBS matrix is non-radiating, self-collectable by finger-prick, and mailable at ambient temperature. The approach complements the established venous-blood workflow while addressing a clear infrastructural gap identified over nearly a decade of preliminary work [3, 4]. Performance is weaker on stage IIA than on more advanced disease, and prospective validation in an independent asymptomatic screening cohort is required before clinical positioning as a decentralized triage modality.
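The abstract mentions PPV/NPV reporting at several deployment prevalences without giving the values. As a hedged illustration of why prevalence matters, the sketch below derives PPV and NPV from sensitivity, specificity, and prevalence via Bayes' rule. The function name is hypothetical, and the prevalences in the usage example are placeholders, not the three values used in the study.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive value for a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Operating point reported for LASSO: 75.4% sensitivity at 95% specificity.
    # The prevalences below are illustrative assumptions only.
    for prevalence in (0.005, 0.01, 114 / 2734):
        ppv, npv = ppv_npv(0.754, 0.95, prevalence)
        print(f"prevalence={prevalence:.4f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

At a fixed operating point, PPV falls sharply as prevalence drops, which is why performance in an enriched case-control cohort does not translate directly to an asymptomatic screening population.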
Karaman, I.; Payne, T.; Vizcaino, J. A.
Public data reuse is a key driver of progress in the omics sciences, increasingly including metabolomics. In this study, we present a validated analysis of confirmed reuse of datasets from the MetaboLights data repository, one of the leading resources in the field. Candidate publications were collected via dataset identifiers (MTBLS#) using a Python-based retrieval pipeline across major publisher databases. These were then manually validated to distinguish active reuse from citation-only mentions. Overall, 272 unique publications were confirmed to have reused at least one MetaboLights dataset. Reuse is dominated by Method/Tool Development, with smaller contributions from Secondary Biological Analysis and Data Integration/Meta-analysis. LC-MS datasets account for the majority of reuse, whereas NMR and GC-MS contribute at a lower level. Data reuse has increased over time, with a noticeable acceleration in the most recent years. At the dataset level, reuse follows a long-tail distribution, where a small subset of datasets accounts for repeated reuse, mainly as community benchmarks. These results provide a conservative estimate of public metabolomics data reuse and show that public datasets are predominantly used for methodological and computational applications. They also indicate that reuse is under-detected when dataset identifiers are not consistently reported, highlighting the need for standardised dataset citation to improve traceability and recognition of reuse. Statement of significance of the study: The impact of public metabolomics repositories has been difficult to assess due to the lack of reliable evidence distinguishing true data reuse from simple literature citations. This study addresses that gap by providing a conservative, manually validated baseline for confirmed reuse of datasets from the MetaboLights data repository.
The analysis clarifies how MetaboLights datasets are used in practice, showing that reuse is concentrated in a limited number of datasets and is dominated by computational and methodological applications.
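The identifier-based retrieval step can be pictured with a small sketch that scans free text for MetaboLights accessions of the form MTBLS followed by digits. This is an assumption about one plausible building block, not the authors' pipeline; the function name is hypothetical, and the manual validation of active reuse is not something a pattern match can replace.

```python
import re

# MetaboLights study accessions: the prefix "MTBLS" followed by digits.
MTBLS_PATTERN = re.compile(r"\bMTBLS(\d+)\b")


def find_dataset_ids(text):
    """Return the unique MTBLS accessions mentioned in `text`, sorted numerically."""
    numbers = {int(n) for n in MTBLS_PATTERN.findall(text)}
    return [f"MTBLS{n}" for n in sorted(numbers)]
```

For example, `find_dataset_ids("We reanalysed MTBLS214 and MTBLS1; see MTBLS1.")` yields each accession once. A hit only marks a candidate publication, since a mention may be a citation rather than reuse.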
Cross, E.; Westcott, F.; Smith, K.; Nagarajan, S. R.; Sanna, F.; Dennis, K. M.; Hodson, L.
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is challenging to study in vivo in humans, and in vitro models are limited. Although primary human hepatocytes (PHHs) are considered the gold standard, immortalized hepatic cell lines are used because of their scalability. This study compared the metabolic responses of PHHs with our Huh7-based model cultured in physiologically relevant fatty acid (FA) mixtures. Methods: PHH and Huh7 cells were treated with 2% human serum, sugars and FAs enriched in either unsaturated (OPLA) or saturated (POLA) FAs for 4 or 7 days, respectively. Stable isotope tracers investigated basal metabolic changes in response to treatment. Cell viability, media biochemistry, intracellular metabolism, lipid droplet morphology and gene expression were quantified. Results: Huh7 cells had greater viability than PHHs, while NEFA uptake and triglyceride secretion were similar. OPLA and POLA increased large lipid droplets in Huh7 cells, whereas only OPLA produced comparable effects in PHHs. Despite higher baseline TG in PHHs, both models showed similar lipid composition, de novo lipogenic responses, and glycogen levels. Compared to Huh7 cells, PHHs exhibited higher 3-hydroxybutyrate, lower lactate, reduced glucose uptake, and donor-dependent transcriptomic variability. Conclusions: Huh7 cells are metabolically adaptable and, when cultured in physiologically relevant media, produce metabolic readouts similar to those of PHHs.
Tsiara, I.; Vouzaxaki, E.; Ekström, J.; Rameika, N.; Yang, F.; Jain, A.; Iglesias Alonso, A.; Sjöblom, T.; Globisch, D.
Cancer is among the leading causes of death worldwide. The discovery of biomarkers is of utmost importance for diagnosis and disease monitoring. Herein, we performed a comprehensive metabolomics biomarker discovery effort in plasma from 615 lung, ovarian and colorectal cancer patients at diagnosis and 95 non-cancerous control subjects. This pan-cancer investigation identified specific panels of metabolites with high discriminating power across the entire sample cohort, as demonstrated by combined ROC AUC values of up to 0.95. The identified metabolites are mainly associated with lipid and amino acid metabolism as well as xenobiotic transformation. These highly predictive metabolite panels provide new metabolic insights into these cancers and demonstrate the potential of metabolomics for improved diagnosis and monitoring of disease progression.
Arp, N. L.; Deng, F.; Lika, J.; Seim, G. L.; Falco Cobra, P.; Mellado Fritz, C.; John, S. V.; Rathinaraj, S.; Shields, B. E.; Amador-Noguez, D.; Henzler-Wildman, K.; Fan, J.
Identifying metabolites and metabolic reactions specific to a cellular state, such as the inflammatory state in immune cells, is of great interest, as it can provide important biomarkers and point to compounds and reactions with specific biological functions. However, many cell state-specific metabolites remain in the unannotated part of the metabolome. Here we identified a series of sulfur-containing metabolites that are actively produced in macrophages upon classical activation, but not in the resting state or the alternative activation state. Isotopic tracing, in vitro assays and genetic perturbations further revealed that they are formed from reactions between free cysteine and several important intermediates in glycolysis and the TCA cycle. Upon classical activation, macrophages specifically upregulate the import of cystine via Slc7a11, supporting the production of these adducts. Their production responds dynamically to changes in central metabolism and environmental nutrient levels, and is regulated by nitric oxide. Finally, we confirmed that these newly identified compounds are also present in human samples, and most of them are significantly elevated in inflammatory granuloma annulare lesions. This work elucidates a previously uncharted part of the metabolic network that is associated with inflammation and metabolic stress, which has important implications and lays a foundation for future discoveries.
Rajkumar, P.; Gadiya, Y.; Deleray, V.; Roux, A.; West, K. A.; Allen, A.; Dorrestein, P.; Domingo-Fernandez, D.; Misra, B. B.
Untargeted liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based metabolomics is an important technology for unbiased discovery of small molecules in biomedical (e.g., drug discovery to diagnostics), animal, plant, environmental, and microbial research. Over the past decade, ion mobility has added a further dimension to the triplet of MS1, MS2, and retention time, helping resolve co-eluting or isomeric features in an LC-MS/MS run and thereby aiding compound identification. Here, we evaluated the current trapped ion mobility spectrometry (TIMS)-amenable feature-finding tools (MZmine 4.9, MS-DIAL 5.5, and MetaboScape 2025 14.0.3) for pre-processing of metabolomics data generated using TIMS, a popular ion mobility mass spectrometry (IM-MS) technique. We leveraged ten public and three benchmark TIMS datasets to evaluate the strengths and weaknesses of these tools. Our results show that MZmine consistently identified the highest number of features and of confidently annotated features; however, this performance was accompanied by an increased number of false positives, due to peak splitting, as well as reduced accuracy in collision cross section (CCS) measurements. In contrast, MetaboScape achieved the highest fraction of high-quality MS2 spectra, reflecting a more conservative feature detection strategy. MS-DIAL demonstrated balanced performance, identifying features that other tools missed. Finally, we publicly release the ground-truth datasets and code to support future developments in improving IMS data analysis.
Kurata, M.; Yamamoto, H.; Tsugawa, H.
Principal component analysis (PCA) is widely used in mass spectrometry-based metabolomics for exploratory data mining. Statistical testing of loading values can extract metabolite features associated with score patterns, but this approach requires principal components (PCs) to remain orthogonal while loadings are defined as correlation coefficients between PC scores and variables. Adjustment for Confounding PCA (AC-PCA) was previously developed to explore biologically meaningful components from data matrices affected by biological and technical confounders. However, AC-PCA does not simultaneously ensure PC orthogonality and a correlation-coefficient definition of loadings, limiting the statistical interpretation of its loadings. Here, we reformulated AC-PCA as Orthogonal Adjustment for Confounding effects in PCA (OAC-PCA). In OAC-PCA, PCs remain orthogonal, and loadings retain this correlation-coefficient interpretation. These properties enable statistical testing of metabolite associations while accounting for confounding effects.
Luning, Z.; Shuang, W.; Jixing, P.; Xiaofei, H.; Wenxue, W.; Dehai, L.
Spectral similarity is widely used as a proxy for structural similarity in tandem mass spectrometry (MS/MS) analyses, including library matching and molecular networking. However, the relationship between spectral similarity scores and true structural similarity remains imperfect, limiting compound identification in metabolomics studies. Here, we present BertMS, a spectral similarity framework based on bidirectional encoder representations from transformers (BERT), which learns contextualized representations of fragment ions from large-scale MS/MS data. Using datasets from MoNA and GNPS comprising over 100,000 unique molecules, we systematically evaluate BertMS against existing methods, including cosine similarity and Spec2Vec. BertMS shows improved performance across multiple evaluation metrics, with average gains of approximately 15-25% depending on the task. Notably, improvements are most evident in molecular similarity assessment. We further demonstrate the applicability of BertMS in molecular networking and dereplication of microbial metabolites, where it enables more consistent identification of structurally related compounds. Together, these results demonstrate that transformer-based representations improve spectral similarity estimation and enable more reliable metabolite annotation in complex mixtures.
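For readers unfamiliar with the cosine baseline that BertMS is compared against, the following is a minimal sketch of binned cosine similarity between two MS/MS spectra. The bin width, the lack of intensity weighting, and the function names are assumptions for illustration; production implementations typically add tolerance-based peak matching and do not correspond to this simplified form.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra, in the range [0, 1]."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Two spectra sharing no fragment bins score 0 however closely related the molecules are, which is one reason spectral similarity is only an imperfect proxy for structural similarity.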
von Itter, M.-N.; Grune, E.; Nonnenmacher, T.; Rach, S.; Flis, M.; Haueise, T.; Weiss, J.; Brenner, H.; Keil, T.; Roden, M.; Schulze, M. B.; Schulz-Menger, J. E.; Völzke, H.; Stefan, N.; Schlett, C. L.; Kauczor, H.-U.; Machann, J.; Bamberg, F.; Nattenmüller, J.; Norajitra, T.; Rospleszcz, S.
Background and Aims: Steatotic liver disease (SLD) has high clinical and public health relevance. Robust population estimates of SLD and its subcategories are challenging due to the limitations of ultrasound measurements or non-invasive scores, particularly for low-grade steatosis. We aimed to quantify SLD prevalence using magnetic resonance imaging (MRI) in the population-based German National Cohort (NAKO). Methods: Hepatic multi-echo Dixon MRI was performed at 5 dedicated study sites with identical setup across Germany. Liver fat (proton density fat fraction, PDFF), R2* as proxy for liver iron, and liver volume were assessed. The resulting data of N = 29,842 individuals (age range 20-72 years) were weighted by survey weights for regional representativeness, resulting in a sample of 50% women and a mean age of 45.6 years. SLD was defined as PDFF ≥ 5.75%, and sex-specific prevalence according to age, BMI, socioeconomic status and geographic region was calculated. Results: Overall, SLD prevalence was 21.3% in women and 35.7% in men, and the majority were metabolic dysfunction-associated (MASLD, 89.3% of all SLD cases). Prevalence increased with age in a sex-specific pattern, suggesting potential menopausal effects in women. There was a relevant prevalence of SLD in individuals with normal weight (5.3% in women, 13.2% in men) and the age group <25 years (7.5% in women, 11.9% in men). Differences in prevalence between low and high socioeconomic status were more pronounced in women (37% vs 15.8%) compared to men (45.5% vs 30.3%). Conclusions: Data underscore the high public health relevance of SLD and its subcategory MASLD. The considerable prevalence in groups historically considered low-risk, such as younger or lean individuals, emphasizes the need for raising awareness early.
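The survey-weighted prevalence described in the Methods can be sketched as a weighted share of participants at or above the PDFF cutoff. This is a simplified illustration only: the function name is hypothetical, the 5.75% cutoff is taken from the abstract, and the construction of the NAKO survey weights themselves is not shown.

```python
def weighted_prevalence(pdff_values, weights, cutoff=5.75):
    """Share of total survey weight carried by participants with PDFF >= cutoff."""
    if len(pdff_values) != len(weights):
        raise ValueError("pdff_values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    positive = sum(w for v, w in zip(pdff_values, weights) if v >= cutoff)
    return positive / total
```

With equal weights this reduces to the plain proportion; unequal weights shift the estimate toward under-sampled groups, so the weighted and crude prevalences can differ.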
Shilo, S.; Talmor-Barkan, Y.; Gorodetski, M.; Azouri, D.; Godneva, A.; Segal, E.; Rossman, H.
The transition from metabolic health to type 2 diabetes unfolds through progressive insulin resistance (IR), yet the gold-standard hyperinsulinemic-euglycemic clamp is inapplicable at population scale and fasting insulin is not uniformly available. Several surrogate measures have been described in the literature, but whether these surrogates identify the same individuals, and whether continuous glucose monitoring (CGM) or NMR metabolomics carry information beyond conventional markers, remains unresolved. Here, we analyzed IR surrogates in 10,114 non-diabetic adults (35-75 y) from the Human Phenotype Project (HPP), integrated with 14-day CGM, dual x-ray absorptiometry (DEXA) body composition, liver and carotid ultrasound, sleep monitoring, and NMR metabolomics and established sex-specific, age-resolved reference ranges. IR surrogates were moderately inter-correlated but captured distinct metabolic facets. We next focused on DEXA-derived visceral adipose tissue (VAT), one of the strongest correlates of clamp-measured insulin resistance. Our analysis showed that VAT can be reliably predicted from anthropometric measurements alone (R² = 0.659). However, it is only modestly predicted by CGM features alone (R² = 0.078). Among CGM-derived features, markers of glycemic variability were stronger predictors of VAT than conventional mean-glucose metrics. Residual-based analyses identified individuals whose visceral adiposity was substantially higher than expected given their BMI or HbA1c levels. Notably, 1.2% of adults in the HPP cohort exhibited elevated visceral adiposity despite having both a normal BMI (< 25 kg/m²) and normoglycemic HbA1c (< 5.7%). These discordant subpopulations harbored adverse profiles across lipid, hepatic, vascular, sleep, and metabolomic domains. NMR lipoprotein subfractions (VLDL, HDL) discriminated discordant phenotypes. 
A CGM variability-only model separated discordant individuals at AUC = 0.63, with negligible gain from adding mean glucose. Findings were validated in an independent cohort with available fasting insulin data. Together, these results establish normative IR surrogate reference ranges, quantify the fraction of metabolically at-risk individuals missed by conventional BMI and HbA1c screening, and highlight CGM variability metrics and NMR lipoprotein profiling as complementary tools for early metabolic risk stratification.
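The residual-based idea in this abstract (flagging people whose VAT is higher than expected given BMI) could be sketched roughly as follows. The abstract does not state the model or cutoff used, so the simple linear fit, the z-score cutoff of 1.5, the function names, and the toy values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def discordant_indices(bmi, vat, z_cutoff=1.5):
    """Indices whose VAT exceeds the BMI-based expectation by > z_cutoff SDs.

    The cutoff is a hypothetical choice, not a value from the abstract.
    """
    intercept, slope = fit_line(bmi, vat)
    residuals = [v - (intercept + slope * b) for b, v in zip(bmi, vat)]
    sd = pstdev(residuals)
    if sd == 0:
        return []
    return [i for i, r in enumerate(residuals) if r / sd > z_cutoff]


# Hypothetical toy data: BMI in kg/m², VAT in arbitrary units
toy_bmi = [20, 22, 24, 26, 28, 30]
toy_vat = [0.4, 0.5, 1.6, 0.7, 0.8, 0.9]
flagged = discordant_indices(toy_bmi, toy_vat)
```

Here the third person (index 2) has a normal BMI of 24 but a VAT far above the fitted line, so it is the only index flagged.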
Zemach, A.; Plaza, M. R.; Lee, B. S.; Little Dod, L.; Santiago-Rodriguez, E.; Simmons, D.; Palomares, M.; Talavera-Adame, D.; Newman, N.
Background: Plants produce diverse metabolites with potential benefits for human health. However, the metabolomes of plant callus cultures (cell cultures analogous to stem cells) remain poorly characterized in terms of their functional relevance. Methods: We profiled the metabolomes of six plant calli: Acacia concinna (Shikakai), Daucus carota (carrot), Hibiscus sabdariffa (hibiscus), Linum usitatissimum (flax), Ocimum sanctum (tulsi), and the Nicotiana tabacum Bright-Yellow 2 (BY-2) cell line. To facilitate functional interpretation, we developed Metabolite2Function (M2F), a pipeline that annotates metabolites with biological functions using scientific literature and large language modeling. Results: Untargeted metabolomics identified 177 metabolites, revealing clustering patterns independent of genetic relationships, culture age, or growth rate. Tulsi and carrot calli exhibited enrichment in metabolites relative to the tobacco reference line, whereas flax and hibiscus were comparatively depleted. Most metabolites varied across at least four calli, and 10% were unique to a single species. Using M2F, we annotated 87 metabolites with beneficial activities, including antioxidant, anti-glycation, anti-inflammatory, and anti-senescence functions, as well as skin-related effects such as collagen production and brightening. Notably, antioxidant and anti-senescence metabolite levels correlated with corresponding biological activities in human cells. Conclusions: Plant callus cultures generate distinct and functionally diverse bioactive metabolomes. M2F provides a scalable framework for systematic functional annotation relevant to human health and cosmetic applications.
Frongia Mancini, D.; Alabed, H. B. R.; Pellegrino, R. M.
Background/Objectives: Human plasma lipidomics provides valuable information on dietary and metabolic phenotypes, but the interpretation of high-dimensional lipid datasets remains challenging. We developed the Nutritional-Metabolic Lipid Profile (NMLP) module within LipidOne to translate plasma lipidomics data into interpretable nutritional-metabolic indices, functional categories, visual outputs, and biological statements. Subjects/Methods: NMLP calculates lipid indices reflecting cardiometabolic lipid status, fatty acid remodelling, overall lipid quality, oxidative protection, and omega-3/essential fatty acid status. The module was applied to three public human plasma lipidomics datasets: a randomized crossover glycemic-load feeding study, a eucaloric high-fat diet intervention in normal-weight women, and a large public dataset stratified by insulin sensitivity. Results: Across datasets, NMLP converted complex lipidomic matrices into coherent nutritional-metabolic profiles. In the glycemic-load study, the module highlighted metabolic lipid shifts not captured by standard clinical lipid panels, mainly involving cardiometabolic lipid status, oxidative protection, and fatty acid remodelling. In the high-fat diet intervention, NMLP tracked temporal lipid remodelling across pre-diet, on-diet, and post-diet states, consistent with metabolic adaptation to increased dietary fat exposure. In the insulin-sensitivity dataset, insulin-resistant subjects showed a storage-oriented lipid phenotype characterized by increased neutral lipid storage indices and altered lipid quality and oxidative-protection features. Category-level clustering further revealed heterogeneous nutritional-metabolic states within insulin-resistant subjects. Conclusions: NMLP provides a deeper and clearer interpretative framework for human plasma lipidomics in nutrition and metabolic health research. 
By translating lipid species into functional indices and category-level readouts, the module may facilitate the use of lipidomics in clinical nutrition, metabolic phenotyping, and precision nutrition studies. NMLP is freely accessible as part of the online LipidOne platform.
Chen, Y.; Gui, T.; Huang, Z.; Quach, N.; Tu, S.; Liu, J.; Garrett, T. J.; Starkweather, A. R.; Lyon, D. E.; Shepherd, B. E.; Tu, X. M.; Lin, T.
Summary: Chemotherapy in breast cancer (BC) can substantially affect mental wellness. Advances in metabolomics enable comprehensive profiling of metabolic changes during and after treatment, offering insights into biological mechanisms linking chemotherapy to mental health outcomes. To study the association between metabolite profiles and mental wellness, correlation-based analyses are particularly useful. Spearman's rho is a widely used correlation measure and a popular alternative to Pearson's correlation, since it also applies to non-linear associations between variables. However, existing methods are not designed for longitudinal data and do not allow for covariate adjustments. In this paper, we propose a novel regression-based framework grounded in a class of semiparametric models, the functional response models, to extend this popular correlation measure to longitudinal settings with missing data under the missing at random assumption. This framework facilitates inferences about temporal changes in correlations and the association of explanatory variables with such changes. We use simulation studies to evaluate the performance of the approach with moderate sample sizes. We apply the approach to a one-year longitudinal substudy of the EPIGEN study to examine the longitudinal association between metabolite profiles and mental wellness in BC patients undergoing chemotherapy. The identified metabolites may serve as candidates for future in-depth bioinformatics analyses and translational investigations.
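For orientation, the classical cross-sectional Spearman's rho that this abstract sets out to extend can be sketched as the Pearson correlation of average ranks. This is only the standard single-time-point estimator, not the functional response model framework the authors propose; the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

For example, `spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])` evaluates to 1.0, because the relation is monotone even though it is not linear.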
Hauguel, P.; Anctil, N.; Noel, L. P.
Background: Constructing digital twins in healthcare requires biological data sources that are simultaneously informative, dynamic, and practical for routine collection. Dried blood spot (DBS) sampling combined with untargeted metabolomics is well suited to meet these requirements: DBS can be self-collected at home and mailed at ambient temperature, while untargeted LC-MS/MS captures thousands of metabolites reflecting individual physiology, lifestyle, and exposures. We previously demonstrated proof-of-concept individual identification from DBS-derived metabolomic profiles in 277 volunteers (80-92% accuracy). Here, we report a large-scale validation on a substantially expanded cohort. Methods: We collected 18,288 DBS samples from 1,257 individuals across 134 analytical batches over 15 months. Samples were self-collected at home, mailed via standard postal service, and analyzed by untargeted LC-MS/MS on a high-resolution Orbitrap platform in positive ESI mode. Our classification pipeline comprises batch-aware normalization, supervised feature selection, biological signal filtering, dimensionality reduction, and user-level majority voting across all available samples. This voting reflects the real-world use case: participants contribute multiple self-collected DBS cards over time, taken at different times of day and under varying conditions. We employed GroupKFold cross-validation with group=batch to ensure zero batch leakage between training and testing sets. Results: In 10-fold GroupKFold cross-validation (group=batch, zero batch leakage), our pipeline achieved 94.1% user-level identification accuracy (85.5% sample-level). In a fully held-out validation on 17 future batches (with all feature selection, normalization, and model fitting performed exclusively on training data), performance was even stronger: 96.1% user-level and 92.6% sample-level across 1,134 classes (chance level: 0.088%). Feature selection stability was confirmed via bootstrap analysis. 
We identified batch leakage as a critical methodological pitfall for the field: naive random splitting inflated accuracy because 92.8% of test samples shared their (user, batch) pair with the training set. The top discriminative metabolites span biologically relevant pathways including amino acid metabolism, fatty acid transport, and sphingolipid biosynthesis. Conclusions: Untargeted metabolomics from dried blood spots supports batch-aware, closed-set individual identification in a single-laboratory setting, with potential relevance for longitudinal sample-to-person linkage in future digital twin workflows.
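The two evaluation ideas named in this abstract, batch-grouped cross-validation and user-level majority voting, can be sketched with the standard library alone. The abstract names GroupKFold with group=batch; the round-robin batch assignment below is a simplified stand-in chosen for illustration, and the function names and the grouping of votes by true user are assumptions rather than the authors' pipeline.

```python
from collections import Counter, defaultdict


def batch_grouped_folds(batches, n_splits):
    """Yield (train_idx, test_idx) so that no batch appears on both sides."""
    unique_batches = sorted(set(batches))
    if n_splits < 2 or n_splits > len(unique_batches):
        raise ValueError("n_splits must be between 2 and the number of batches")
    # Assign whole batches to folds round-robin (a simplifying assumption).
    fold_of = {b: i % n_splits for i, b in enumerate(unique_batches)}
    for fold in range(n_splits):
        test = [i for i, b in enumerate(batches) if fold_of[b] == fold]
        train = [i for i, b in enumerate(batches) if fold_of[b] != fold]
        yield train, test


def majority_vote(sample_predictions):
    """Map each true user to the label predicted most often across their samples.

    `sample_predictions` is an iterable of (true_user, predicted_user) pairs.
    """
    votes = defaultdict(Counter)
    for true_user, predicted_user in sample_predictions:
        votes[true_user][predicted_user] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in votes.items()}


# Hypothetical toy data: one batch label per sample
toy_batches = ["b1", "b1", "b2", "b2", "b3", "b3"]
toy_folds = list(batch_grouped_folds(toy_batches, n_splits=3))
toy_votes = majority_vote([("u1", "u1"), ("u1", "u2"), ("u1", "u1"), ("u2", "u2")])
```

Because every batch lands entirely in either the training or the test side of a fold, a sample can never be scored against a model that has seen its own batch. The vote then explains how user-level accuracy can exceed sample-level accuracy: user u1 is identified correctly even though one of its three samples was misclassified.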