Metabolites
MDPI AG
Preprints posted in the last 90 days, ranked by how well they match Metabolites's content profile, based on 50 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.
Herrera, L.; Meneses, M. J.; Ribeiro, R. T.; Gardete-Correia, L.; Raposo, J. F.; Boavida, J. M.; Penha-Goncalves, C.; Macedo, M. P.
Background & Aims: Metabolic disorders such as dyslipidemia, metabolic dysfunction-associated steatotic liver disease (MASLD), and diabetes are promoted by chronic pro-inflammatory and pro-oxidative states. Paraoxonase 1 (PON1), a liver-derived HDL-associated enzyme, plays an important antioxidant role by hydrolyzing oxidized lipids and protecting against oxidative stress-induced damage. Genetic variation in PON1, particularly in promoter and coding regions, modulates enzyme expression and activity, thereby influencing susceptibility to metabolic and cardiovascular diseases. This study investigated the genetic determinants of serum paraoxonase (PONase) activity and their relationship with dysmetabolic phenotypes. Methods: A genome-wide association study was conducted in 922 Portuguese individuals from the PREVADIAB2 cohort. Genetic variants and haplotypes related to PONase activity were analyzed, and associations with dysglycemia and liver fibrosis were evaluated in individuals aged over 55 years. Results: We identified two key PON1 variants as determinants of PONase activity: rs2057681 (in strong linkage disequilibrium with the non-synonymous Q192R variant) and rs854572 (located in the promoter region). Analysis of rs854572-rs2057681 haplotypes revealed that specific combinations differentially modulate PONase activity and confer risk of, or protection against, dysglycemia and liver fibrosis, depending on the rs2057681 genotype context. Notably, although PONase activity was strongly associated with PON1 variants, it did not directly correlate with dysmetabolic phenotypes, suggesting that genetic context and haplotype structure, rather than enzyme activity alone, shape disease susceptibility. Conclusions: These findings highlight the complex genetic architecture of PON1 and its role in metabolic disease risk, supporting the use of PON1 genetic information to uncover predisposition to dysmetabolic conditions.
Our results provide insights into the interplay between PON1 genetics, enzyme function, and dysmetabolism, with implications for risk stratification in metabolic liver disease. Lay Summary: PON1 is a gene, expressed mainly in the liver, that encodes an enzyme involved in protection against oxidative stress, a key contributor to metabolic liver disease and diabetes. In this study, we found that specific combinations of PON1 genetic variants are associated with abnormalities in blood glucose regulation and with markers of liver fibrosis. These associations depended on genetic configuration rather than on enzyme activity alone, suggesting that PON1 genetic information may help identify individuals at higher risk of metabolic liver disease.
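The abstract describes rs2057681 as being in strong linkage disequilibrium (LD) with Q192R. As a hedged illustration of what that phrase quantifies, the sketch below computes the standard pairwise LD statistics D, D' and r² for two biallelic loci. It assumes phased haplotype frequencies are already known (from unphased genotypes they would first have to be estimated, for example by EM), and the frequencies in the usage line are hypothetical, not values from the study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Returns (D, D_prime, r_squared).
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could reach given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the allele indicators at the two loci
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: alleles A and B almost always occur on the same haplotype
d, d_prime, r2 = ld_stats(p_ab=0.28, p_a=0.30, p_b=0.29)
print(f"D={d:.4f}  D'={d_prime:.3f}  r^2={r2:.3f}")
```

An r² close to 1 means one variant is an almost perfect proxy for the other, which is why a GWAS hit such as rs2057681 can stand in for a nearby functional variant.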
Lin, H.; Zhang, L.; Lotfi, A.; Jarmusch, A.; Lee, I.; Kim, A.; Morton, J.; Aksenov, A. A.
This protocol describes a computational approach for constructing correlation-based molecular networks from untargeted metabolomics data using MetVAE, a variational autoencoder-based framework. Complementing spectral similarity networks, it captures functional relationships reflected in cross-sample correlations. The workflow imports metabolomics features and sample metadata; adjusts for compositionality, missingness, confounding, and high dimensionality; estimates sparse metabolite correlations; and exports GraphML files for network visualization. In a hepatocellular carcinoma mouse model, it links lipid classes in high-fat-diet animals, suggesting an endogenous "auto-brewery" route to lipotoxic metabolites.
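To make the workflow concrete, here is a deliberately simple stand-in, not MetVAE itself: it applies a centred log-ratio (CLR) transform to address compositionality, computes pairwise Pearson correlations, keeps edges above a hard threshold, and writes GraphML with the standard library. It does nothing for missingness, confounding or sparse estimation, which is what the variational autoencoder is for. The feature IDs, abundances and the 0.8 threshold are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def clr(row):
    """Centred log-ratio transform of one sample; all abundances must be > 0."""
    logs = [math.log(v) for v in row]
    centre = sum(logs) / len(logs)
    return [x - centre for x in logs]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(samples, names, threshold=0.8):
    """samples: one list of abundances per sample, columns ordered as `names`.
    Returns (feature_i, feature_j, r) for every pair with |r| >= threshold."""
    features = list(zip(*(clr(s) for s in samples)))  # one tuple per feature
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(features[i], features[j])
            if abs(r) >= threshold:
                edges.append((names[i], names[j], r))
    return edges


def to_graphml(names, edges):
    """Serialise an undirected, weighted network as a GraphML string."""
    root = ET.Element("graphml", {"xmlns": "http://graphml.graphdrawing.org/xmlns"})
    ET.SubElement(root, "key", {"id": "w", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "undirected"})
    for name in names:
        ET.SubElement(graph, "node", {"id": name})
    for source, target, r in edges:
        edge = ET.SubElement(graph, "edge", {"source": source, "target": target})
        ET.SubElement(edge, "data", {"key": "w"}).text = f"{r:.4f}"
    return ET.tostring(root, encoding="unicode")


names = ["feat_A", "feat_B", "feat_C", "feat_D"]  # hypothetical feature IDs
samples = [[1, 2, 5, 3], [4, 8, 2, 7], [2, 4, 9, 1], [8, 16, 3, 5], [3, 6, 6, 6]]
edges = correlation_edges(samples, names)
graphml = to_graphml(names, edges)
```

In this toy table feat_B is always twice feat_A, so after CLR the two differ by a constant and come out perfectly correlated. The resulting string can be saved as a `.graphml` file and opened in a network viewer.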
Giron, E. C.; Ortega, L. R. T.; Greef, J. M.; Felix, Y. M.; Ortega, N. H. C.; Surup, F.; Medema, M. H.; van der Hooft, J. J. J.
Mass spectral molecular networking (MN) has emerged as a key computational approach to organize and analyze the vast volumes of tandem mass spectrometry (MS/MS) data generated in natural product research. MN connections are based on mass spectral similarities derived from cosine-based scores or machine learning-based scores such as Spec2Vec and MS2DeepScore. These similarity scores are single, deterministic values and provide no estimate of the statistical robustness of the inferred mass spectral connections. As a result, molecular networks frequently contain edges arising from noise, missing fragments, or experimental variability, while authentic chemical relationships remain hidden. To remedy this, we introduce SpecReBoot, a statistical framework that adapts Felsenstein's bootstrap principle from phylogenetics to metabolomics. Within this framework, fragment peaks are treated as resampling units and drawn with replacement to generate pseudo-replicate spectra. Spectral similarities are recalculated across replicates, and the robustness of each edge between a pair of spectra is quantified by how frequently the two spectra appear as mutual top-k neighbors across bootstrap replicates. This approach generates bootstrap-derived confidence scores for every spectral connection, transforming mass spectral similarity from an absolute score into a distribution-based, confidence-aware measure. Across the public GNPS spectral library and natural product discovery case studies on bioactive metabolites produced by bacteria and fungi, we show that SpecReBoot reliably identifies high-confidence spectral connections, filters unstable or noise-driven edges, and rescues chemically meaningful relationships that conventional metrics systematically miss.
Applying SpecReBoot to the polyketide lactones produced by the endophytic fungus Diaporthe caliensis revealed previously hidden spectral relationships, leading to the discovery of the novel caliensomycin macrolactone scaffold, which is biosynthetically and biochemically related to phomol, a known bioactive polyketide present in D. caliensis. In conclusion, this study provides the first statistical framework for quantifying uncertainty in MS/MS spectral similarity. Because the framework is context-agnostic, we anticipate that it can also be adopted in other disciplines such as clinical and environmental metabolomics. Altogether, SpecReBoot introduces statistical rigor and improves reproducibility, thereby enhancing molecular networking-based natural product discovery.
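The resampling idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SpecReBoot's implementation: it uses a simple binned cosine score (SpecReBoot also works with learned scores), resamples the peaks of each spectrum with replacement, and counts how often each pair of spectra is a mutual top-k neighbour. The bin width, k, the number of replicates and the toy spectra are hypothetical.

```python
import math
import random


def bin_spectrum(peaks, width=0.01):
    """Sum intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pseudo_replicate(peaks, rng):
    """Resample fragment peaks with replacement; duplicated peaks add their intensity."""
    return [rng.choice(peaks) for _ in peaks]


def top_k(i, sims, k):
    others = [j for j in range(len(sims)) if j != i]
    others.sort(key=lambda j: sims[i][j], reverse=True)
    return set(others[:k])


def edge_support(spectra, k=1, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each pair is a mutual top-k pair."""
    rng = random.Random(seed)
    n = len(spectra)
    hits = {}
    for _ in range(n_boot):
        reps = [bin_spectrum(pseudo_replicate(s, rng)) for s in spectra]
        sims = [[cosine(reps[i], reps[j]) for j in range(n)] for i in range(n)]
        neighbours = [top_k(i, sims, k) for i in range(n)]
        for i in range(n):
            for j in neighbours[i]:
                if i < j and i in neighbours[j]:
                    hits[(i, j)] = hits.get((i, j), 0) + 1
    return {pair: count / n_boot for pair, count in hits.items()}


# Hypothetical spectra: 0 and 1 share every fragment, 2 shares none
s0 = [(100.05, 10.0), (150.10, 40.0), (200.20, 25.0), (250.30, 5.0), (300.40, 60.0)]
s1 = [(100.05, 12.0), (150.10, 35.0), (200.20, 30.0), (250.30, 4.0), (300.40, 55.0)]
s2 = [(120.00, 50.0), (180.00, 20.0), (220.00, 70.0), (260.00, 10.0), (310.00, 30.0)]
support = edge_support([s0, s1, s2])
print(support)
```

Edges with support near 1 survive almost every pseudo-replicate, while edges that appear only occasionally are candidates for filtering as noise-driven.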
Cross, E.; Westcott, F.; Smith, K.; Nagarajan, S. R.; Sanna, F.; Dennis, K. M.; Hodson, L.
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is challenging to study in vivo in humans, and in vitro models are limited. Although primary human hepatocytes (PHHs) are considered the gold standard, immortalized hepatic cell lines are utilised because of their scalability. This study compared the metabolic responses of PHHs with those of our Huh7-based model cultured in physiologically relevant fatty acid (FA) mixtures. Methods: PHHs and Huh7 cells were treated with 2% human serum, sugars and FAs enriched in either unsaturated (OPLA) or saturated (POLA) FAs for 4 or 7 days, respectively. Stable isotope tracers were used to investigate basal metabolic changes in response to treatment. Cell viability, media biochemistry, intracellular metabolism, lipid droplet morphology and gene expression were quantified. Results: Huh7 cells had greater viability than PHHs, while NEFA uptake and triglyceride (TG) secretion were similar. OPLA and POLA increased large lipid droplets in Huh7 cells, whereas only OPLA produced comparable effects in PHHs. Despite higher baseline TG in PHHs, both models showed similar lipid composition, de novo lipogenic responses, and glycogen levels. Compared with Huh7 cells, PHHs exhibited higher 3-hydroxybutyrate, lower lactate, reduced glucose uptake, and donor-dependent transcriptomic variability. Conclusions: Huh7 cells are metabolically adaptable and, when cultured in physiologically relevant media, produce metabolic readouts similar to those of PHHs.
Tsiara, I.; Vouzaxaki, E.; Ekström, J.; Rameika, N.; Yang, F.; Jain, A.; Iglesias Alonso, A.; Sjöblom, T.; Globisch, D.
Cancer is one of the leading causes of death worldwide. The discovery of biomarkers is of utmost importance for diagnosis and disease monitoring. Herein, we performed a comprehensive metabolomics biomarker discovery study in plasma from 615 patients with lung, ovarian or colorectal cancer at diagnosis and 95 non-cancerous control subjects. This pan-cancer investigation identified specific metabolite panels across the entire sample cohort with high discriminating power, as demonstrated by combined ROC AUC values of up to 0.95. The identified metabolites are mainly associated with lipid and amino acid metabolism as well as xenobiotic transformation. These highly predictive metabolite panels provide new metabolic insights into these cancers and demonstrate the potential of metabolomics for improved diagnosis and monitoring of disease progression.
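For readers less familiar with the reported metric, the ROC AUC of a panel equals the probability that a randomly chosen patient scores higher than a randomly chosen control (the Mann-Whitney formulation). The sketch below computes it directly from panel scores; how the study combined its metabolites into one score is not stated here, so the scores and labels are hypothetical stand-ins for, say, a fitted model's predicted probabilities.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(random case scores above random control); ties count one half.

    labels: 1 for cancer cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical combined-panel scores for four cases and four controls
scores = [0.91, 0.84, 0.77, 0.40, 0.35, 0.62, 0.20, 0.15]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(f"AUC = {roc_auc(scores, labels):.3f}")
```

The pairwise loop is quadratic and meant only for illustration; rank-based formulas give the same value faster on large cohorts.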
Stancliffe, E.; Gandhi, M.; Guzior, D. V.; Mehta, A.; Acharya, S.; Richardson, A. D.; Cho, K.; Cohen, T.; Patti, G. J.
Liquid chromatography coupled to mass spectrometry (LC/MS) is a powerful tool in metabolomics research, generating tens of thousands of signals from a single biological sample. However, current software for unbiased metabolomics data analysis is limited by complex sources of noise and by non-quantitative metabolite identifications, which make results difficult to interpret. Here, we present MassID, a cloud-based untargeted metabolomics pipeline that aims to overcome the innate challenges of unbiased metabolite analysis and performs end-to-end data processing, transforming raw spectra into normalized and identified metabolite profiles. MassID incorporates a suite of software functionalities, including deep learning-based peak detection and comprehensive noise filtering. In addition, with MassID we introduce a novel software module, DecoID2, that enables probabilistic metabolite identification for false discovery rate (FDR)-controlled metabolomics. When applied to a human plasma dataset, MassID achieves near-complete signal annotation, identifies >4,000 metabolites (including >1,200 compounds at an FDR <5%) across four complementary LC/MS runs, and enables integrated downstream analyses of biochemical dysregulation at both the molecular and pathway levels. Identification probability generally correlated with Metabolomics Standards Initiative (MSI) confidence levels. However, only 356 of 418 MSI Level 1 compounds were identified at <5% FDR, and the remaining 884 compounds identified at <5% FDR came from MSI Levels 2-3, highlighting the enhanced specificity and discovery potential achieved by MassID.
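One common way to turn per-identification probabilities into an FDR-controlled list is to rank identifications by probability and accept the longest prefix whose expected fraction of false identifications, the mean of (1 - p), stays below the target. The sketch below shows that generic approach; it is an assumption for illustration and not necessarily how DecoID2 computes its FDR. The metabolite names and probabilities are made up.

```python
def select_at_fdr(identifications, max_fdr=0.05):
    """identifications: (name, p) pairs, p = estimated probability the ID is correct.

    Accept IDs in order of decreasing p and keep the largest prefix whose
    estimated FDR, the mean of (1 - p) over accepted IDs, is <= max_fdr.
    """
    ranked = sorted(identifications, key=lambda item: item[1], reverse=True)
    expected_false = 0.0
    keep = 0
    for n_accepted, (_, p) in enumerate(ranked, start=1):
        expected_false += 1.0 - p
        if expected_false / n_accepted <= max_fdr:
            keep = n_accepted
    return [name for name, _ in ranked[:keep]]


# Hypothetical annotations with made-up identification probabilities
ids = [("citrate", 0.99), ("hippurate", 0.98), ("betaine", 0.95),
       ("feature_417", 0.90), ("feature_902", 0.50)]
print(select_at_fdr(ids))
```

Because the list is sorted by decreasing probability, the running estimate can only rise as identifications are added, so the accepted set is simply every identification before the threshold is first exceeded.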
Tea, I.; Letertre, M.; Boccard, J.; Schiphorst, A.-M.; Blanchet, S.; Croyal, M.; Blackburn, A. C.; Tcherkez, G. G. B.
Background: Metabolic reprogramming is a hallmark of breast cancer (BrCa), with alterations in glycolysis, glutamine metabolism, and the urea cycle contributing to tumour progression. Dichloroacetate (DCA), a pyruvate dehydrogenase kinase (PDK) inhibitor, shifts metabolism toward oxidative phosphorylation and has been proposed as a therapeutic agent. While isotope tracing is well established, natural isotope abundance (δ¹³C, δ¹⁵N) is emerging as a biomarker of metabolic alterations in cancer. Methods: We investigated the relationship between isotope composition and metabolism in BrCa using two BALB/c mouse mammary tumour models (V14 and 4T1) and assessed the effects of DCA treatment using metabolomics, lipidomics and isotopomics. Results: V14 and 4T1 tumours exhibited isotopic patterns similar to those of human tumours, with δ¹³C enrichment and δ¹⁵N depletion relative to non-cancerous mammary tissue. V14 tumours were more δ¹⁵N-depleted than 4T1 tumours, reflecting differences in nitrogen metabolism. Multivariate analysis integrating isotopic, metabolomic, and lipidomic data revealed isotopic features as key discriminators between tumours and normal tissues. Compared with V14, 4T1 tumours were enriched in TCA intermediates, sphingolipids, and amino acids, whereas V14 tumours showed elevated glutaminolytic and nitrogenous metabolites. DCA treatment affected tumour growth differentially, with V14 tumours more sensitive than 4T1 tumours. DCA altered nitrogen metabolism, increasing the arginine-to-ornithine ratio and modulating δ¹⁵N values in a tumour-specific manner, increasing them in V14 and decreasing them in 4T1 tumours. DCA had little effect on δ¹³C. δ¹³C values were primarily determined by the balance between lipid and TCA cycle metabolites rather than by glycolytic flux.
δ¹⁵N variation was linked to nitrogen metabolism, including urea cycle intermediates and sphingolipid composition, with a potential role for choline-related fractionation in δ¹⁵N depletion. Altered expression of Hacd2 and Acot12 in V14 tumours after DCA treatment was reflected in shorter fatty acid tails in phosphatidylcholines, supporting the lipidomics data. Conclusions: These findings support the hypothesis that cancer-associated metabolic reprogramming influences natural isotope abundance. Correlations between isotope shifts and metabolic signatures highlight the potential of lipid-derived δ¹⁵N as a biomarker of tumour metabolic state, with implications for noninvasive metabolic profiling in BrCa.
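The abstract reports natural isotope abundance in δ notation without defining it. By convention, δ is the per-mil deviation of a sample's heavy-to-light isotope ratio R (¹³C/¹²C or ¹⁵N/¹⁴N) from that of a reference standard (VPDB for carbon, atmospheric N₂ for nitrogen). The sketch below encodes that definition; the reference ratio and sample ratios in the usage lines are hypothetical placeholders chosen only to show a tumour-versus-normal shift.

```python
def delta_permil(r_sample, r_standard):
    """Isotope delta in per mil: (R_sample / R_standard - 1) x 1000,
    where R is a heavy/light isotope ratio such as 13C/12C or 15N/14N."""
    return (r_sample / r_standard - 1.0) * 1000.0


r_std = 0.0112  # hypothetical reference ratio, for illustration only
tumour = delta_permil(0.011180, r_std)
normal = delta_permil(0.011170, r_std)
print(f"tumour {tumour:.2f} per mil, normal {normal:.2f} per mil, "
      f"shift {tumour - normal:+.2f} per mil")
```

A positive shift corresponds to the "enrichment" wording in the abstract (more of the heavy isotope than the comparison tissue), and a negative shift to "depletion".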
Hocini, F. I.; Prigent, S.; Lerisson, A.; Cordazzo, R.; Muller, C.; Rouveyrol, C.; Cassan, C.; Rey, A.; Fuzzati, N.; Gibon, Y.; Cocandeau, V.; Petriacq, P.
The genus Camellia comprises more than 200 evergreen species of major economic and ornamental importance, characterised by high morphological and chemical diversity. While several species have been extensively studied for their bioactive compounds, the metabolic basis of floral trait variation across the genus remains poorly understood. In this study, a predictive metabolomics framework was applied to investigate the relationship between leaf metabolic profiles and floral traits, focusing on flower colour and floral form. Leaves from 315 individual trees, including 15 Camellia species and representing 1,160 samples, were analysed by untargeted metabolomics, generating a large-scale metabolic profiling dataset. A dedicated quality control strategy was implemented to ensure analytical stability across multiple injection series and flowering seasons. Penalised generalised linear models were used to uncover robust metabolic predictors associated with floral traits and to evaluate model performance through internal and external validation. Distinct sets of metabolites were associated with flower colour and floral form, with limited overlap between traits. Predictive performance was consistently higher for colour than for floral form, indicating more structured metabolic signatures for chromatic traits. The selected predictors spanned multiple major chemical classes, supporting a systemic organisation of the metabolome rather than reliance on single biosynthetic pathways. Consistently high predictive accuracies were obtained, reaching approximately 87% for both flower colour and floral form, and remaining clearly above the corresponding no-information rates (≈43%). Together, these results demonstrate that leaf metabolomics can be used to robustly predict floral traits in Camellia and highlight the potential of predictive metabolomics as a tool for early phenotype inference, quality control and selection in long-lived ornamental species.
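The comparison with the no-information rate (NIR) can be made explicit. The NIR is the accuracy obtained by always predicting the most frequent class, and a one-sided exact binomial test asks how surprising the observed number of correct predictions would be if the model were no better than that. This mirrors the "accuracy > NIR" test reported by tools such as R's caret, but whether the authors used this exact test is an assumption; the class counts and the 100-tree test set are hypothetical.

```python
from math import comb


def no_information_rate(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / len(labels)


def p_accuracy_above_nir(n_correct, n_total, nir):
    """One-sided exact binomial p-value: P(X >= n_correct), X ~ Binomial(n_total, nir)."""
    return sum(comb(n_total, k) * nir ** k * (1 - nir) ** (n_total - k)
               for k in range(n_correct, n_total + 1))


# Hypothetical test set of 100 trees; 43 carry the majority flower colour
labels = ["red"] * 43 + ["pink"] * 32 + ["white"] * 25
nir = no_information_rate(labels)
print(f"NIR = {nir:.2f}, p = {p_accuracy_above_nir(87, len(labels), nir):.2e}")
```

With these made-up counts, 87 correct out of 100 against an NIR of 0.43 gives a vanishingly small p-value, which is the sense in which an accuracy of about 87% sits "clearly above" an NIR of about 43%.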
Arp, N. L.; Deng, F.; Lika, J.; Seim, G. L.; Falco Cobra, P.; Mellado Fritz, C.; John, S. V.; Rathinaraj, S.; Shields, B. E.; Amador-Noguez, D.; Henzler-Wildman, K.; Fan, J.
Identifying metabolites and metabolic reactions specific to a cellular state, such as the inflammatory state in immune cells, is of great interest, as it can provide important biomarkers and point to compounds and reactions with specific biological functions. However, many cell state-specific metabolites remain in the unannotated part of the metabolome. Here we identified a series of sulfur-containing metabolites that are actively produced in macrophages upon classical activation, but not in the resting or alternatively activated state. Isotopic tracing, in vitro assays and genetic perturbations further revealed that they are formed by reactions between free cysteine and several important intermediates of glycolysis and the TCA cycle. Upon classical activation, macrophages specifically upregulate cystine import via Slc7a11, supporting the production of these adducts. Their production responds dynamically to changes in central metabolism and environmental nutrient levels, and is regulated by nitric oxide. Finally, we confirmed that these newly identified compounds are also present in human samples, and most of them are significantly elevated in inflammatory granuloma annulare lesions. This work elucidates a previously uncharted part of the metabolic network that is associated with inflammation and metabolic stress, with important implications, and lays a foundation for many future discoveries.
Ecker, L. R.; de Santana, N. A. C.; Caldato, C. F.; Teixeira, C. E.
Introduction: Blood glucose monitoring is essential for the management of diabetes mellitus. Continuous interstitial glucose (IG) monitoring systems are less invasive than capillary blood glucose (BG) measurements, but their agreement decreases at higher glucose levels. Artificial intelligence (AI) approaches, particularly recurrent neural networks such as long short-term memory (LSTM) networks, have shown potential to model temporal glucose dynamics and correct inter-method discrepancies. Objective: To develop and validate an AI-based model capable of predicting capillary BG values from IG data, improving agreement between methods and enhancing glycemic status classification. Methods: This retrospective observational study analyzed 708 paired BG-IG measurements obtained from published anonymized datasets. Data preprocessing included Kalman filtering, robust normalization, temporal windowing, and class balancing via oversampling. An LSTM model with dual output was trained to perform both capillary glucose regression and glycemic status classification. Model performance was assessed using regression metrics (MAE, RMSE, R2), classification metrics (accuracy, F1-score), and agreement analysis (Bland-Altman). Results: The AI model substantially reduced the mean bias from +16.27 mg/dL to -2.08 mg/dL and achieved markedly narrower limits of agreement than the raw BG-IG differences (-47.3 to +43.2 mg/dL vs. -129.5 to +162.0 mg/dL). Glycemic classification accuracy was high for hyperglycemia (94.6%), prediabetes (93.7%) and normoglycemia (100%), but lower for hypoglycemia (66.7%). Conclusion: LSTM-based AI modeling showed a strong ability to predict capillary BG from IG measurements and to correct inter-method discordance. These findings support the potential integration of AI-enhanced glucose estimation into clinical monitoring systems to improve therapeutic decision-making.
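The bias and limits of agreement reported above come from a Bland-Altman analysis. The sketch below shows the standard calculation: the mean paired difference, plus or minus 1.96 sample standard deviations. The sign convention (test minus reference) and the glucose values are assumptions for illustration, since the abstract does not state which direction was used.

```python
import statistics


def bland_altman(reference, test):
    """Return (bias, lower_loa, upper_loa) for paired measurements.

    Differences are taken as test - reference; the 95% limits of agreement
    are bias +/- 1.96 sample standard deviations of the differences.
    """
    diffs = [t - r for r, t in zip(reference, test)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired glucose values in mg/dL
capillary_bg = [100, 110, 120, 130]
predicted_bg = [102, 108, 124, 130]
bias, lo, hi = bland_altman(capillary_bg, predicted_bg)
print(f"bias {bias:+.2f} mg/dL, LoA {lo:.1f} to {hi:.1f} mg/dL")
```

Narrower limits mean that individual predictions, not just their average, agree more closely with the capillary reference.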
Kwiendacz, H.; Cembrowska-Lech, D.; Skonieczna-Zydecka, K.; Klimontowicz, K.; Podsiadło, K.; Wierzbicka-Wos, A.; Styburski, D.; Kaczmarczyk, M.; Gumprecht, J.; Łoniewski, I.; Nabrdalik, K.
Background: Metformin is the cornerstone therapy for type 2 diabetes, but gastrointestinal intolerance commonly limits dose escalation and long-term adherence. In the ProGasMet trial, multi-strain probiotic supplementation improved metformin tolerability. However, the underlying microbiome-metabolome mechanisms remain unclear. Methods and analysis: We performed an exploratory multi-omics analysis using Period 1 of a randomized, double-blind, placebo-controlled trial. Participants with metformin intolerance received a multi-strain probiotic or placebo for 12 weeks. Paired stool samples collected at baseline (Visit 2) and end of treatment (Visit 5) were available from 34 participants (68 samples). We integrated shotgun metagenomic species profiles, predicted gut metabolic modules, and untargeted faecal LC-MS metabolomics using multi-block sparse PLS (DIABLO), complemented by longitudinal feature-level analyses and associations with gastrointestinal symptom burden (QACSMI and a simplified GI score). Results: Multi-omics integration showed moderate concordance across taxonomic, functional, and metabolomic blocks and separated probiotic from placebo profiles at 12 weeks. Bile acid-related metabolites were among the strongest contributors to group separation, with hyodeoxycholic acid and related compounds enriched in the probiotic arm. Global biodiversity and community-wide turnover did not differ materially between groups. Feature-level analyses suggested modest, directionally coherent changes in selected taxa, functional modules, and metabolites. Higher hyodeoxycholic acid concentrations at Visit 5 were associated with lower gastrointestinal symptom burden in probiotic-treated participants, a pattern not observed under placebo; statistical support was exploratory.
Conclusion: Probiotic supplementation may be associated with coordinated microbiome-metabolome shifts in metformin-intolerant type 2 diabetes, highlighting bile acid remodelling, particularly hyodeoxycholic acid, as a plausible mechanistic candidate for improved tolerability.
Luning, Z.; Shuang, W.; Jixing, P.; Xiaofei, H.; Wenxue, W.; Dehai, L.
Spectral similarity is widely used as a proxy for structural similarity in tandem mass spectrometry (MS/MS) analyses, including library matching and molecular networking. However, the relationship between spectral similarity scores and true structural similarity remains imperfect, limiting compound identification in metabolomics studies. Here, we present BertMS, a spectral similarity framework based on bidirectional encoder representations from transformers (BERT), which learns contextualized representations of fragment ions from large-scale MS/MS data. Using datasets from MoNA and GNPS comprising over 100,000 unique molecules, we systematically evaluate BertMS against existing methods, including cosine similarity and Spec2Vec. BertMS shows improved performance across multiple evaluation metrics, with average gains of approximately 15-25% depending on the task. Notably, improvements are most evident in molecular similarity assessment. We further demonstrate the applicability of BertMS in molecular networking and dereplication of microbial metabolites, where it enables more consistent identification of structurally related compounds. Together, these results demonstrate that transformer-based representations improve spectral similarity estimation and enable more reliable metabolite annotation in complex mixtures.
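A common way to ask how well a spectral score tracks "true structural similarity" is to compare it, pair by pair, against the Tanimoto similarity of the two molecules' structural fingerprints. The sketch below shows that comparison with toy fingerprints written as sets of on-bit indices; real evaluations would derive fingerprints from the reference structures with a cheminformatics toolkit. It does not reproduce BertMS, and the scores and fingerprints are hypothetical.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


# Hypothetical spectrum pairs: (spectral similarity score, fingerprint 1, fingerprint 2)
pairs = [
    (0.92, {1, 4, 7, 9, 12}, {1, 4, 7, 9, 13}),
    (0.55, {2, 3, 5, 8}, {2, 3, 6, 10}),
    (0.20, {1, 2, 3}, {7, 8, 9}),
    (0.71, {4, 5, 6, 7}, {4, 5, 6, 11}),
]
spectral = [score for score, _, _ in pairs]
structural = [tanimoto(a, b) for _, a, b in pairs]
print(f"Pearson r(spectral, Tanimoto) = {pearson(spectral, structural):.3f}")
```

A better spectral similarity measure is one whose scores correlate more strongly with this structural reference, which is the kind of gap the abstract says BertMS narrows.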
Anctil, N.; Hauguel, P.; Noel, L.-P.
Background. Breast cancer (BC) remains the most commonly diagnosed malignancy and the leading cause of cancer-related mortality in women worldwide. Although blood-based untargeted metabolomics has emerged as a promising modality for detecting early-stage BC, the clinical translation of this approach has been bottlenecked by two unresolved issues: (i) the field has almost exclusively relied on serum or plasma, which require venipuncture and cold-chain logistics, and (ii) machine-learning models reported on such data are frequently validated with protocols that are blind to analytical batch structure, producing optimistically biased performance estimates. Methods. We present a breast cancer detection study based on dried blood spots (DBS), an analytical matrix that enables self-collection and ambient-temperature shipping. A cohort of 2,734 participants (114 biopsy-confirmed BC cases; 2,620 non-cancer controls) was profiled by untargeted LC-MS/MS on a Thermo Scientific Orbitrap IQ-X coupled to a Vanquish UHPLC. A 39-metabolite panel meeting MSI Level 1 identification criteria was pre-specified a priori from the published breast-cancer metabolomics literature, frozen prior to LC-MS acquisition, and applied to the present cohort without any feature selection on the data. Six standard supervised-learning architectures (LASSO, Elastic Net, Linear SVM, PLS-DA, OPLS-DA, XGBoost) were evaluated on this pre-specified panel; OPLS-DA is reported only in the sex-matched subgroup analysis, where a single-seed 5-fold stratified protocol permits a directly comparable fit. Per-batch control-median normalization is applied upstream; kNN imputation, log transform, and robust scaling are fit within each training fold.
The evaluation battery comprises batch-aware StratifiedGroupKFold CV with a single seed (seed=42) and inter-seed SD quantified across 10 independent seeds, batch-aware nested CV, a 100-seed held-out 20%-batch validation with disjoint-batch isotonic probability calibration (30% calibration partition), PPV/NPV reporting at multiple operating points and three deployment prevalences, subgroup analyses by TNM stage and tumor grade, pathway-ablation sensitivity analysis, and a 1,000-iteration permutation test. Results. Under batch-aware evaluation (StratifiedGroupKFold, seed=42), AUC ranged from 0.914 to 0.949 across classifiers, with LASSO achieving 0.928 and XGBoost 0.949; inter-seed SD across 10 seeds was 0.002-0.006. At 95% specificity, LASSO reached 75.4% sensitivity and XGBoost 81.6%. Held-out batch validation (100 seeds) yielded mean AUC 0.912 for Elastic Net and 0.935 for XGBoost, confirming robust generalization. All 39 panel features showed high coefficient stability, and permutation testing on representative classifiers (LASSO, Linear SVM, PLS-DA) yielded p <= 0.001. Subgroup analyses showed weaker detection of stage IIA tumors (AUC 0.87, n=40) than of stage IIB/IIIA tumors (AUC 0.95), consistent with stronger metabolic signatures in more advanced disease. Bootstrap coefficient consistency of the Elastic Net classifier confirmed that all 39 panel features received a non-zero multivariate weight in >=80% of 100 stratified bootstraps. Conclusions. On this cohort of diagnosed, pre-treatment breast-cancer cases, DBS LC-MS metabolomic profiling delivers classification performance (AUC 0.928 for LASSO and 0.949 for XGBoost under batch-aware StratifiedGroupKFold CV, seed=42; held-out AUC 0.912-0.935) that is robust across classifier families and biological pathways. DBS sampling involves no ionizing radiation, can be self-collected by finger-prick, and can be mailed at ambient temperature.
Performance is weaker on stage IIA than on more advanced disease, and prospective validation in an independent asymptomatic screening cohort is required before clinical positioning as a decentralized triage modality.
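The abstract mentions PPV/NPV reporting at several deployment prevalences. The arithmetic behind that is Bayes' rule: at a fixed sensitivity and specificity, the positive predictive value falls sharply as prevalence drops. The sketch below applies it to the LASSO operating point quoted above (75.4% sensitivity at 95% specificity); the cohort prevalence is 114 of 2,734 participants, while the 1% and 0.5% screening prevalences are hypothetical and not necessarily the three used in the study.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


cohort_prevalence = 114 / 2734  # BC cases / participants in this cohort
for prevalence in (cohort_prevalence, 0.01, 0.005):  # last two are hypothetical
    ppv, npv = ppv_npv(0.754, 0.95, prevalence)
    print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

At a 1% prevalence this operating point gives a PPV of roughly 13%, which illustrates why the authors position the assay as triage requiring prospective validation rather than as a stand-alone diagnostic.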
Chatzis, C.; Horner, D.; Bro, R.; Schoos, A.-M. M.; Rasmussen, M. A.; Acar, E.
Motivation: Temporal multivariate data are ubiquitous in many domains, for instance collected at planned visits (every few months or years) in longitudinal cohorts, or every few minutes or hours in challenge tests. The analysis of such data often focuses on revealing the underlying temporal patterns common across subjects. However, there are subject-specific differences in temporal patterns, which hold the promise to enhance our understanding of underlying mechanisms and facilitate personalized approaches. Nevertheless, reliably extracting subject-specific temporal patterns from longitudinal multivariate data is an open challenge. Results: We introduce coupled matrix factorizations (CMF) as effective tools to capture subject-specific temporal patterns, focusing on two novel applications: the analysis of longitudinal metabolomics data and of sensitization data. Our analysis shows that CMF models reliably capture subject-specific (shape) differences in temporal patterns, revealing further insights compared with the state of the art. In metabolomics, CMF models reveal differences in the metabolic responses of individuals (in a postprandial meal challenge) according to anthropometric and insulin sensitivity measures. In sensitization data analysis, CMF-based methods capture differences in the temporal trajectories of children according to delivery/birth mode. We demonstrate the reliability of the extracted patterns in terms of reproducibility and replicability. Availability: The code is available at https://github.com/cchatzis/Revealing-Subject-specific-Temporal-Patterns-from-Longitudinal-Data. Clinical data are not publicly available for privacy reasons. Data can be made available under a joint research collaboration by contacting COPSAC (administration@dbac.dk).
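To show what "coupled" means here, the sketch below fits the simplest possible coupled matrix factorization by alternating least squares: each subject's metabolites-by-time matrix X_k is approximated as a · b_kᵀ, where the metabolite loading vector a is shared by all subjects and each subject gets its own temporal profile b_k (so subjects may even have different numbers of time points). This rank-1, unregularized version is an illustrative assumption, not the authors' CMF models, which are available in the linked repository; the toy data are hypothetical and exactly rank 1.

```python
import math


def coupled_rank1(matrices, n_iter=50):
    """Rank-1 coupled factorisation X_k ~ a * b_k^T by alternating least squares.

    matrices: list of X_k, each a list of rows (metabolites x time points of subject k).
    Returns (a, bs): a unit-norm metabolite loading shared by all subjects and
    one temporal profile b_k per subject.
    """
    n_feat = len(matrices[0])
    a = [1.0 / math.sqrt(n_feat)] * n_feat

    def update_profiles(a):
        # With a fixed and ||a|| = 1, the least-squares solution is b_k = X_k^T a
        return [[sum(X[i][t] * a[i] for i in range(n_feat)) for t in range(len(X[0]))]
                for X in matrices]

    for _ in range(n_iter):
        bs = update_profiles(a)
        # With every b_k fixed, the shared a solves one pooled least-squares problem;
        # its scale is absorbed by normalising a to unit length
        num = [sum(sum(X[i][t] * b[t] for t in range(len(b))) for X, b in zip(matrices, bs))
               for i in range(n_feat)]
        norm = math.sqrt(sum(v * v for v in num))
        if norm == 0:
            break
        a = [v / norm for v in num]
    return a, update_profiles(a)


# Hypothetical data: 3 metabolites, subject 1 sampled at 3 times, subject 2 at 4
a_true = [1.0, 2.0, 3.0]
profiles = ([1.0, 0.5, 0.2], [0.3, 0.9, 0.6, 0.1])
Xs = [[[ai * bt for bt in b] for ai in a_true] for b in profiles]
a, bs = coupled_rank1(Xs)
```

The shared loading a plays the role of a common metabolic signature, and differences between the recovered b_k curves are the subject-specific temporal patterns.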
Zemach, A.; Plaza, M. R.; Lee, B. S.; Little Dod, L.; Santiago-Rodriguez, E.; Simmons, D.; Palomares, M.; Talavera-Adame, D.; Newman, N.
Background: Plants produce diverse metabolites with potential benefits for human health. However, the metabolomes of plant callus cultures (cell cultures analogous to stem cells) remain poorly characterized in terms of their functional relevance. Methods: We profiled the metabolomes of six plant calli: Acacia concinna (Shikakai), Daucus carota (carrot), Hibiscus sabdariffa (hibiscus), Linum usitatissimum (flax), Ocimum sanctum (tulsi), and the Nicotiana tabacum Bright-Yellow 2 (BY-2) cell line. To facilitate functional interpretation, we developed Metabolite2Function (M2F), a pipeline that annotates metabolites with biological functions using the scientific literature and large language models. Results: Untargeted metabolomics identified 177 metabolites, revealing clustering patterns independent of genetic relationships, culture age, or growth rate. Tulsi and carrot calli were enriched in metabolites relative to the tobacco reference line, whereas flax and hibiscus were comparatively depleted. Most metabolites varied across at least four calli, and 10% were unique to a single species. Using M2F, we annotated 87 metabolites with beneficial activities, including antioxidant, anti-glycation, anti-inflammatory, and anti-senescence functions, as well as skin-related effects such as collagen production and brightening. Notably, antioxidant and anti-senescence metabolite levels correlated with the corresponding biological activities in human cells. Conclusions: Plant callus cultures generate distinct and functionally diverse bioactive metabolomes. M2F provides a scalable framework for systematic functional annotation relevant to human health and cosmetic applications.
Chen, Y.; Gui, T.; Huang, Z.; Quach, N.; Tu, S.; Liu, J.; Garrett, T. J.; Starkweather, A. R.; Lyon, D. E.; Shepherd, B. E.; Tu, X. M.; Lin, T.
Summary: Chemotherapy for breast cancer (BC) can substantially affect mental wellness. Advances in metabolomics enable comprehensive profiling of metabolic changes over time during and after treatment, offering insights into the biological mechanisms linking chemotherapy to mental health outcomes. To study the association between metabolite profiles and mental wellness, correlation-based analyses are particularly useful. Spearman's rho is a widely used correlation measure and a popular alternative to Pearson's correlation, since it also captures non-linear, monotonic associations between variables. However, existing methods are not designed for longitudinal data and do not allow covariate adjustment. In this paper, we propose a novel regression-based framework, grounded in a class of semiparametric models called functional response models, that extends this popular correlation measure to longitudinal settings with missing data under the missing-at-random assumption. This framework facilitates inference about temporal changes in correlations and about the association of explanatory variables with such changes. We use simulation studies to evaluate the performance of the approach with moderate sample sizes. We apply the approach to a one-year longitudinal substudy of the EPIGEN study to examine the longitudinal association between metabolite profiles and mental wellness in BC patients undergoing chemotherapy. The identified metabolites may serve as candidates for future in-depth bioinformatics analyses and translational investigations.
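As background for the measure being extended, the sketch below computes the ordinary cross-sectional Spearman's rho: the Pearson correlation of the ranks, with tied values sharing their average rank. It does not implement the authors' longitudinal functional-response-model framework, covariate adjustment or missing-data handling; the metabolite and wellness values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical single-visit data: metabolite level vs. wellness score
metabolite = [1.0, 2.0, 3.0, 4.0, 5.0]
wellness = [2.0, 4.5, 9.0, 16.5, 30.0]  # monotonic but non-linear
print(f"Pearson r = {pearson(metabolite, wellness):.3f}")
print(f"Spearman rho = {spearman_rho(metabolite, wellness):.3f}")
```

For this monotonic but curved relationship, Spearman's rho is exactly 1 while Pearson's r is below 1, which is the property the abstract cites as the reason for preferring rho.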
Harris, T.; Karlinski Zur, M.; Sapir, T.; Reiner, O.; Schmidt, R.
Metabolic dysregulation is increasingly recognized as a key contributor to neurodevelopmental disorders. Here, we present Intelliwaste, a non-invasive, cost-effective method for profiling carbon metabolism in pluripotent stem cells and brain organoids using 13C-labeled metabolites and 1H and 13C NMR spectroscopy. This approach enables longitudinal analysis of extracellular fluxes without disrupting cell viability. We apply Intelliwaste to human embryonic stem cells (hESCs) cultured in a defined medium enriched with >95% 13C1-glucose. Under these conditions, 13C3-lactate emerged as the most abundant labeled product, with 20-50-fold lower fluxes to 13C3-alanine, 13C2-acetate, 13C3-serine, and 13C3-pyruvate, and 100-300-fold lower fluxes to 13C1-formate and multiple 13C-labeled glutamate species. These profiles allow precise quantification of fractional isotopic labeling and of glucose-derived carbon flow. To demonstrate biological utility, we first examined the effect of L-glutamine omission, which selectively reduced the 13C3-alanine/13C3-lactate and 13C4-glutamate/13C3-lactate flux ratios, while the 13C3-glutamate/13C3-lactate and 13C2-glutamate/13C3-lactate flux ratios remained unchanged. These findings suggest a specific role for extracellular glutamine in modulating the activity of alanine aminotransferase and pyruvate carboxylase. We then characterized LIS1 mutant hESCs, a model of lissencephaly, and observed significantly increased flux ratios of 13C4-, 13C3-, and 13C2-glutamate relative to 13C3-lactate, indicating enhanced glutamate production via the TCA cycle. These findings establish Intelliwaste as a powerful tool for metabolic profiling in the study of human neurodevelopment and disease. Its non-destructive nature makes it particularly well suited to tracking metabolic changes during differentiation and in patient-derived organoid models of neurological disorders.
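The flux ratios above compare how quickly labeled products accumulate in the medium. A minimal sketch follows, assuming each extracellular flux is estimated as the least-squares slope of concentration versus sampling time. The time points and concentrations are invented for illustration and do not come from the study, and Intelliwaste's actual quantification from NMR spectra may differ.

```python
def slope(times, concentrations):
    """Ordinary least-squares slope of concentration vs time (e.g. mM per hour)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(concentrations) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, concentrations))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def fractional_labeling(labeled, unlabeled):
    """Fraction of a metabolite pool that carries the 13C label."""
    return labeled / (labeled + unlabeled)


# Hypothetical medium sampling times (h) and concentrations (mM); not study data.
times = [0.0, 12.0, 24.0, 48.0]
lactate_13c3 = [0.0, 1.2, 2.4, 4.8]     # accumulates at 0.1 mM/h
alanine_13c3 = [0.0, 0.04, 0.08, 0.16]  # accumulates at 1/300 mM/h

flux_lactate = slope(times, lactate_13c3)
flux_alanine = slope(times, alanine_13c3)
ratio = flux_alanine / flux_lactate  # about 1/30, i.e. 30-fold lower than lactate
print(f"alanine/lactate flux ratio: {ratio:.4f}")
```

Expressing fluxes as ratios to 13C3-lactate, as the abstract does, cancels any scaling common to both fluxes (for example, normalization to cell number), which makes ratios convenient for comparing conditions.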
Hauguel, P.; Anctil, N.; Noel, L. P.
Background: Constructing digital twins in healthcare requires biological data sources that are simultaneously informative, dynamic, and practical for routine collection. Dried blood spot (DBS) sampling combined with untargeted metabolomics is well suited to meet these requirements: DBS can be self-collected at home and mailed at ambient temperature, while untargeted LC-MS/MS captures thousands of metabolites reflecting individual physiology, lifestyle, and exposures. We previously demonstrated proof-of-concept individual identification from DBS-derived metabolomic profiles in 277 volunteers (80-92% accuracy). Here, we report a large-scale validation on a substantially expanded cohort. Methods: We collected 18,288 DBS samples from 1,257 individuals across 134 analytical batches over 15 months. Samples were self-collected at home, mailed via standard postal service, and analyzed by untargeted LC-MS/MS on a high-resolution Orbitrap platform in positive ESI mode. Our classification pipeline comprises batch-aware normalization, supervised feature selection, biological signal filtering, dimensionality reduction, and user-level majority voting across all available samples. This voting reflects the real-world use case: participants contribute multiple self-collected DBS cards over time, taken at different times of day and under varying conditions. We employed GroupKFold cross-validation with group=batch to ensure zero batch leakage between training and testing sets. Results: In 10-fold GroupKFold cross-validation (group=batch, zero batch leakage), our pipeline achieved 94.1% user-level identification accuracy (85.5% sample-level). In a fully held-out validation on 17 future batches (with all feature selection, normalization, and model fitting performed exclusively on training data), performance was even stronger: 96.1% user-level and 92.6% sample-level accuracy across 1,134 classes (chance level: 0.088%). Feature selection stability was confirmed via bootstrap analysis.
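The two evaluation ideas in the Methods, batch-grouped cross-validation and user-level majority voting, can be sketched in plain Python. This is a minimal illustration, assuming batches are assigned greedily (largest first) to the currently smallest fold, similar in spirit to scikit-learn's `GroupKFold`, which is the usual library choice. The function names and toy labels are hypothetical, and the sketch omits the authors' normalization, feature selection, and classifier.

```python
from collections import Counter, defaultdict


def group_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group (batch) spans both sets.

    Groups are assigned largest-first to the fold with the fewest samples so far.
    Folds can be empty if there are fewer groups than splits.
    """
    sizes = Counter(groups)
    fold_of = {}
    fold_sizes = [0] * n_splits
    for group, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of[group] = fold
        fold_sizes[fold] += size
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


def user_level_accuracy(true_users, predicted_users):
    """Majority vote over each user's sample-level predictions; return the fraction correct."""
    votes = defaultdict(Counter)
    for true_user, predicted in zip(true_users, predicted_users):
        votes[true_user][predicted] += 1
    correct = sum(1 for user, counts in votes.items()
                  if counts.most_common(1)[0][0] == user)
    return correct / len(votes)


batches = ["b1", "b1", "b2", "b2", "b3", "b4"]
for train, test in group_kfold(batches, n_splits=2):
    print("test batches:", sorted({batches[i] for i in test}))
```

Voting helps when a user's individual samples are misclassified in different, inconsistent ways, because the correct identity only needs to win a plurality across that user's cards.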
We identified batch leakage as a critical methodological pitfall for the field: naive random splitting inflated accuracy because 92.8% of test samples shared their (user, batch) pair with the training set. The top discriminative metabolites span biologically relevant pathways, including amino acid metabolism, fatty acid transport, and sphingolipid biosynthesis. Conclusions: Untargeted metabolomics from dried blood spots supports batch-aware, closed-set individual identification in a single-laboratory setting, with potential relevance for longitudinal sample-to-person linkage in future digital twin workflows.
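The leakage statistic above can be computed directly from the split indices. A minimal sketch follows, assuming leakage is measured as the fraction of test samples whose (user, batch) pair also occurs in the training set. Such shared pairs let a model exploit batch-specific analytical effects instead of the individual's metabolic signature. The helper names and toy labels are hypothetical.

```python
import random


def pair_leakage(users, batches, train_idx, test_idx):
    """Fraction of test samples whose (user, batch) pair also appears in training."""
    train_pairs = {(users[i], batches[i]) for i in train_idx}
    shared = sum(1 for i in test_idx if (users[i], batches[i]) in train_pairs)
    return shared / len(test_idx)


def random_split(n, test_fraction, seed=0):
    """Naive sample-level split that ignores batch membership."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * test_fraction)
    return idx[cut:], idx[:cut]


# Toy design: two users, each contributing several cards to the same batches.
users = ["u1", "u1", "u1", "u2", "u2", "u2"] * 2
batches = ["b1"] * 6 + ["b2"] * 6

train, test = random_split(len(users), 0.5)
print("random split leakage:", pair_leakage(users, batches, train, test))

# Batch-grouped split: every sample from batch b2 is held out together.
train_b = [i for i, b in enumerate(batches) if b != "b2"]
test_b = [i for i, b in enumerate(batches) if b == "b2"]
print("batch split leakage:", pair_leakage(users, batches, train_b, test_b))  # 0.0
```

Holding out whole batches forces this fraction to zero by construction, which is the property the GroupKFold design (group=batch) guarantees.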
David, M.; Adam, K.-P.; Li, D.; Lim, X. Y.; Hurrell, J. G. R.; Preston, S.; Peake, D. A.; Batarseh, A.
Lipid metabolism is increasingly recognized as a hallmark of cancer, yet translating lipidomic discoveries into clinically actionable biomarkers remains constrained by analytical variability and limited standardized validation frameworks. This challenge is compounded by a chicken-and-egg problem: expensive authentic standards and labelled internal standards are required to identify and quantitate target lipids, but the diagnostic importance of these targets is uncertain until they can be reliably measured. Previous work indicated the potential of 48 lipid biomarker species for predicting breast cancer from plasma samples using high-resolution liquid chromatography-mass spectrometry. This study aimed to identify each of these 48 species and to develop a quantitative method for determining their absolute concentrations in plasma, providing the basis for a clinical assay for breast cancer detection. In doing so, we present a pragmatic workflow that bridges lipid discovery with lipid identification and robust quantitative analysis. A curated library of 48 lipid species was established using authentic standards to verify plasma lipids through retention-time matching and high-resolution spectral comparison. In plasma, 41 lipids were confidently identified based on co-elution with standards and diagnostic fragment ions. Method qualification, including assessment of accuracy, precision, recovery, and linearity, was performed across all 48 lipids in parallel with identification, and 46 lipids ultimately met all predefined qualification criteria. Notably, practical constraints, including time, cost, and the availability of authentic standards, necessitated performing identification and targeted method development in parallel, highlighting challenges inherent to translating lipidomics into commercial or clinical assays.
This workflow provides a reproducible framework for harmonizing lipid identification and quantification, enabling the reliable integration of lipidomic data into biomarker discovery and clinical applications.
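Three of the method-qualification metrics named above (linearity, accuracy, and precision) reduce to short calculations. A minimal sketch follows, assuming a linear calibration of analyte/internal-standard response ratio against nominal concentration, accuracy expressed as mean measured concentration in % of nominal, and precision as the coefficient of variation. The ±15% limits are an assumption borrowed from a common bioanalytical convention, not the study's predefined criteria, and all numbers are invented.

```python
import statistics


def linear_fit(x, y):
    """Least-squares calibration line y = slope * x + intercept, plus R^2."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


def back_calculate(response, slope, intercept):
    """Concentration implied by a response ratio under the calibration line."""
    return (response - intercept) / slope


def accuracy_pct(measured, nominal):
    """Mean measured concentration as % of nominal (100% means no bias)."""
    return 100 * statistics.fmean(measured) / nominal


def cv_pct(measured):
    """Precision as the coefficient of variation (%) of replicate measurements."""
    return 100 * statistics.stdev(measured) / statistics.fmean(measured)


def passes(measured, nominal, max_bias=15.0, max_cv=15.0):
    # Assumed limits (+/-15% bias, <=15% CV); real acceptance criteria are study-specific.
    return (abs(accuracy_pct(measured, nominal) - 100) <= max_bias
            and cv_pct(measured) <= max_cv)


# Hypothetical calibration (nominal concentration vs response ratio) and QC replicates.
cal_slope, cal_intercept, r2 = linear_fit([1, 2, 5, 10, 20], [2.1, 4.1, 10.1, 20.1, 40.1])
qc = [9.5, 10.2, 10.4, 9.9]
print(f"R^2={r2:.4f}, accuracy={accuracy_pct(qc, 10.0):.1f}%, "
      f"CV={cv_pct(qc):.1f}%, pass={passes(qc, 10.0)}")
```

Recovery, the fourth metric, would typically compare responses of analyte spiked before versus after extraction; it is omitted here because it depends on the extraction design.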
Marsiglia, M. D.; Dei Cas, M.; Bianchi, S.; Borghi, E.
Background: Short-chain fatty acids (SCFAs) are widely used as functional readouts of gut microbial activity in vivo. The growing adoption of decentralised study designs and self-collection protocols has amplified the need for reliable room-temperature storage and shipment strategies. However, SCFA volatility and the persistence of post-collection microbial metabolism raise concerns regarding pre-analytical stability and the interpretability of measured concentrations. Methods: We assessed the temporal stability of fatty acids (FAs) across intestinal and systemic matrices under room-temperature storage. Untreated stool was compared with two nucleic acid stabilisation devices (eNAT and OMNIgene-GUT), while whole blood, plasma, and dried blood spots (DBS) were evaluated as minimally invasive systemic sampling strategies. Profiles were quantified using complementary GC-MS and LC-MS/MS workflows. Results: Untreated stool showed fermentation-driven increases in major SCFAs, whereas immediate freezing preserved baseline profiles. eNAT maintained faecal FA stability for up to 21 days, while OMNIgene-GUT exhibited baseline and time-dependent alterations. In systemic matrices, plasma and whole blood showed upward drift, whereas DBS values declined initially before stabilising after approximately 14 days. Conclusions: FA measurements are highly matrix- and device-dependent. Our findings provide practical guidance for selecting sampling strategies in microbiome-associated FA studies and emphasise the need for controlled pre-analytical conditions in decentralised microbiome studies.
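One common way to express the storage stability described above is as a percentage of the day-0 (baseline) concentration. A minimal sketch follows, assuming a measurement counts as "stable" while it stays within ±15% of baseline. That tolerance, the helper names, and the example concentrations are all hypothetical, and the study's own stability criteria may differ.

```python
def percent_of_baseline(series):
    """Express each time point as % of the first (day-0) concentration."""
    baseline = series[0]
    return [100 * c / baseline for c in series]


def stable_until(days, series, tolerance=15.0):
    """Last day up to which every value stays within +/- tolerance % of baseline."""
    last = days[0]
    for day, pct in zip(days, percent_of_baseline(series)):
        if abs(pct - 100) > tolerance:
            break
        last = day
    return last


days = [0, 7, 14, 21]
stabilised = [50.0, 52.0, 49.0, 51.0]  # hypothetical acetate, stabilisation device
untreated = [50.0, 65.0, 80.0, 90.0]   # hypothetical acetate, untreated stool (fermentation)
print(stable_until(days, stabilised), stable_until(days, untreated))
```

A matrix that drifts and then plateaus, as reported here for DBS, would fail this baseline-anchored rule even though later values agree with one another, so the choice of reference time point matters when defining stability.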