Microbiome

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Microbiome's content profile, based on 139 papers previously published here. The average preprint has a 0.13% match score for this journal, so anything above that is already an above-average fit.

1
MetaGEAR Explorer: Rapid interactive searches and cross-cohort analyses of microbiome gene associations in disease

Rios, E.; Jin, S.; Zhang, C.; Neuhaus, F.; He, X.; Weissenberger, S.; Schirmer, M.

2026-03-31 bioinformatics 10.64898/2026.03.30.715271 medRxiv
Top 0.1%
33.2%

The human gut microbiome has been linked to inflammatory bowel disease (IBD) and colorectal cancer (CRC), yet identifying disease-associated microbial genes across diverse human cohort studies remains challenging due to inconsistent data processing and the high dimensionality of gene-level abundance profiles. Here we present MetaGEAR Explorer, a web platform comprising a user interface and web services for interactive and programmatic gene-centric exploration of >33 million microbial gene families across 9,053 metagenomic samples from 24 IBD, CRC, and healthy cohorts. MetaGEAR Explorer facilitates gene searches against a catalog of non-redundant gene families via nucleotide or amino acid sequence queries (BLAST) and Pfam domain-based searches. For matched gene families, the platform computes disease-stratified prevalence, cross-cohort disease associations, species-level taxonomic stratification, and functional domain annotations. Importantly, users can also explore the genomic context of individual gene families via contig-based co-localization networks derived from metagenomic species pangenome (MSP) assignments and pivot from sequence to domain searches to identify functional homologs. Additionally, the platform features a dedicated catalog to interactively browse 13,795 MSPs and export results programmatically via API endpoints. We demonstrate MetaGEAR Explorer's utility using the narG-encoding nitrate reductase gene and a case study of colibactin self-protection genes (clbS and DUF1706 homologs), where the platform revealed a consistent shift from commensals to Gammaproteobacteria carriers in disease. In summary, MetaGEAR Explorer enables rapid cross-cohort functional meta-analyses and is freely available at https://metagear-explorer.schirmerlab.de.

2
ZeaMiC: a Publicly Available Culture Collection of Maize Root-Associated Bacteria

Garrell, A.-K.; Ginnan, N.; Swift, J. F.; Pal, G.; Zervas, A.; Pestalozzi, C.; Tang, C.; Tso, F.; Ford, N. E.; Niu, B.; Castrillo, G.; Schlaeppi, K.; Hahnke, R. L.; Wagner, M. R.; Kleiner, M.

2026-03-24 microbiology 10.64898/2026.03.23.713778 medRxiv
Top 0.1%
32.0%

Plant-associated microbiota are composed of hundreds of microbial species. For many of them, little is known about their individual functions and even less is known about their emergent community-level traits. While culture-independent methods provide valuable insights into the composition, diversity, and functional potential of plant-associated microbiota, culture-dependent methods are essential for reductionist lines of inquiry into the roles of individual species and their interactions within a community. Here, we present ZeaMiC, a publicly available culture collection of root-associated bacteria from Zea mays (maize). This resource comprises 88 isolates obtained from diverse soils and several maize genotypes, with live cultures available through DSMZ (German Collection of Microorganisms and Cell Cultures) both as single stocks and as cost-effective bundles (https://www.dsmz.de/collection/catalogue/microorganisms/microbiota/zeamic). To maximize relevance, isolates were selected to be representative of maize root-associated microbiomes in the Corn Belt of the United States, based on abundance-occupancy patterns from previously published root microbiome data, phylogenetic diversity, and literature-based evidence of functional importance. Whole-genome sequencing and annotation revealed genes associated with root colonization, plant growth promotion, and nutrient cycling, including functions such as chemotaxis, biofilm formation, secretion systems, hormone modulation, and phosphate solubilization. This collection serves as a community resource for future mechanistic studies of plant-microbe and microbe-microbe interactions, filling the gap in our understanding of the ecological interactions in plant microbiomes.

3
Who Infects Whom? Exploiting Bacterial Minicells for Targeted Virome Enrichment and Phage-Host Interaction Analysis through an Integrated Metagenomic Approach

Pedramfar, A.; Ensenat, E.; Allcock, N. S.; Millard, A. D.; Galyov, E. E.

2026-04-09 microbiology 10.64898/2026.04.08.717211 medRxiv
Top 0.1%
23.3%

Linking bacteriophages (phages) to their hosts remains a fundamental challenge to understanding microbial ecology, viral evolution, and horizontal gene transfer. Although phages are the most abundant biological entities on Earth, the majority of them remain uncharacterized due to the lack of efficient host-linking approaches. Traditional methods, such as plaque assays, have significant limitations as they depend on visible lysis and therefore fail to detect phages that do not form plaques. Conversely, shotgun metagenomics can recover viral genomes directly from environmental samples; however, it cannot directly link phages to their bacterial hosts. In this study, we addressed this limitation by tackling the critical question of "who infects whom?" through the development of a novel, culture-independent approach that utilises an anucleate bacterial minicell-based platform to enrich for phages capable of infecting a target bacterial host. To validate our approach, purified Escherichia coli minicells were exposed to a concentrated viral fraction derived from sewage samples. Genomic DNA from phages that successfully infected and interacted with the E. coli minicells was isolated, amplified, and sequenced. Metagenomic analysis revealed a distinct E. coli-specific virome, including several putatively novel phage species and genera. This platform effectively bridges the gap between culture-dependent and metagenomic methods, providing a scalable, host-targeted tool for identifying phage-host pairs. Our approach also opens new opportunities for studying phage-host interaction networks in complex microbial ecosystems and enhances our ability to investigate viral diversity, host specificity, and the ecological roles of phages in natural environments.

4
Global diversity and distribution of coral-associated protists

del Campo, J.; Bonacolta, A. M.; Weiler, B. A.; Knowles, B.; Apprill, A.; Fox, M. D.; Wakeman, K. C.; Vermeij, M. J. A.; Rohwer, F.; Keeling, P. J.

2026-01-22 microbiology 10.64898/2026.01.22.700988 medRxiv
Top 0.1%
22.7%

Coral reefs are critical ecosystems and biodiversity hotspots that provide ecological stability and essential services to coastal communities. The coral holobiont, a complex symbiotic system composed of the coral animal and a diverse array of associated microbes, plays a central role in coral health and resilience. While Symbiodiniaceae and bacterial symbionts have been extensively studied, much less is known about the diversity and function of microbial eukaryotes such as protists and fungi. These organisms are increasingly recognized as important, yet remain vastly underexplored. Here, we present the first global survey of the coral-associated eukaryome using an anti-metazoan 18S rRNA primer set to bypass host DNA amplification. Our dataset includes corals and related anthozoans from the Caribbean Sea, the Red Sea, and several locations in the Pacific Ocean, spanning a broad taxonomic and geographic range, and includes both healthy and diseased specimens. These data reveal a eukaryome that is not only more diverse than that of global coastal waters but also surpasses the diversity of the well-studied coral bacterial microbiome. We recover diverse microbial eukaryotic communities, including Symbiodiniaceae, other known symbionts, potential pathogens, and previously uncharacterized lineages. These results reveal consistent patterns across coral groups and geographic regions. This study provides the most comprehensive taxonomic overview of coral-associated microbial eukaryotes to date, offering new insights into their roles within the holobiont. Our findings highlight the ecological significance of microbial eukaryotes and underscore the importance of incorporating them into broader coral reef research and conservation strategies.

5
Quantifying the oxygen preferences of bacterial communities using a metagenome-based approach

Bueno de Mesquita, C. P.; Stallard-Olivera, E.; Fierer, N.

2026-01-23 microbiology 10.64898/2026.01.22.701213 medRxiv
Top 0.1%
22.6%

Oxygen is a primary driver of the distribution and activity of microbial life. Since oxygen levels are often difficult to measure in situ, one potential solution is to use bacteria as bioindicators of oxygen levels. As bacteria range from obligate aerobes to obligate anaerobes, quantification of bacterial community oxygen preferences could be used to infer variation in environmental oxygen levels and bacterial metabolic strategies. After using ensemble machine learning to select the 20 most important genes that predict oxygen tolerances in individual bacteria, we established a relationship between the abundance ratio of aerobic:anaerobic indicator genes and the proportional abundance of aerobic bacteria using simulated metagenomes with varying ratios of known aerobic and anaerobic bacteria. We developed a tool, OxyMetaG, that takes metagenomic reads as input, extracts bacterial reads, maps reads to the 20 genes, and predicts the proportion of aerobic versus anaerobic bacteria in any given sample. We tested OxyMetaG on a suite of metagenomes with measured or inferred oxygen levels across a variety of environmental and host-associated samples. To demonstrate the utility of our approach, we applied OxyMetaG to 540 surface soils, showing that surface soils are typically dominated by aerobes, but wetter sites with finer textures have relatively more anaerobes. Lastly, we applied OxyMetaG to 73 human gut samples, showing that in the first three years of life, human guts progress from having up to 61% aerobes to being completely dominated by anaerobes. We expect OxyMetaG to have broad utility for characterizing both modern and ancient environments. Importance: Oxygen is one of the most important environmental variables affecting microbial activity and composition but is often difficult to measure in situ.
We developed a tool, OxyMetaG, that leverages differences in bacterial gene content across known aerobic and anaerobic taxa to predict the proportion of aerobes and anaerobes in a given sample directly from shotgun metagenomic reads. OxyMetaG works on samples with low sequencing depth and avoids computationally expensive genome assembly, which often captures only a fraction of the microbial community in a given environment. With OxyMetaG, bacteria can be used as bioindicators of oxygen availability over broader time scales than just a single measurement and provide crucial environmental context in cases where oxygen has not or cannot be measured. OxyMetaG is publicly available and can be used to answer a wide variety of ecological questions in both environmental and host-associated systems.

6
Micro16S: Universal Phylogenetic 16S rRNA Gene Representations for Deep Learning of the Microbiome

Bishop, H. V.; Ogilvie, O. J.; Dobson, R. C. J.; Herbold, C. W.

2026-03-24 bioinformatics 10.64898/2026.03.21.713432 medRxiv
Top 0.1%
22.3%

Existing self-supervised microbiome models represent taxa as discrete, independent units restricted to fixed vocabularies, disregarding their evolutionary context. Here we present Micro16S, a deep learning approach that embeds 16S ribosomal RNA gene sequences into a continuous vector space according to phylogenetic relationships derived from the Genome Taxonomy Database. Using a combination of triplet and pair loss objectives, the model learns representations where spatial proximity reflects phylogenetic relatedness, while remaining largely invariant to the specific 16S rRNA region. Evaluations demonstrate taxonomically coherent clustering across most ranks and substantially improved region invariance compared to k-mer frequency baselines. A transformer pretrained on 50,418 unlabelled gut microbiome samples using these embeddings captured biologically meaningful community structure, though classical machine learning baselines outperformed Micro16S across six benchmark classification tasks, highlighting the limitations of the current system. These results establish the feasibility of phylogenetic embeddings for microbiome deep learning and identify mining algorithm design and class imbalance as primary targets for future improvement.
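
The abstract mentions a triplet loss objective without giving its exact form. As a rough illustration only, the sketch below shows the standard margin-based triplet loss on toy embeddings; the function names, the Euclidean distance, and the margin value are assumptions, not details taken from Micro16S.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (generic form, assumed here).

    The loss is zero once the negative sits at least `margin` farther
    from the anchor than the positive does; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings: a close relative (positive) and two distant taxa.
anchor = [0.0, 0.0]
close_relative = [0.0, 1.0]
far_taxon = [3.0, 4.0]      # distance 5 from the anchor
near_taxon = [0.0, 1.5]     # distance 1.5 from the anchor

print(triplet_loss(anchor, close_relative, far_taxon))   # already separated
print(triplet_loss(anchor, close_relative, near_taxon))  # still penalised
```

In a phylogenetic setting, the positive would be a sequence that is more closely related to the anchor than the negative is, so minimising this loss pulls relatives together in the embedding space.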

7
StrataBionn: a neural network supervised classification method for microbial communities

Symons, A. E.; Huynh, A. V.; Cornejo, O. E.

2026-04-02 genomics 10.64898/2026.03.31.715659 medRxiv
Top 0.1%
22.0%

The classification of microbial communities into discrete states or "community state types" (CSTs) is fundamental to understanding host-microbiome interactions and their clinical implications. Traditional methods, such as nearest-neighbor approaches, often struggle with the inherent noise, high dimensionality, and non-linear signatures of taxonomic profiles. We present a novel supervised framework for microbial community classification, leveraging an Artificial Neural Network (ANN) architecture implemented in a new tool we named StrataBionn. We rigorously evaluated our approach using large-scale vaginal microbiome datasets, directly benchmarking performance against VALENCIA and a Random Forest (RF) classifier. To demonstrate the versatility of our models, we further extended the framework to oral microbiome classification, assessing its stability across diverse anatomical sites. Our supervised models consistently outperformed the nearest-neighbor approach across all evaluated datasets. In the vaginal microbiome, our method achieved an 11.6% to 13.3% increase in performance across all primary metrics, including precision, recall, accuracy, and F1-score. Furthermore, we demonstrate that this performance advantage is maintained in the oral microbiome, highlighting the generalizability of our neural network and ensemble strategies to various microbial ecosystems without the need for niche-specific algorithmic adjustments. By capturing complex feature dependencies that distance-based methods overlook, our approach provides a more robust and accurate census of microbial community structures. StrataBionn's ability to learn classification schemes for any microbiome with high accuracy and explainability, through the use of provided utilities to visualize feature-space classification boundaries and perform perturbation analysis on trained classifiers, makes it ideal for broad application in microecology research.
This framework offers a scalable, high-performance alternative for microbiome researchers, facilitating more precise clinical stratification and biological insights across host body sites.

8
MATRIX: Rapid Quantification of Total and Active Microbial Cells with Single Cell Phenotypes for Environmental Microbiomes

Gonzalo, M.; Liu, X.; Dufour, Y. S.; Shade, A.

2026-03-18 microbiology 10.64898/2026.03.16.712149 medRxiv
Top 0.1%
21.8%

Quantifying the abundance and activity of bacteria within populations and communities is fundamental to systems microbiology and microbiome research. Yet direct microscopic cell counting remains low-throughput, labor-intensive, and prone to user variability, leading many researchers to rely on indirect proxies such as optical density or multicopy marker-gene quantification. These indirect approaches do not distinguish between active and inactive cells and can obscure ecological interpretation. Here, we introduce MATRIX (Microbial Activity and Total cell quantification via Rapid Imaging and eXtraction), an efficient workflow that integrates sample extraction, fluorescence staining, automated microscopy and image analysis, and Bayesian statistical inference to quantify total and redox-active cells and derive single-cell measurements for environmental microbial populations and communities. We demonstrate its reproducibility and versatility using both cultured isolates and high-diversity soil communities. The resulting quantitative, phenotypic datasets provide rapid, direct measurements of population or community size and activity, enabling well-powered analyses that strengthen mechanistic insight into microbial responses and improve the ecological grounding of microbiome studies. Importance: Microbiome studies commonly rely on relative abundance data, which cannot distinguish whether compositional shifts reflect true population growth, declines in total community size, or both. Without explicit measurements of population and community sizes, mechanistic interpretation of microbiome dynamics remains incomplete. Here we present a rapid, high-throughput workflow, MATRIX, that quantifies both total and redox-active bacterial cells from environmental samples. By integrating single-cell phenotypes with community-level metrics, this approach anchors microbiome datasets in direct ecological accounting rather than proxies.
These measurements can clarify whether observed changes in community structure represent shifts in abundance, activity, or both, improving inference about microbial responses to stress or environmental change. MATRIX therefore offers an efficient way to incorporate quantitative ecology into systems-microbiology and microbiome studies and to strengthen the link between microbial cellular physiology, community dynamics, and ecosystem function.

9
A widespread gut bacterial lineage distinguished by redox metabolism and phage defense

Noecker, C.; Guo, L.; Date, C.; Rai, N.; Daramy, F.; Ramirez Hernandez, L. A.; Kyaw, T. S.; Trepka, K. R.; Gupta, C. L.; Ha, C. W. Y.; Babdor, J.; Spitzer, M. H.; Turnbaugh, P. J.

2026-04-01 microbiology 10.64898/2026.03.31.715625 medRxiv
Top 0.1%
21.7%

Genomic variation within gut microbial species can have consequences for host health and disease. However, for low abundance species, these variations can be difficult to capture by both culture-dependent and -independent approaches. Here, we focus on the prevalent but low abundance gut Actinomycetota Eggerthella lenta. We developed a selective medium for sensitive and specific isolation of E. lenta from human stool. Genomes from 87 new E. lenta isolates were combined with prior high-quality assemblies, shedding light on within-species functional diversity. Phylogenetic analysis revealed a broadly distributed subclade, which we refer to as E. lenta Group B. This lineage was differentiated by its metabolic potential and bacteriophage defense, though mobile elements were shared broadly across the species. Notably, Group B was positively associated with intestinal inflammation in subjects with inflammatory bowel disease. Overall, these results emphasize the importance of bacterial population structure in host-microbiome interactions and provide a framework to study low-abundance gut taxa. Highlights: Selective medium enables E. lenta isolation and reveals high prevalence in humans; discovery of a distinctive lineage within E. lenta undergoing genome reduction; E. lenta Group B has altered metabolism, phage defense, and disease associations; a widespread conjugative plasmid could enable improved genetics.

10
Diversity and stability of the gut microbiome of naked mole-rat (Heterocephalus glaber), the longest-lived rodent

Rakhimov, A.; Yasuda-Yoshihara, N.; Arita, M.; Okumura, K.; Kawamura, Y.; Oka, K.; Mori, H.; Wakabayashi, Y.; Baba, Y.; Baba, H.; Miura, K.

2026-02-17 microbiology 10.64898/2026.02.16.704739 medRxiv
Top 0.1%
19.7%

The naked mole-rat is a subterranean rodent adapted to extreme hypoxia and low metabolic demands, with an exceptionally long lifespan relative to its small body size, while maintaining reproductive capacity. Using 16S rRNA gene sequencing of 24 samples and whole-metagenome sequencing of 11 samples from individuals up to 15 years of age, we characterized the gut microbiota and showed its complexity and distinctiveness compared with that of other rodents, including mice, squirrels, and rabbits. Although all animals were born and raised in a laboratory setting, the gut microbiota remained taxonomically stable across ages and retained key taxa previously reported in wild naked mole-rats (e.g., Treponema and Desulfovibrio). Metagenome-assembled genomes revealed the presence of archaeal methanogens and termite-gut-associated bacteria (e.g., Methanobacteria within Euryarchaeota and Avelusimicrobium within Elusimicrobiota), together with genes involved in hydrogen metabolism and archaeal methanogenesis. Compared with mice, the naked mole-rat gut microbiota was enriched in carbohydrate-active enzymes targeting plant cell-wall polysaccharides, resembling those found in ruminants. We also detected evidence of flagellates, ciliates, and fungi, which may further contribute to polysaccharide degradation and fermentation, potentially within the enlarged cecum. Together, this comprehensive analysis provides distinctive gut microbial features of the naked mole-rat that may be associated with its low metabolic rate and exceptional longevity.

11
Stage-specific gut microbiome shifts across the Type 2 Diabetes Mellitus spectrum: A systematic review and meta-analysis

Harrass, S.; Ali, S.; Elshweikh, M.; Franco-Duarte, R.; Jayasinghe, T. N.

2026-01-22 endocrinology 10.64898/2026.01.20.25341999 medRxiv
Top 0.1%
18.9%

Aims: The gut microbiome has been implicated in type 2 diabetes progression, but reproducible biomarkers across studies remain limited due to technical and population heterogeneity. This study investigated whether specific gut microbiome shifts occur progressively across stages of type 2 diabetes. Methods: We systematically reanalysed 16S rRNA datasets from 12 published studies (n=1,247 samples) after quality control, examining five groups (healthy controls, prediabetes (PD), new-onset type 2 diabetes, established type 2 diabetes, and type 2 diabetes with complications). Sequencing reads were quality-filtered, denoised, and resolved into amplicon sequence variants with genus-level taxonomic assignments using the SILVA database. Centered log-ratio (CLR)-transformed abundance data were analysed using PERMANOVA, meta-analysis with leave-one-study-out validation, differential abundance testing (Wilcoxon and ANCOM), and Random Forest classification. Eligible studies were identified through comprehensive searches of PubMed, Ovid Medline and Web of Science from June 2010 to June 2025 using predefined inclusion and exclusion criteria following PRISMA 2020 guidelines. Studies were investigated by two independent reviewers and included if they provided 16S rRNA data on adults across diabetes stages. Study quality was assessed based on metadata completeness and raw data availability. This systematic review and meta-analysis was registered in the Open Science Framework (OSF; registration https://osf.io/eth7a; embargoed until October 2026) and conducted according to PRISMA guidelines. Results: Early disease transitions showed minimal microbiome alterations, with only 4 genera (notably enrichment of Allisonella and Escherichia-Shigella) significantly different between healthy and PD (q < 0.05), and no significant genera between PD and new-onset type 2 diabetes.
Advanced disease exhibited robust dysbiosis, with 9 genera differentially abundant in type 2 diabetes vs complicated type 2 diabetes and 5 genera in healthy vs complicated type 2 diabetes comparisons. Complicated type 2 diabetes was characterised by enrichment of Hungatella and [Clostridium] innocuum group and depletion of Faecalibacterium and compared to both uncomplicated type 2 diabetes and healthy controls. Random Forest classification achieved poor performance for early contrasts (AUC ≤ 0.79) but strong discrimination for advanced disease (type 2 diabetes vs complicated type 2 diabetes: AUC = 0.89; healthy vs complicated type 2 diabetes: AUC = 0.96). Conclusion: Gut microbiome alterations are subtle and inconsistent in early dysglycemia but become pronounced and reproducible with diabetic complications, suggesting microbiome-based biomarkers may be most clinically useful for identifying disease progression rather than early detection. Limitations include heterogeneity of sequencing methods and reliance on 16S rRNA data, which may restrict taxonomic and functional resolution. To our knowledge, this is the first meta-analysis to systematically evaluate gut microbiome alterations across multiple clinical stages of type 2 diabetes progression.
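
For readers unfamiliar with the centered log-ratio (CLR) transform mentioned in the Methods, a minimal sketch follows. It is not the authors' pipeline: the abstract does not say how zero counts were handled, so the pseudocount of 0.5 used here is purely an assumption, as are the function name and the toy counts.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's genus counts.

    Each value becomes log(count) minus the mean of the logs, i.e. the
    log of the count divided by the sample's geometric mean. The
    pseudocount (an assumed choice) avoids log(0) for absent genera.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Toy sample: read counts for four genera, one of them absent.
sample = [120, 30, 0, 5]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
print(round(sum(transformed), 9))  # CLR values sum to (numerically) zero
```

Because CLR values are centred within each sample, they sum to zero and can be compared across samples with different sequencing depths, which is why the transform is commonly applied before ordination or distance-based tests.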

12
SIPdb: A stable isotope probing database and analytical dashboard for linking amplicon sequences to microbial activity using a reverse ecology approach

Trentin, A. B.; Simpson, A.; Kimbrel, J. A.; Blazewicz, S. J.; Wilhelm, R. C.

2026-02-11 bioinformatics 10.64898/2026.02.09.704843 medRxiv
Top 0.1%
18.8%

Stable isotope probing (SIP) provides a powerful means to connect microbial sequence data with diverse metabolic activities, but the lack of a framework for SIP-derived data has limited its integration into broader strategies for ecological inference. Here, we introduce the SIPdb, an extensible SQLite database of curated nucleic acid SIP experiments (also in phyloseq format) paired with an interactive RShiny dashboard for analysis and visualization. The initial release compiles 22 studies covering 21 isotopolog substrates across diverse environments, with data standardized using the MISIP metadata standard. In creating the SIPdb, we have provided a standardized pipeline that accommodates the three most common SIP gradient fractionation strategies (binary, multi-fraction, and density-resolved), two isotope incorporator designation strategies (fixed- and sliding-window), and four complementary differential abundance methods (DESeq2, edgeR, limma-voom, and ALDEx2). Using our pipeline, we identified more than 42,000 unique amplicon sequence variants as isotope incorporators across 62 phyla. Benchmarking with synthetic datasets demonstrated consistent performance across incorporator designation strategies, with ALDEx2 providing the highest specificity. Validation against original publications showed that, on average, SIPdb recovered 70.1% of author-reported incorporator taxa, with discrepancies arising from differences in phylotyping or classification approaches. Finally, our reanalysis of a non-SIP study of 1,4-dioxane degradation showed how SIPdb can both validate known degraders and uncover additional candidate taxa involved in community metabolism. The SIPdb establishes a scalable platform for reverse ecology, enabling hypothesis generation, cross-study meta-analysis, and linking taxa to metabolic processes, while serving as an open, extensible resource to accelerate ecological interpretation in microbiome research.

13
SCiMS: Sex Calling in Metagenomic Sequences

Tran, H. N.; Kirven, K. J.; Davenport, E. R.

2026-02-18 bioinformatics 10.64898/2026.02.17.705110 medRxiv
Top 0.1%
18.4%

Background: Host sex is a critical determinant of microbial community structure, influenced by hormonal profiles, physiology, and sex-stratified behaviors. Despite its importance, sex metadata is frequently missing or mislabeled in microbiome studies. Existing genomic sex-calling tools often fail in low-host-biomass samples (e.g., stool) because they require high read depths to achieve reliability. Results: Here, we present SCiMS (Sex Calling in Metagenomic Sequences), a bioinformatic tool that leverages host-derived DNA within metagenomic datasets to accurately predict host sex, even at low host coverage. SCiMS uses sex-chromosome read density ratios within a Bayesian classifier to provide high-accuracy sex calls. In simulations, SCiMS achieves >85% accuracy with as few as 450 host reads. When applied to 1,339 samples from the Human Microbiome Project, SCiMS outperforms existing tools, showing higher accuracy and more balanced precision-recall tradeoffs across body sites. SCiMS also generalizes effectively to non-human hosts, achieving 100% accuracy in a murine dataset and outperforming alternatives in a chicken dataset with a ZW sex determination system. Conclusions: SCiMS provides an accurate, scalable, and cross-species generalizable solution for host sex classification in metagenomic datasets, even when host DNA is minimal. By enabling the recovery of missing sex metadata, it serves as a quality-control tool for ensuring the integrity of analyses in microbiome research. SCiMS is freely available at http://github.com/davenport-lab/SCiMS.

14
Ruminosignatures associated with methane emissions and feed efficiency across geographies and cattle breeds

Vourlaki, I.-T.; Furman, O.; Tapio, I.; Guan, L. L.; Waters, S. M.; Kenny, D.; Smith, P.; Kirwan, S. F.; Kelly, D.; Evans, R.; Quintanilla, R.; Reverter, A.; Alexandre, P. A.; Li, F.; Garnsworthy, P. C.; Bani, P.; Pope, P. B.; Morgavi, D. P.; Mizrahi, I.; Ramayo-Caldas, Y.

2026-02-19 microbiology 10.64898/2026.02.19.706774 medRxiv
Top 0.1%
18.0%

The cattle rumen microbiota represents a highly complex and dynamic ecosystem, whose organization and connection to host phenotypes are of the highest importance to food security and the environment. In this study, we analyzed the rumen microbiota from 2,492 cattle belonging to five different breeds and production systems across five countries, categorizing them into microbial co-abundance groups referred to as Ruminosignatures. We identified twelve distinct Ruminosignatures, including two that were consistently observed across all populations and were dominated by the genus Prevotella and UBA2810. Additional Ruminosignatures showed breed- and diet-specific patterns and collectively explained 96-99% of the variance in rumen microbial composition. The abundances of several Ruminosignatures were associated with methane emissions and feed efficiency, and were influenced by host genetics, with heritability estimates ranging from 0.09 to 0.51. The Ruminosignature dominated by UBA2810 was negatively associated with methane emissions across all datasets and positively linked to feed efficiency in Holstein from Italy and crossbred from Ireland. Additionally, the type of production system affects both the occurrence of Ruminosignatures and their impact on host phenotypes, emphasizing the need for context-specific approaches to modulate the rumen microbiome. Overall, our results offer new perspectives on the assembly of ruminal microbes and underscore the potential of the Ruminosignatures framework for microbiome-informed precision agriculture and breeding initiatives aimed at enhancing feed efficiency and minimizing the environmental impact of cattle farming.

15
VAE (Variational Autoencoder) Based Gastrotype Identification and Predictive Diagnosis of Helicobacter pylori Infection

Ma, Z.; Qiao, Y.

2026-04-13 gastroenterology 10.64898/2026.04.11.26350690 medRxiv
Top 0.1%
17.9%

Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastrotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastrotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastrotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastrotypes without forcing discrete structure or arbitrarily fixing cluster numbers.
Using this framework, we identified four gastrotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
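
The cluster-number selection described in this abstract (score each candidate K by its mean silhouette and keep the best) can be illustrated with a minimal sketch. This is not the authors' code: the VAE and the clustering step are omitted, the function names `mean_silhouette` and `best_k` are hypothetical, the toy points stand in for latent coordinates, and Euclidean distance is an assumption.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def best_k(points, labelings):
    """Return (K with the highest mean silhouette, all scores).

    `labelings` maps each candidate K to a label list produced by any
    clustering method; producing those labels is outside this sketch.
    """
    scores = {k: mean_silhouette(points, labs) for k, labs in labelings.items()}
    return max(scores, key=scores.get), scores


# Toy 1-D "latent" points with two obvious groups (illustrative data only).
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labelings = {
    2: [0, 0, 0, 1, 1, 1],  # matches the two visible groups
    3: [0, 0, 1, 2, 2, 2],  # splits one group unnecessarily
}
chosen_k, scores = best_k(points, labelings)
print(chosen_k, scores)
```

In the study's setting the candidate labelings would cover K=2 to K=10; here only two hand-written labelings are compared to keep the example short.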

16
Simplifying Daily Cortisol Cycle Analysis: Validation and Benchmarking of the Cortisol Sine Score Against Cosinor and JTK_CYCLE models

Anza, S.; Rosa, B.; Herzberg, M. P.; Lee, G.; Herzog, E.; Peinan Zhao, P.; England, S. K.; Ndao, M. I.; Martin, J.; Smyser, C. D.; Rogers, C.; Barch, D.; Hoyniak, C. P.; McCarthy, R.; Luby, J.; Warner, B.; Mitreva, M.

2026-02-24 endocrinology 10.64898/2026.02.23.26346831 medRxiv
Top 0.1%
17.7%

The daily cortisol cycle is a critical indicator of hypothalamic-pituitary-adrenal (HPA) axis function. Current analytical approaches produce several outputs that are difficult to integrate into simple statistical models, clinical workflows, and ML/AI pipelines requiring single-value inputs. We developed the Cortisol Sine Score (CSS), a model-free scalar metric that quantifies daily cortisol exposure by computing a weighted sum of cortisol measurements across the day, using sine-transformed time-of-day weights. The CSS produces positive values for morning-dominant patterns, negative values for evening-shifted profiles, and near-zero values for flattened rhythms characteristic of chronic stress and circadian disruption. We validated CSS performance in 3,006 samples from 501 pregnant women enrolled in the March of Dimes program, with cortisol values measured at 6 time points per day collected during the second trimester of pregnancy. The CSS showed strong correlations with observed and model-estimated amplitude and acrophase from Cosinor regression and JTK_CYCLE approaches, with excellent classification performance (AUC=0.89, high versus low). The CSS successfully captured established associations between social disadvantage and cortisol dysregulation, and demonstrated utility in predicting gut microbiome composition in metagenomic analyses. Importantly, the CSS maintains excellent fidelity to the full 6-sample protocol with as few as 3-4 daily measurements. The 4-sample protocol achieves strong performance (r = 0.952, MAE = 0.087) while reducing participant burden. The 06:00 time point was identified as essential for accurate CSS quantification. The CSS bridges the gap between circadian analysis and practical implementation by providing a simple, interpretable, and robust assessment of the daily cortisol cycle in large-scale epidemiological studies, clinical screening, and biomedical sensors.
Highlights
- Current state-of-the-art approaches for estimating daily cortisol exposure produce multi-output information that is difficult to use in simple statistical analyses or ML/AI multi-omics approaches
- Cortisol Sine Score is a novel model-free scalar metric expressing daily cortisol exposure and rhythmicity (morning vs evening exposure)
- Cortisol Sine Score was validated using 3,006 salivary samples from clinical data and gold standards in circadian analysis such as Cosinor and JTK_CYCLE
- Cortisol Sine Score was the top performer in our benchmarking approach predicting association with social disadvantage and gut microbiome composition
- Reliable with 3-4 daily samples, reducing participant burden
- Open-source R package CortSineScore democratizes cortisol cycle analysis
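
The idea of a sine-weighted sum can be made concrete with a small sketch. The abstract does not give the exact weighting, so the formula below is an assumption chosen only to reproduce the described sign behaviour: each sample is weighted by sin(2πt/24), with t the clock time in hours, which peaks at 06:00 and bottoms out at 18:00. The function name, the sampling times, and the cortisol values are all hypothetical, and this is not the CortSineScore package implementation.

```python
import math


def cortisol_sine_score(times_h, cortisol):
    """Weighted sum of cortisol values with sine-transformed time weights.

    ASSUMPTION: weight(t) = sin(2*pi*t/24) for clock time t in hours.
    The published CSS may use a different phase or normalisation.
    """
    if len(times_h) != len(cortisol):
        raise ValueError("times_h and cortisol must have the same length")
    return sum(
        value * math.sin(2.0 * math.pi * t / 24.0)
        for t, value in zip(times_h, cortisol)
    )


# Hypothetical six-sample day; these times make the weights sum to zero,
# so a perfectly flat profile scores approximately zero.
times = [6, 9, 12, 15, 18, 24]

morning_dominant = [1.0, 0.8, 0.5, 0.4, 0.2, 0.1]  # high early, low late
evening_shifted = list(reversed(morning_dominant))  # low early, high late
flattened = [0.5] * 6                                # no daily rhythm

css_morning = cortisol_sine_score(times, morning_dominant)
css_evening = cortisol_sine_score(times, evening_shifted)
css_flat = cortisol_sine_score(times, flattened)
print(css_morning, css_evening, css_flat)
```

Under this assumed weighting, the near-zero score for a flat profile depends on sampling times whose weights cancel; with other sampling schedules a centring or normalisation step would be needed to obtain the same behaviour.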

17
Unveiling Hidden Endophytes by Optimising Identification of Endophytic Bacterial Communities from Wild Grassland Plant Roots

Ajaz, S.; Longepierre, M.; Haskins, E.; Kacprzyk, J.; Caruso, T.

2026-02-17 plant biology 10.64898/2026.02.16.706108 medRxiv
Top 0.1%
17.4%

Endophytic bacteria are increasingly recognised for their roles in plant health through symbiosis. However, methodological challenges, such as inconsistent root sterilisation, inefficient microbial DNA extraction, and co-amplification of plant organellar DNA, limit accurate characterisation of these communities, especially in wild grassland plants and non-model plants in general. To address this, we developed and tested a streamlined protocol for bacterial endophyte detection from wild grassland plant roots, encompassing surface sterilisation of roots, DNA extraction, clamping of plant internal mitochondrial and chloroplast DNA, and 16S rRNA amplicon sequencing. Our approach minimises plant DNA contamination and yields high-quality microbial profiles. The protocol is adaptable and specific to grassland plant species, offering a standardised foundation for endophyte studies in wild and non-model plants. Graphical Abstract: Figure 1 (Haskins and Ajaz, 2026), https://BioRender.com/47gd2xr

18
Calibrating for absolute microbiome abundances without spike-ins

de Wit, N. T.; Baral, A.; Fuschi, A.; Jacobs, G.; de Rijk, S.; van der Plaats, R. Q.; Becsei, A.; Kerkvliet, J.; Freitag, R.; Vojtkova, M.; Brinch, C.; Schmitt, H.; Munk, P.

2026-02-26 microbiology 10.64898/2026.02.26.708180 medRxiv
Top 0.1%
17.3%

Metagenomics is a widely used approach in microbiome research. However, a major limitation of metagenomic datasets is their compositional nature, which prevents direct quantification of absolute abundances and complicates cross-sample comparisons. Existing strategies for absolute quantification typically require additional experiments or spike-in controls. Here, we introduce the MetaGenome Calibrator (MGCalibrator), a new tool that enables spike-in free, absolute abundance estimation based on routine DNA concentration measurements. We validated the accuracy of absolute abundances obtained with MGCalibrator against qPCR for 5 targets. Our results show a strong correlation with qPCR data, indicating that MGCalibrator enables qPCR-like trend analyses. For Bacteroides dorei, the estimated abundances were highly similar between the two methods (r² = 0.98, y = 1.00x). For other targets like crAssphage or the bacterial 16S rRNA gene, qPCR values were underrepresented by a factor of 7 or overrepresented by a factor of 4. Benchmarking with synthetic microbiome data demonstrated that our method accurately determines copy numbers in sequencing datasets, and application to whole-cell mock community samples produced expected values based on known extraction biases. In an extraction-bias-free experiment, MGCalibrator accurately quantified genome copy numbers within a twofold range in 98% of cases and determined 16S rRNA gene copies within 1.6-fold or less. Finally, we applied MGCalibrator to track temporal trends in antibiotic resistance genes (ARGs) in wastewater treatment plants in two Dutch provincial capitals. We observed an overall increase in ARGs, such as sul2 in Utrecht and qnrS5 in Houtrust, likely driven by rising bacterial loads.
Our findings demonstrate that MGCalibrator provides robust calibration of metagenomic data, paving the way for metagenomics to play a central role in future surveillance by enabling trend analysis across thousands of genetic targets, similar to the capabilities of qPCR for individual genes. The source code and documentation for MGCalibrator are available at github.com/NimroddeWit/MGCalibrator.

19
Metagenomic strain-resolved DNA modification patterns link extrachromosomal genetic elements to host strains

Wang, S.; Guitor, A. K.; Valentin-Alvarado, L. E.; Garner, R.; Zhang, P.; Yan, M.; Shi, L.-D.; Schoelmerich, M. C.; Steininger, H. M.; Portik, D. M.; Zhang, S.; Wilkinson, J. E.; Lynch, S.; Morowitz, M. J.; Hess, M.; Diamond, S.; Banfield, J. F.; Sachdeva, R.

2026-03-28 microbiology 10.64898/2026.03.27.714056 medRxiv
Top 0.1%
16.7%

DNA modification is central to microbial defense against extrachromosomal genetic elements (ECEs); consequently, ECEs tend to adopt their hosts' modification patterns. Shared ECE-host modification patterns enable linking ECEs to their hosts, but modification detection tools are designed for single genomes and are ineffective at metagenome scale. Here, we present MODIFI, software for detecting DNA modifications in metagenomes. MODIFI assumes that each k-mer in a metagenome is mostly unmodified and calculates background signal levels for that k-mer from PacBio HiFi reads, eliminating the need for matched control experiments. MODIFI ECE-host linkages were validated using >1,000 isolate and mock microbiome datasets. Illustrating the approach, we identified 315 strain-resolved, non-redundant ECE-host linkages in environmental and human metagenomes. In infant gut microbiomes, a chromosomal inversion in Enterococcus faecalis alters host and associated plasmid methylation motifs simultaneously. Overall, MODIFI solves a major bottleneck in DNA modification analysis and provides a foundational tool for understanding microbial epigenomics.

20
Resistome and microbiome-immune interactions in an Eastern European population with high antibiotic use

Mirauta, B.; Riza, A.-L.; Streata, I.; Pirvu, A.; Dorobantu, S.; Dragos, A.; Surleac, M.; Netea, M.

2026-01-22 microbiology 10.64898/2026.01.22.700835 medRxiv
Top 0.1%
14.7%

The gut microbiome influences host health, affecting gastrointestinal, metabolic, immune, cardiovascular, and neurological functions. A balanced microbiome is associated with favourable health outcomes. However, excessive antibiotic use and dietary habits can disrupt this ecosystem, leading to dysbiosis and affecting body homeostasis. We present the first comprehensive metagenomic analysis of the gut microbiome in a healthy Romanian cohort. With no prior high-resolution profiling on this population, characterized by high antibiotic consumption, this cohort contributes to understanding microbiome variation in European populations. We report microbiome features consistent with other European populations, including well-defined community configurations, and provide new insights into how these relate to within-phylum diversity. We observe an enrichment of Enterobacteriaceae, a pattern that may be shaped by population-level exposures, including antibiotic use. The analysis of antimicrobial resistance genes, contextualized with data from other cohorts and the European Centre for Disease Prevention and Control, showed an increased prevalence of genes linked to beta-lactams, macrolides, and quinolones, antibiotics commonly used in this population. Finally, we investigate the relationship between the microbial profile and the systemic immune responses, inferred from correlations with in vitro cytokine production. Notably, we identify a potential immune-priming role for Collinsella species and a link between the Prevotella enterotype and the cytokine production capacity. Importance: This first comprehensive study of the healthy gut microbiome in a Romanian cohort contributes to a baseline for the microbiome and resistome composition of this population. While universally accepted definitions of "healthy" microbiomes, or baseline resistomes, remain lacking, such data help contextualise future studies and support the monitoring of dynamics.
The Enterobacteriaceae abundance suggests a microbiome composition potentially influenced by antimicrobial consumption, a relevant pattern in a region with a high burden of nosocomial infections. In addition, the prevalence of antimicrobial resistance genes and its concordance with commonly used antibiotics in the community reinforce the need to address antibiotic use in public health strategies. Although links between the gut microbiome and host immunity are not fully understood, our findings are consistent with a role for microbiome composition in immune-related traits. The association between the Prevotella enterotype and cytokine balance may provide a basis for further investigation of enterotype-specific immune characteristics.