Microbiome

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match Microbiome's content profile, based on 139 papers previously published here. The average preprint has a 0.13% match score for this journal, so anything above that is already an above-average fit.

1
MetaGEAR Explorer: Rapid interactive searches and cross-cohort analyses of microbiome gene associations in disease

Rios, E.; Jin, S.; Zhang, C.; Neuhaus, F.; He, X.; Weissenberger, S.; Schirmer, M.

2026-03-31 bioinformatics 10.64898/2026.03.30.715271 medRxiv
Top 0.1%
33.2%

The human gut microbiome has been linked to inflammatory bowel disease (IBD) and colorectal cancer (CRC), yet identifying disease-associated microbial genes across diverse human cohort studies remains challenging due to inconsistent data processing and the high dimensionality of gene-level abundance profiles. Here we present MetaGEAR Explorer, a web platform comprising a user interface and web services for interactive and programmatic gene-centric exploration of >33 million microbial gene families across 9,053 metagenomic samples from 24 IBD, CRC, and healthy cohorts. MetaGEAR Explorer facilitates gene searches against a catalog of non-redundant gene families via nucleotide or amino acid sequence queries (BLAST) and Pfam domain-based searches. For matched gene families, the platform computes disease-stratified prevalence, cross-cohort disease associations, species-level taxonomic stratification, and functional domain annotations. Importantly, users can also explore the genomic context of individual gene families via contig-based co-localization networks derived from metagenomic species pangenome (MSP) assignments and pivot from sequence to domain searches to identify functional homologs. Additionally, the platform features a dedicated catalog to interactively browse 13,795 MSPs and export results programmatically via API endpoints. We demonstrate MetaGEAR Explorer's utility using the narG-encoding nitrate reductase gene and a case study of colibactin self-protection genes (clbS and DUF1706 homologs), where the platform revealed a consistent shift from commensals to Gammaproteobacteria carriers in disease. In summary, MetaGEAR Explorer enables rapid cross-cohort functional meta-analyses and is freely available at https://metagear-explorer.schirmerlab.de.

2
ZeaMiC: a Publicly Available Culture Collection of Maize Root-Associated Bacteria

Garrell, A.-K.; Ginnan, N.; Swift, J. F.; Pal, G.; Zervas, A.; Pestalozzi, C.; Tang, C.; Tso, F.; Ford, N. E.; Niu, B.; Castrillo, G.; Schlaeppi, K.; Hahnke, R. L.; Wagner, M. R.; Kleiner, M.

2026-03-24 microbiology 10.64898/2026.03.23.713778 medRxiv
Top 0.1%
32.0%

Plant-associated microbiota are composed of hundreds of microbial species. For many of them, little is known about their individual functions and even less is known about their emergent community-level traits. While culture-independent methods provide valuable insights into the composition, diversity, and functional potential of plant-associated microbiota, culture-dependent methods are essential for reductionist lines of inquiry into the roles of individual species and their interactions within a community. Here, we present ZeaMiC, a publicly available culture collection of root-associated bacteria from Zea mays (maize). This resource comprises 88 isolates obtained from diverse soils and several maize genotypes, with live cultures available through DSMZ (German Collection of Microorganisms and Cell Cultures) both as single stocks and as cost-effective bundles (https://www.dsmz.de/collection/catalogue/microorganisms/microbiota/zeamic). To maximize relevance, isolates were selected to be representative of maize root-associated microbiomes in the Corn Belt of the United States, based on abundance-occupancy patterns from previously published root microbiome data, phylogenetic diversity, and literature-based evidence of functional importance. Whole-genome sequencing and annotation revealed genes associated with root colonization, plant growth promotion, and nutrient cycling, including functions such as chemotaxis, biofilm formation, secretion systems, hormone modulation, and phosphate solubilization. This collection serves as a community resource for future mechanistic studies of plant-microbe and microbe-microbe interactions, filling the gap in our understanding of the ecological interactions in plant microbiomes.

3
Who Infects Whom? Exploiting Bacterial Minicells for Targeted Virome Enrichment and Phage-Host Interaction Analysis through an Integrated Metagenomic Approach

Pedramfar, A.; Ensenat, E.; Allcock, N. S.; Millard, A. D.; Galyov, E. E.

2026-04-09 microbiology 10.64898/2026.04.08.717211 medRxiv
Top 0.1%
23.3%

Linking bacteriophages (phages) to their hosts remains a fundamental challenge to understanding microbial ecology, viral evolution, and horizontal gene transfer. Although phages are the most abundant biological entities on Earth, the majority of them remain uncharacterized due to the lack of efficient host-linking approaches. Traditional methods, such as plaque assays, have significant limitations as they depend on visible lysis and therefore fail to detect phages that do not form plaques. Conversely, shotgun metagenomics can recover viral genomes directly from environmental samples; however, it cannot directly link phages to their bacterial hosts. In this study, we addressed this limitation by tackling the critical question of "who infects whom?" through the development of a novel, culture-independent approach that utilises an anucleate bacterial minicell-based platform to enrich for phages capable of infecting a target bacterial host. To validate our approach, purified Escherichia coli minicells were exposed to a concentrated viral fraction derived from sewage samples. Genomic DNA from phages that successfully infected and interacted with the E. coli minicells was isolated, amplified, and sequenced. Metagenomic analysis revealed a distinct E. coli-specific virome, including several putatively novel phage species and genera. This platform effectively bridges the gap between culture-dependent and metagenomic methods, providing a scalable, host-targeted tool for identifying phage-host pairs. Our approach also opens new opportunities for studying phage-host interaction networks in complex microbial ecosystems and enhances our ability to investigate viral diversity, host specificity, and the ecological roles of phages in natural environments.

4
Micro16S: Universal Phylogenetic 16S rRNA Gene Representations for Deep Learning of the Microbiome

Bishop, H. V.; Ogilvie, O. J.; Dobson, R. C. J.; Herbold, C. W.

2026-03-24 bioinformatics 10.64898/2026.03.21.713432 medRxiv
Top 0.1%
22.3%

Existing self-supervised microbiome models represent taxa as discrete, independent units restricted to fixed vocabularies, disregarding their evolutionary context. Here we present Micro16S, a deep learning approach that embeds 16S ribosomal RNA gene sequences into a continuous vector space according to phylogenetic relationships derived from the Genome Taxonomy Database. Using a combination of triplet and pair loss objectives, the model learns representations where spatial proximity reflects phylogenetic relatedness, while remaining largely invariant to the specific 16S rRNA region. Evaluations demonstrate taxonomically coherent clustering across most ranks and substantially improved region invariance compared to k-mer frequency baselines. A transformer pretrained on 50,418 unlabelled gut microbiome samples using these embeddings captured biologically meaningful community structure, though classical machine learning baselines outperformed Micro16S across six benchmark classification tasks, highlighting the limitations of the current system. These results establish the feasibility of phylogenetic embeddings for microbiome deep learning and identify mining algorithm design and class imbalance as primary targets for future improvement.
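For readers unfamiliar with the triplet objective named in this abstract, the following is a minimal sketch of a generic hinge-style triplet loss. It is not the Micro16S implementation: the function names, the margin value, and the toy 2-D embeddings are assumptions made for illustration, and the pair loss and triplet mining mentioned in the abstract are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss (illustrative, not the Micro16S code).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings with made-up values (hypothetical, for illustration only).
anchor = [0.0, 0.0]          # embedding of a query sequence
close_relative = [0.0, 1.0]  # embedding of a phylogenetically close sequence
distant_taxon = [3.0, 4.0]   # embedding of a phylogenetically distant sequence

# Close relative is nearer than the distant taxon by more than the margin.
well_separated = triplet_loss(anchor, close_relative, distant_taxon)

# Roles swapped: the "positive" is far away, so the loss is large.
violating = triplet_loss(anchor, distant_taxon, close_relative)

print(well_separated, violating)
```

In the first call the geometry already respects the desired ordering, so there is nothing to correct; in the second, the loss penalizes an embedding that places a supposed relative farther away than an unrelated taxon.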

5
StrataBionn: a neural network supervised classification method for microbial communities

Symons, A. E.; Huynh, A. V.; Cornejo, O. E.

2026-04-02 genomics 10.64898/2026.03.31.715659 medRxiv
Top 0.1%
22.0%

The classification of microbial communities into discrete states or "community state types" (CSTs) is fundamental to understanding host-microbiome interactions and their clinical implications. Traditional methods, such as nearest-neighbor approaches, often struggle with the inherent noise, high dimensionality, and non-linear signatures of taxonomic profiles. We present a novel supervised framework for microbial community classification, leveraging an Artificial Neural Network (ANN) architecture implemented in a new tool we named StrataBionn. We rigorously evaluated our approach using large-scale vaginal microbiome datasets, directly benchmarking performance against VALENCIA and a Random Forest (RF) classifier. To demonstrate the versatility of our models, we further extended the framework to oral microbiome classification, assessing its stability across diverse anatomical sites. Our supervised models consistently outperformed the nearest-neighbor approach across all evaluated datasets. In the vaginal microbiome, our method achieved an 11.6% to 13.3% increase in performance across all primary metrics, including precision, recall, accuracy, and F1-score. Furthermore, we demonstrate that this performance advantage is maintained in the oral microbiome, highlighting the generalizability of our neural network and ensemble strategies to various microbial ecosystems without the need for niche-specific algorithmic adjustments. By capturing complex feature dependencies that distance-based methods overlook, our approach provides a more robust and accurate census of microbial community structures. StrataBionn's ability to learn classification schemes for any microbiome with high accuracy and explainability, through the use of provided utilities to visualize feature-space classification boundaries and perform perturbation analysis on trained classifiers, makes it ideal for broad application in microecology research. This framework offers a scalable, high-performance alternative for microbiome researchers, facilitating more precise clinical stratification and biological insights across host body sites.

6
A widespread gut bacterial lineage distinguished by redox metabolism and phage defense

Noecker, C.; Guo, L.; Date, C.; Rai, N.; Daramy, F.; Ramirez Hernandez, L. A.; Kyaw, T. S.; Trepka, K. R.; Gupta, C. L.; Ha, C. W. Y.; Babdor, J.; Spitzer, M. H.; Turnbaugh, P. J.

2026-04-01 microbiology 10.64898/2026.03.31.715625 medRxiv
Top 0.1%
21.7%

Genomic variation within gut microbial species can have consequences for host health and disease. However, for low abundance species, these variations can be difficult to capture by both culture-dependent and -independent approaches. Here, we focus on the prevalent but low abundance gut Actinomycetota Eggerthella lenta. We developed a selective medium for sensitive and specific isolation of E. lenta from human stool. Genomes from 87 new E. lenta isolates were combined with prior high-quality assemblies, shedding light on within-species functional diversity. Phylogenetic analysis revealed a broadly distributed subclade, which we refer to as E. lenta Group B. This lineage was differentiated by its metabolic potential and bacteriophage defense, though mobile elements were shared broadly across the species. Notably, Group B was positively associated with intestinal inflammation in subjects with inflammatory bowel disease. Overall, these results emphasize the importance of bacterial population structure in host-microbiome interactions and provide a framework to study low-abundance gut taxa.
Highlights:
- Selective medium enables E. lenta isolation and reveals high prevalence in humans
- Discovery of a distinctive lineage within E. lenta undergoing genome reduction
- E. lenta Group B has altered metabolism, phage defense, and disease associations
- A widespread conjugative plasmid could enable improved genetics

7
VAE (Variational Autoencoder) Based Gastrotype Identification and Predictive Diagnosis of Helicobacter pylori Infection

Ma, Z.; Qiao, Y.

2026-04-13 gastroenterology 10.64898/2026.04.11.26350690 medRxiv
Top 0.1%
17.9%

Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastrotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastrotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastrotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastrotypes without forcing discrete structure or arbitrarily fixing cluster numbers. Using this framework, we identified four gastrotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
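The silhouette-based choice of K described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the VAE step is omitted, the 2-D points stand in for latent vectors, the use of k-means is an assumption (the abstract does not name the clustering algorithm), and the toy data and the K range of 2 to 5 are invented for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point seeding (assumed choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette score; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical stand-in for latent vectors: three well-separated groups.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]

# Score every candidate K and keep the one with the highest mean silhouette.
scores_by_k = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores_by_k, key=scores_by_k.get)
print(best_k, scores_by_k)
```

Reporting the full `scores_by_k` table rather than only `best_k` matches the abstract's emphasis on transparent evaluation of multiple clustering solutions.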

8
Metagenomic strain-resolved DNA modification patterns link extrachromosomal genetic elements to host strains

Wang, S.; Guitor, A. K.; Valentin-Alvarado, L. E.; Garner, R.; Zhang, P.; Yan, M.; Shi, L.-D.; Schoelmerich, M. C.; Steininger, H. M.; Portik, D. M.; Zhang, S.; Wilkinson, J. E.; Lynch, S.; Morowitz, M. J.; Hess, M.; Diamond, S.; Banfield, J. F.; Sachdeva, R.

2026-03-28 microbiology 10.64898/2026.03.27.714056 medRxiv
Top 0.1%
16.7%

DNA modification is central to microbial defense against extrachromosomal genetic elements (ECEs); consequently, ECEs tend to adopt their hosts' modification patterns. Shared ECE-host modification patterns enable linking ECEs to their hosts, but modification detection tools are designed for single genomes and are ineffective at metagenome scale. Here, we present MODIFI, software for detecting DNA modifications in metagenomes. MODIFI assumes that each k-mer in a metagenome is mostly unmodified and calculates background signal levels for that k-mer from PacBio HiFi reads, eliminating the need for matched control experiments. MODIFI ECE-host linkages were validated using >1,000 isolate and mock microbiome datasets. Illustrating the approach, we identified 315 strain-resolved, non-redundant ECE-host linkages in environmental and human metagenomes. In infant gut microbiomes, a chromosomal inversion in Enterococcus faecalis alters host and associated plasmid methylation motifs simultaneously. Overall, MODIFI solves a major bottleneck in DNA modification analysis and provides a foundational tool for understanding microbial epigenomics.
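One naive way to read the control-free idea in this abstract is sketched below: pool a per-position signal by k-mer context, take the median as the background for that k-mer (reasonable if most occurrences are unmodified), and flag positions whose signal is far above it. This is not MODIFI's algorithm or API. The function names, the signal values, the use of a median, and the ratio threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def kmer_background(observations):
    """Return {kmer: median signal} from (kmer, signal) pairs.

    The median serves as the background under the assumption that most
    occurrences of a k-mer are unmodified (hypothetical simplification).
    """
    by_kmer = defaultdict(list)
    for kmer, signal in observations:
        by_kmer[kmer].append(signal)
    return {kmer: median(values) for kmer, values in by_kmer.items()}


def flag_outliers(observations, background, ratio_threshold=2.0):
    """List (index, kmer, ratio) for signals at least `ratio_threshold` x background."""
    flagged = []
    for i, (kmer, signal) in enumerate(observations):
        ratio = signal / background[kmer]
        if ratio >= ratio_threshold:
            flagged.append((i, kmer, ratio))
    return flagged


# Made-up (k-mer context, kinetic signal) pairs; values are illustrative only.
observations = [
    ("GATC", 1.0), ("GATC", 1.1), ("GATC", 0.9), ("GATC", 3.0),
    ("CCGG", 2.0), ("CCGG", 2.2), ("CCGG", 1.8),
]

background = kmer_background(observations)
flagged = flag_outliers(observations, background)
print(background, flagged)
```

Because the background is estimated from the same sample, no matched unmodified control is needed in this sketch, which mirrors the motivation stated in the abstract.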

9
MAAMOUL: Metabolic network-based discovery of microbiome-metabolome shifts in disease

Muller, E.; Baum, S.; Borenstein, E.

2026-03-30 bioinformatics 10.64898/2026.03.27.714614 medRxiv
Top 0.1%
14.5%

Motivation: A central goal in human gut microbiome research is to identify disease-associated functional shifts, an objective increasingly pursued through metagenomic and metabolomic assays. However, common differential abundance analyses of genes or metabolites often yield long and difficult-to-interpret feature lists. Aggregating features into predefined pathways can improve interpretability but relies on fixed pathway boundaries that may not reflect context-specific functional changes. Moreover, even when paired metagenomic-metabolomic data are available, they are often analyzed separately or linked only through simple statistical associations. Results: We introduce MAAMOUL, a knowledge-based computational framework that integrates metagenomic and metabolomic data to identify disease-associated, data-driven microbial metabolic modules. Leveraging prior knowledge of bacterial metabolism, MAAMOUL maps disease-association scores onto a global microbiome-wide metabolic network and identifies custom modules enriched for altered genes and metabolites. Applying MAAMOUL to inflammatory bowel disease (IBD) and irritable bowel syndrome (IBS) datasets revealed significant disease-associated modules not detected by conventional pathway-level analysis. In IBD, modules reflected disrupted sulfur and aromatic amino acid metabolism and enhanced microbial nucleotide salvage, whereas in IBS they linked purine and nicotinate/nicotinamide metabolism. These results demonstrate that network-guided multi-omic integration can uncover coherent functional shifts in the gut microbiome overlooked by single-omic or purely statistical approaches. Availability: MAAMOUL is available as an R package at https://github.com/borenstein-lab/MAAMOUL.

10
Hawaiian Geothermal Fumaroles Contain Diverse and Novel Viruses

Sen, P.; Oliver, L.; Makarova, K. S.; Wolf, Y. I.; Pavloudi, C.; Shlafstein, M.; Saw, J. H.

2026-04-07 microbiology 10.64898/2026.04.06.716669 medRxiv
Top 0.1%
14.1%

Microbial communities of geothermal habitats are central to understanding the evolution of life on Earth. Metagenomics has provided insight into the role of viruses in shaping microbial diversity of complex environments. However, identification of novel viruses is constrained by the lack of marker genes and low nucleotide similarity between related viral taxa. While microbial and viral diversity have been explored in terrestrial hot springs and hydrothermal vent systems, other volcanic features remain underexplored. Fumaroles (steam vents) are geothermal features that heat groundwater with magma, releasing steam and volcanic gases such as CO2 and H2S. Compared with hot springs, fumaroles are physicochemically dynamic: their temperatures and gas emissions fluctuate rapidly with volcanic activity. Here, we describe viruses identified metagenomically from microbial mats hosted near basaltic fumaroles on the Big Island of Hawai`i. To our knowledge, this is the first systematic survey of fumarole viruses. Our use of a sensitive profile-based approach for identification reveals high viral diversity in fumaroles and yields an estimate of two undescribed order-level clades of Caudoviricetes (tailed phages). Viral metabolic genes provide evidence of viral-mediated adaptation of microbes to fumarole conditions. We describe patterns of viral diversity that diverge from the Bank model of viral ecology, hinting at viral dispersal between biofilms and high viral richness and evenness. Lastly, we provide a description of the first terrestrial geothermal environment dominated by Microviridae, previously only described in viral communities of deep ocean hydrothermal vents. This study offers important findings for exploration of viral ecology in extreme environments.

11
Characterization of the bacterial microbiome associated with centrohelid heliozoans from aquatic environments using full-length 16S rRNA PacBio sequencing

Gerasimova, E. A.; Balkin, A. S.; Sozonov, G. A.; Chagan, T. A.; Kaleeva, E. I.; Kasseinov, R.; Poshvina, D. V.

2026-03-20 microbiology 10.64898/2026.03.19.712920 medRxiv
Top 0.1%
14.1%

Centrohelid heliozoans are a monophyletic group of free-living, ubiquitous, predatory protists widely distributed in aquatic and soil ecosystems. Centrohelids are known as cytotrophic protists that feed on bacteria, algae, and small unicellular eukaryotes. While algal and chloroplast symbioses have been documented in this group, their bacterial associations remain largely unexplored. In this study, we characterize the bacterial communities associated with centrohelids isolated from freshwater habitats using full-length 16S rRNA PacBio sequencing. Amplicon sequencing revealed 5 phyla, 6 classes, and 58 genera in the bacterial communities associated with seven centrohelid isolates. Alphaproteobacteria, Bacteroidia, and Gammaproteobacteria were the most abundant classes, while Arcicella, Sphingobium, Pseudomonas, Sphingomonas, Azospirillum, Shinella, Flavobacterium, Variovorax, and Rhodococcus were the most abundant genera. Notably, Arcicella, Variovorax, Sphingobium, and Pseudomonas constituted the core microbiome. Unexpectedly, we detected bacteria known as opportunistic pathogens, providing the first evidence that centrohelids may serve as environmental reservoirs for bacteria with pathogenic potential (e.g., Acidovorax, Acinetobacter, Anaerococcus, Bosea, Corynebacterium, Escherichia, Moraxella, Mycobacterium, Prevotella, Pseudomonas, Ralstonia, and Sphingomonas). In addition, this study provides the first evidence of Rickettsiaceae associations with centrohelids. Importance: This study reveals that centrohelid heliozoans, ubiquitous microbial predators, harbor diverse and host-specific bacterial communities. Critically, we show they can serve as environmental reservoirs for bacteria with pathogenic potential, a role previously overlooked outside of model protist groups. These findings expand our understanding of pathogen ecology, suggesting that a wider range of protists may contribute to the persistence and dispersal of opportunistic pathogens in aquatic ecosystems.

12
VicMAG, an open-source tool for visualizing circular metagenome-assembled genomes highlighting bacterial virulence and antimicrobial resistance

Tsuda, Y.; Tanizawa, Y.; Vu, T. M. H.; Nishimura, Y.; Shintani, M.; Abe, H.; Hasebe, F.; Kasuga, I.; Nagao, M.; Suzuki, M.

2026-04-01 bioinformatics 10.64898/2026.03.31.714378 medRxiv
Top 0.1%
14.0%

Bacterial pathogens spread in clinical and environmental settings, and mobile genetic elements (MGEs), such as plasmids and phages, mediate the transfer of virulence factor genes (VFGs) and antimicrobial resistance genes (ARGs) among bacterial communities. Metagenomic analysis of environmental and wastewater samples using highly accurate long-read sequencing technologies, such as PacBio HiFi sequencing, provides valuable insights into monitoring the regional spread of VFGs and ARGs, including dissemination mediated by MGEs. No visualization tool is currently available for the comprehensive display of numerous resulting circular metagenome-assembled genomes (cMAGs) with functional gene annotations. Here, we developed VicMAG, a visualization tool for highly complex cMAGs derived from long-read metagenome assemblies annotated using updated databases of VFGs, ARGs, and MGEs. Using 353 cMAGs from PacBio HiFi sequencing of a wastewater sample, we demonstrated the utility of VicMAG for metagenome visualization. VicMAG provides comprehensive, size-aware visualization of cMAGs representing bacterial chromosomes and plasmids, annotated with VFGs, ARGs, and phages. By simultaneously visualizing all cMAGs in a framework, VicMAG facilitates a holistic understanding of the distribution and genomic context of VFGs and ARGs across complex microbial communities. This tool supports integrated surveillance of bacteria associated with virulence and antimicrobial resistance across clinical, environmental, and One Health contexts.

13
An AI-Driven Decision-Support Tool for Triage of COVID-19 Patients Using Respiratory Microbiome Data

Avina-Bravo, E. G.; Garcia-Lorenzo, I.; Alfaro-Ponce, M.; Breton-Deval, L.

2026-03-19 bioinformatics 10.64898/2026.03.18.712739 medRxiv
Top 0.1%
14.0%

Accurate clinical triage is critical for optimizing decision-making and resource allocation during infectious disease outbreaks such as COVID-19. In this study, we present an AI-driven decision-support tool for the triage of COVID-19 patients based on respiratory microbiome profiles derived from shotgun metagenomic sequencing. We analyzed 477 shotgun respiratory metagenomes from three independent public cohorts and generated genus-level taxonomic profiles, which were integrated with minimal clinical metadata to train supervised machine-learning models, including Random Forest, Support Vector Machine, and XGBoost. Model performance was evaluated using standard classification metrics, cross-validation, and particle swarm optimization for hyperparameter tuning. Across cohorts, we observed a consistent transition from microbiomes dominated by commensal taxa to dysbiotic states enriched in opportunistic and clinically relevant genera, particularly Acinetobacter and Staphylococcus, in severe and deceased patients. Among the evaluated models, XGBoost consistently achieved the best performance, reaching up to 96.1% accuracy, 97.6% F1-score, and 98.2% ROC-AUC in individual cohorts. When trained on the integrated dataset, XGBoost maintained robust performance (95.1% accuracy, 97.2% F1-score, 94.3% ROC-AUC) and demonstrated greater stability and lower variance compared to alternative models. Feature-importance analyses identified a compact and interpretable set of recurrent microbial predictors, and reduced-feature models retained substantial discriminative power when augmented with key clinical variables. These results support the respiratory microbiome as a valuable source of information for outcome-oriented clinical triage and position microbiome-informed machine learning as a scalable and interpretable decision-support approach for managing COVID-19 and future infectious disease scenarios.

14
KuafuPrimer: Machine learning empowers the design of 16S amplicon sequencing primers toward minimal bias for bacterial communities

Zhang, H.; Jiang, X.; Yu, X.; Wang, H.; Lu, P.; Hou, J.; Guo, Q.; Xiao, T.; Wu, S.; Yin, H.; Geng, P. X.; Guo, J.; Jousset, A.; Wei, Z.; Xiao, Y.; Zhu, H.

2026-03-31 bioinformatics 10.64898/2026.03.29.714677 medRxiv
Top 0.1%
13.8%

Amplicon sequencing targeting the 16S rRNA gene is a widely used and cost-effective method for exploring bacterial communities. However, its performance is often limited by primer bias arising from the arbitrary use of universal primers across diverse microbial communities and habitats. We propose KuafuPrimer to design the optimal 16S rRNA gene primers toward minimal bias for targeted bacterial communities, using few-shot machine learning to guide the primer design procedure based on a small number of samples. Simulations on 809 samples across 26 representative environments and habitats showed that KuafuPrimer-designed primers outperformed the universal primers in taxonomic accuracy, achieving an average 16.31% relative reduction in primer bias, with reductions up to 46.08% in plant samples. Notably, KuafuPrimer detected 29 rare and key taxa undetectable by the universal primers. Validation with 317 longitudinal gut microbiota samples demonstrated that KuafuPrimer-designed primers consistently outperformed the universal primers across temporal, individual, and cohort levels, with relative bias reductions of 5.03%, 3.53%, and 3.10%, respectively. Finally, real PCR experiments on human gut samples from Clostridioides difficile-infected and healthy groups showed that polymerase chain reaction products using KuafuPrimer-designed primers correlated better with metagenomic data compared to the universal primers. More importantly, KuafuPrimer successfully detected Clostridioides difficile, the key pathogen missed by the universal primers, highlighting its potential for improving clinical diagnostics. In summary, KuafuPrimer provides a machine learning-based primer design strategy for targeted bacterial communities, with demonstrated utility in large-scale microbiome initiatives, longitudinal surveys and clinical diagnostics.

15
Using Hi-C and target capture to monitor plasmid transfer in the barley rhizosphere

Castaneda-Barba, S.; Stalder, T.; Top, E. M.

2026-03-23 microbiology 10.64898/2026.03.20.713245 medRxiv
Top 0.2%
10.2%

Emergence of multi-drug resistant (MDR) pathogens is facilitated by the mobilization of resistance genes from bacteria in animal and environmental habitats, a process often mediated by plasmids. While fertilization of agricultural soils with manure is hypothesized to serve as a pathway for transferring antimicrobial resistance plasmids to soil and crop bacteria, evidence is limited. In this study, we aimed to determine whether MDR-plasmids from manure transfer in soil, leading to the formation of long-term agricultural resistance reservoirs. To this end, we introduced a known MDR plasmid to agricultural soil where barley was subsequently grown and monitored spread of the plasmid over the course of a growing season (up to 190 days). Our experimental design mimicked conventional agricultural practices at a microcosm scale. A digital droplet PCR approach indicated plasmid transfer in the rhizosphere, which was confirmed by a targeted Hi-C method (termed Hi-C+). Hi-C+ demonstrated transfer of the plasmid to soil bacteria 10 days after barley planting, but transfer was not observed afterwards. The new plasmid hosts could not be identified, as plasmid-associated host Hi-C reads were absent from existing databases. This implies these hosts were rare and likely unculturable members of the soil microbiome. Our findings demonstrate that plasmid transfer from manure to soil can occur under conditions reflecting those found in agricultural settings. Furthermore, rare and uncharacterized members of the soil microbiome may participate in acquiring MDR plasmids from manure bacteria, raising important questions about their role in spreading resistance plasmids.

16
Viral isolation reveals novel and diverse phages infecting natural stream biofilms

Chin, W. H.; Boutroux, M.; Harding, A.; Demurtas, D.; Baier, F.; Peter, H.

2026-03-26 microbiology 10.64898/2026.03.26.713887 medRxiv
Top 0.2%
10.0%

Bacteriophages of environmental bacteria remain underrepresented, limiting phage-biofilm research beyond clinical and model species. Here, we present the Alpine Lotic Phage (ALP) collection, curated through an isolation campaign from biofilm-forming bacteria of alpine streams. We obtained 57 phage isolates, which were dereplicated to 28 unique genomes following sequencing. The collection consists of tailed phages infecting 14 bacterial host species with genomes spanning 37 to 363 kb while exhibiting diverse plaque morphologies, depolymerase activity, and distinct impacts on host biofilm architecture. Comparative analyses against public viral genomes and a curated planetary-scale contig database revealed limited sequence similarity, underscoring the novelty of ALP phages. Functional annotation resolved 9-54% of predicted genes, which encoded viral structural components, nucleotide metabolism functions, anti-defence mechanisms, and auxiliary genes that facilitate viral infection and replication. Together, the ALP collection represents a foundational resource for investigating phage evolution and ecology in natural bacterial communities.

17
Systematic detection of abnormal samples reveals widespread mislabeling in metagenomic studies

Ye, W.; Zhou, Y.; Chen, J.; Wanxin, L.; Du, S.

2026-03-25 microbiology 10.64898/2026.03.22.713545 medRxiv
Top 0.2%
10.0%

The human microbiome plays a critical role in health and disease, and its dynamic nature has made longitudinal sampling a key strategy for elucidating microbiome-disease relationships. Although the gut microbiome generally stabilizes over time, a subset of samples frequently shows marked deviations from an individual's baseline profile. We refer to these as abnormal samples. We developed a three-stage workflow to identify and classify abnormal samples and to determine their underlying causes. We then systematically investigated abnormal samples across 16 publicly available metagenomic datasets, comprising a total of 5,171 metagenomes. Our analysis revealed that abnormal samples are often the result of mislabeling during sample collection, processing, or sequencing. Among these, fecal samples from family members were more likely to be mislabeled. We found evidence of mislabeling in 75% of longitudinal datasets, involving up to dozens of samples per study, and in 25% of randomly selected cross-sectional datasets. Additional factors such as disease status (e.g., inflammatory bowel disease), sampling intervals, and sampling density may also contribute to sample abnormalities owing to true biological variation. These findings highlight that mislabeling is a common yet underrecognized issue in microbiome research. Our work underscores the importance of identifying and correcting abnormal samples to ensure data integrity in microbiome studies and provides a practical solution for quality control in large-scale metagenomic datasets.

18
Personalized microbiotas (counter-)select for antibiotic resistant strains

Knopp, M.; Garcia-Santamarina, S.; Michel, L.; Papagiannidis, D.; David, S.; Selegato, D. M.; Wong, J. L. C.; Karcher, N.; Frankel, G.; Zimmermann, M.; Savitski, M.; Typas, A.

2026-03-30 microbiology 10.64898/2026.03.29.715108 medRxiv
Top 0.2%
9.8%

Antibiotic-resistant pathogens are an increasing public health threat, as development of novel therapeutics is outpaced by resistance emergence and dissemination. Approaches to slow down or even reverse antibiotic resistance are necessary to maintain the efficacy of both existing and new antibiotics. Such approaches exploit the fitness cost of resistance elements, but have largely relied on assessing this cost in laboratory conditions that poorly reflect the native context in which pathogens reside. Here we present a method for investigating the influence of personalized human gut microbiota compositions on the competitive fitness of antibiotic-resistant pathogens. Using fecal matter-derived microbiomes, we identify a specific community that selects for a carbapenem-resistant Klebsiella pneumoniae strain. This selective advantage is due to mutations arising in a LacI-type transcriptional regulator, GlyR. We show that upregulation of the downstream glycoporin GlyP causes the effect. By deconvoluting the microbiome composition, we identify a focal E. coli strain as a central driver of the selection, which is further modulated by other microbiota members. We demonstrate that the selective advantage is due to competition for carbohydrates, in particular glycerol-containing compounds. Importantly, glyR mutations are under strong positive but conditional selection in clinical K. pneumoniae isolates. This implies a reduced competitiveness in other environments, which we experimentally validate in vitro. Overall, this study offers a path to identify microbiome-specific interactions that modulate the competitiveness of antibiotic-resistant pathogens.

19
LOCOM2: Robust Differential Abundance Analysis for Microbiome Data

He, M.; Satten, G. A.; Hu, Y.-J.

2026-04-09 bioinformatics 10.64898/2026.04.07.716976 medRxiv
Top 0.3%
9.1%

Background: Numerous methods have been developed for differential abundance analysis of microbiome data; however, many fail to adequately control error rates, contributing to the reproducibility crisis in microbiome research. Moreover, new challenges have emerged, including large-scale studies, differential library size distributions, unbalanced case-control designs, and the increasing availability of only relative-abundance data rather than read counts. Methods: We propose LOCOM2 to address these challenges. The method refines the weighting scheme in LOCOM to eliminate confounding by library size while accommodating relative abundance data. It incorporates a series of adjustments to ensure stable and reliable estimation, even under extreme conditions such as very rare taxa and highly unbalanced case-control designs. In addition, LOCOM2 replaces the computationally intensive permutation procedure in LOCOM with a Wald-type test, substantially improving computational efficiency. To evaluate performance, we conducted extensive simulation studies using the MIDASim simulator and three data templates representing diverse body sites. We benchmarked LOCOM2 against state-of-the-art methods, including LOCOM, LinDA, ANCOM-BC2, MaAsLin2, and MaAsLin3. This benchmarking effort provides an essential foundation for the next stage of microbiome research. Results: LOCOM2 achieved accurate control of the false discovery rate across all simulation scenarios, whereas none of the other methods consistently did so. LOCOM2 also demonstrated the highest sensitivity for detecting true signals. Applications of these methods to three real microbiome datasets further corroborated these findings.

20
Cervicovaginal Dysbiosis in HPV-Negative Women: Metagenomic Evidence Implicates Achromobacter in Female Infertility

Ali, H.; Sujan, M. S. I.; Nahar, K.; Ahmed, M. F.; Azmuda, N.; Akter, S.; Adnan, N.

2026-03-25 microbiology 10.64898/2026.03.23.713732 medRxiv
Top 0.3%
9.0%

The cervicovaginal microbiome is pivotal to reproductive health, yet its dynamics in HPV-negative women with gynaecological disorders remain underexplored. We investigated microbial diversity and taxonomic shifts in HPV-negative women from Bangladesh using 16S rRNA gene sequencing and shotgun metagenomics. Of 224 women screened, 136 were HPV-negative; 29 underwent 16S profiling, and three infertility-associated cases were further analyzed by shotgun metagenomics. Healthy controls exhibited low alpha diversity and a Lactobacillus-dominated profile (98.2%), reflecting ecological stability. In contrast, pathological cases displayed significantly elevated richness and evenness, reduced Lactobacillus (28.0%), and enrichment of anaerobic and opportunistic taxa, including Bifidobacterium (23.4%), Achromobacter (12.9%) and Sneathia (7.5%). Distinct microbial signatures emerged across clinical subgroups: pelvic inflammatory disease was enriched in Bifidobacterium, intra-menstrual bleeding retained moderate Lactobacillus, while infertility exhibited prominent dominance of Achromobacter (45.5%). Shotgun metagenomics confirmed Achromobacter spp. (A. ruhlandii, A. dolens, A. xylosoxidans) as the predominant taxa (84.9%) in infertility cases, accompanied by depletion of protective Lactobacillus. Functional inference revealed conserved metabolic backbones but disease-specific enrichment of stress-response and biosynthetic pathways, particularly in infertility and PID. Co-occurrence network analysis identified condition-specific microbial consortia, with Achromobacter forming infertility-associated clusters. This study represents the first integrated application of amplicon and shotgun metagenomic approaches to profile the cervicovaginal microbiota in HPV-negative women. It identifies Achromobacter as a potential microbial biomarker of infertility and highlights the urgent need for microbiome-informed diagnostics and targeted interventions to restore cervicovaginal homeostasis.