
Microbiome

Springer Science and Business Media LLC

Preprints posted in the last 7 days, ranked by how well they match Microbiome's content profile, based on 139 papers previously published here. The average preprint has a 0.13% match score for this journal, so anything above that is already an above-average fit.

1
Cleaner Air for Lower Cardiometabolic Risk: protocol for a double-blind, randomized, sham-controlled trial of HEPA filtration in adults with prediabetes.

Wittkopp, S.; Asachi, P.; Kazatsker, F.; Aleman, J. O.; Gordon, T.; Brook, R.; Thorpe, L.; Newman, J. D.

2026-06-01 endocrinology 10.64898/2026.05.29.26354420 medRxiv
Top 2%
1.2%

Introduction: Air pollution is a leading driver of cardiovascular disease, and a growing body of literature implicates it in worsened glucose homeostasis. Increases in fine particulate matter air pollution (PM2.5) are associated with increased blood glucose and hemoglobin A1c across the glycemic spectrum from normoglycemia to prediabetes to all forms of diabetes. Despite strong evidence for positive associations of PM2.5 with dysglycemia, it remains unknown if reducing air pollution exposure through air filtration can effect improvements in glucose. This study aims to test the hypothesis that short-term, in-home air pollution reduction using high efficiency particulate air (HEPA) filtration will improve blood sugar in adults with prediabetes. Methods and analysis: This trial is a randomized, double-blind, sham-controlled trial of the effects of lowering air pollution exposure using HEPA filtration on cardiometabolic health in adults with prediabetes living in the New York City area. Participants will be randomly assigned to use bedroom air cleaners, or sham air cleaners, while measuring PM2.5 continuously for 1 month. The primary outcomes will be continuous glucose monitoring metrics measured before and after HEPA air filtration. Exploratory outcomes will include insulin resistance measures, serum biomarkers and transcriptomics measured before and after HEPA intervention. We will quantify effects of HEPA filtration with models using treatment arm (true versus sham filtration) as the independent variable. Secondary analyses will model continuous measures of PM2.5 as the independent variable. Ethics and Dissemination: This study has undergone peer review, and the work was supported by Grant 2023-0214 from the Doris Duke Foundation, which had no other role in study design or implementation. The study was registered in ClinicalTrials.gov (NCT05994937) prior to recruitment. Clinical Trials: NCT05994937; https://clinicaltrials.gov/study/NCT05994937

2
Cation Enrichment and Hypersialylation in Chronic Rhinosinusitis Mucus

Wood, A. M.; Detwiler, R. E.; Coughlin, M.; Pollard, C. E.; Alt, J. A.; Pulsipher, A.; Kramer Stratton, J.

2026-05-27 otolaryngology 10.64898/2026.05.23.26353957 medRxiv
Top 2%
1.2%

Background: Chronic rhinosinusitis (CRS) is a heterogeneous inflammatory airway disease associated with impaired mucociliary clearance and persistent inflammation. While prior work has focused on inflammatory and molecular pathways, the physicochemical properties of mucus itself remain poorly characterized. This study aimed to define compositional and biophysical features of CRS mucus that may contribute to dysfunction. Methods: A prospective cross-sectional study was conducted in 15 adults undergoing endoscopic sinus surgery (11 CRS, 4 controls). Mucus was collected from the middle meatus. Hydration was measured by lyophilization. Ionic composition was quantified using mass spectrometry. Viscoelasticity was assessed via oscillatory shear rheology. Total protein, total carbohydrate, sialic acid (Sia) and fucose (Fuc) content were quantified using enzymatic and chemical assays. Statistical comparisons were performed using nonparametric tests. Results: CRS mucus exhibited significantly higher Ca2+ and Mg2+ concentrations (approximately two-fold; p<0.05) and increased variability in hydration and ion content compared to controls. Rheology showed greater heterogeneity and a non-significant trend toward increased viscoelasticity in CRS. Total protein and carbohydrate content were not significantly different; however, the carbohydrate-to-protein ratio was significantly reduced in CRS (p=0.04). Sia content and Sia-to-carbohydrate ratio were significantly elevated in CRS (p=0.04 and p=0.002), particularly in CRS with nasal polyps. Fuc content did not differ between groups. Conclusions: CRS mucus demonstrates coordinated alterations in ionic composition and glycosylation, characterized by increased cation content, hypersialylation, and reduced carbohydrate-to-protein ratios. These changes may contribute to altered mucus properties and impaired mucociliary clearance, highlighting mucus composition as a potential therapeutic target in CRS.

3
Prevotella stercorea links gut microbiome ecology to respiratory infection protection through a host-context-dependent, species-autonomous pathway

Ofordile, O. N.

2026-05-30 infectious diseases 10.64898/2026.05.26.26354151 medRxiv
Top 3%
0.7%

Using a longitudinal cohort of 633 Gambian children (IHAT-GUT, NCT02941081), we resolve two mechanistically distinct ecological pathways linking Prevotella stercorea to infection risk. Its abundance positively predicts gut microbiome richness, consistent with community-level colonisation resistance for enteric outcomes. However, its association with reduced acute respiratory infection (ARI) persists unchanged after richness adjustment, identifying a species-autonomous pathway independent of community diversity. Weight-for-age z-score (WAZ) is uncorrelated with microbiome richness within strata, supporting WAZ as a proxy for host immune-metabolic reserve rather than a determinant of microbiome composition. In Low-WAZ children, P. stercorea at Day 1 associates with suppressed CRP, whereas in higher-WAZ children, elevated Day 1 inflammation predicts subsequent P. stercorea colonisation at Day 85, consistent with host-context-dependent immune selection. ARI and fever protection is richness-independent and concentrated in Low-WAZ children. P. copri does not retain an independent protective association when modelled jointly. These findings have direct implications for microbiome-directed interventions.

4
Inferring Sexual Network Bridging Using Genomics: A Simulation Study

Kline, M. C.; Helekal, D.; Oliveira Roster, K. I.; Grad, Y.

2026-05-26 infectious diseases 10.64898/2026.05.24.26353967 medRxiv
Top 5%
0.3%

The dynamics of sexually transmitted infections involve interconnected transmission networks, including men who have sex with men and heterosexual populations. Understanding the extent of bridging between these networks can inform surveillance, guide interventions, and aid in the interpretation of their impact, but methods for quantifying bridging have been lacking. Here, we addressed whether pathogen genomics tools, successfully used to reconstruct transmission in other contexts, could accurately infer sexual network bridging. Based on simulations of gonorrhea spread, we evaluated phylodynamic bridging metrics inferred by ancestral state reconstruction under a range of sampling schemes, from comprehensive to sparse. These metrics differentiated sexual network structures even with biased sampling schemes, but accuracy depended on the sampling scheme and density: phylodynamic bridging estimates using sequences from all detected infections for one network configuration were on average 6.9% above the true value, whereas estimates from 5% of infections in symptomatic men with many partners were on average >1000% above the true value. These results suggest routine overestimation of bridging from unadjusted inferences from genomics data and provide context for interpreting existing genomic surveillance data and targeted studies.

5
Future Pandemics: AI-Designed Diagnostic Assays for Detection of Andes Orthohantavirus (ANDV) Associated with the 2026 MV Hondius Outbreak

MacSharry, J.; Tonda, A.; Lopez-Rincon, A.

2026-05-27 health informatics 10.64898/2026.05.26.26354101 medRxiv
Top 6%
0.2%

Andes orthohantavirus (ANDV), the primary etiological agent of hantavirus pulmonary syndrome (HPS) in South America, is uniquely capable of limited human-to-human transmission, posing a significant challenge for outbreak control. Recent events, including the 2018-2019 Epuyen outbreak and the 2026 MV Hondius incident, underscore the need for rapid, lineage-specific molecular diagnostics. In this study, we present an artificial intelligence (AI)-driven framework for the design of diagnostic primers targeting the S genomic segment of the Epuyen lineage. Using an evolutionary algorithm integrated with thermodynamic evaluation via Primer3Plus, candidate primers were optimized to maximize classification accuracy while satisfying stringent biochemical constraints. The resulting primer set enables amplification of lineage-specific regions suitable for molecular characterization and surveillance. In silico validation demonstrates that the proposed primers achieve perfect discrimination between 2026 outbreak sequences and other ANDV variants. Furthermore, in silico comparison with standard protocol-based primers reveals substantially reduced sensitivity and specificity in the latter, highlighting the limitations of static diagnostic designs when applied to evolving viral populations. Overall, this work demonstrates that AI-assisted primer design provides a robust and adaptable strategy to improve viral detection, enhance outbreak tracking, and support timely public health interventions. Integrating computational optimization into diagnostic development is essential for strengthening preparedness against emerging zoonotic threats.

6
Multiple, but not isolated, yellow fever virus-associated orthoflavivirus immune histories drive antibody-dependent enhancement of Zika and dengue viruses

Gallon, S.; Baffour Tonto, P.; Ding, Y.; Chen, G.-H.; Naito-Keoho, K.; Brites, C.; Netto, E. M.; Wang, W.-K.; Herrera, B. B.

2026-06-01 infectious diseases 10.64898/2026.05.22.26353817 medRxiv
Top 6%
0.2%

Antibody-dependent enhancement (ADE) is a major concern across orthoflavivirus infections, yet how multiple viral exposures shape enhancement risk remains incompletely understood. Here, we integrated serosurveillance from Saude, Brazil with functional immunologic analyses to define how yellow fever virus (YFV)-associated orthoflavivirus immune histories influence ADE phenotypes. Using serocomplex-specific anti-premembrane antibody profiling validated by microneutralization assays, plasma samples were stratified into YFV-only, YFV+DENV, and YFV+DENV+ZIKV exposure groups. In Fc gamma receptor-bearing U937 cells, YFV-only plasma demonstrated minimal enhancement activity, whereas cumulative orthoflavivirus exposure generated broader ADE phenotypes across heterologous viruses. In IFNAR1-/- passive-transfer models, YFV-only plasma did not enhance ZIKV or DENV2 infection in vivo. In contrast, YFV+DENV plasma increased ZIKV viremia and accelerated mortality kinetics, while YFV+DENV+ZIKV plasma demonstrated concentration-dependent enhancement phenotypes. Collectively, these findings indicate that isolated YFV immunity does not predispose to ADE, whereas cumulative orthoflavivirus exposure generates antibody repertoires capable of producing concentration-dependent enhancement in vivo.

7
Field-ready portable rapid nucleic acid test for tuberculosis detection and drug-resistance profiling in resource-limited settings

Nag, S.; Banerjee, S.; Banerjee, S.; Ghosh, S.; Bera, A.; Shanmugam, S.; Mondal, A.; Chakraborty, S.

2026-06-01 infectious diseases 10.64898/2026.05.29.26354438 medRxiv
Top 6%
0.2%

Tuberculosis (TB) remains one of the deadliest infectious diseases, with over a million deaths annually and a growing threat from multidrug-resistant strains (MDR-TB). A major bottleneck in controlling TB is the lack of truly portable, rapid, and user-friendly diagnostic systems that can operate effectively in decentralized, resource-constrained settings. Here, we present a first-of-its-kind, portable nucleic-acid-based diagnostic platform that enables both primary TB screening and detection of drug resistance within the same unified framework, without any change in the operative embodiment. The system integrates loop-mediated isothermal amplification (LAMP) targeting dual Mycobacterium tuberculosis markers (IS6110 and IS1081) with a compact, AI-enabled device and smartphone-based readout, delivering rapid and reliable results at the point-of-care. Clinical evaluation across 105 samples demonstrated high sensitivity and specificity. Further validation through real-world deployment in a primary healthcare setting, using a single-gene (IS6110) configuration operated by minimally trained personnel, yielded 95.60% sensitivity and 100% specificity, benchmarked against GeneXpert. Critically, the same platform architecture, without modification, extends seamlessly to drug-resistance profiling, demonstrated here through a probe-free, allele-specific LAMP approach for identifying key mutations associated with rifampicin (rpoB) and isoniazid (katG) resistance. By combining robust molecular diagnostics with AI-driven automation in a compact and accessible format, this work represents a significant medical advancement toward democratizing TB care. The platform thus holds strong potential to enable early screening, guide timely treatment decisions, reduce transmission, and substantially strengthen global TB elimination efforts, particularly in high-burden, low-resource settings.

8
Intravital mid-infrared biosensing by normalized spatial probing of self-referenced optothermal signals

Berger, C. G.; Puttfarcken, B.; Qiu, J.; Hauer, I.; Herr, S.; Juestel, D.; Pleitez, M. A.

2026-05-28 endocrinology 10.64898/2026.05.27.26354202 medRxiv
Top 6%
0.2%

We present a compact pump-and-probe mid-infrared Optothermal Spectrometer (OTHES) equipped with Spatial Probing and Autocorrection (SPAC) optimized for robust intravital application in humans. SPAC-OTHES facilitates alignment stability and spectral comparability across different measurement sessions involving different skin types. Contrary to state-of-the-art, SPAC-OTHES uses camera-based beam detection and an auto-calibration mechanism that enables ca. 73% better spectral reproducibility in intravital measurements in human volunteers than non-calibrated readouts. Moreover, SPAC-OTHES has the potential to lower the glucose quantification error, as demonstrated here in artificial skin phantoms, where an improvement of 52% compared to conventional diode-based detection was observed. The compactness of OTHES, combined with reliable SPAC-readout, has the potential to accelerate commercialization and broad application of biosensors based on mid-infrared spectroscopy.

9
Pre-infusion Exhaled breath volatile organic compounds predict severe CRS and ICANS after CAR T-cell therapy

Berna, A.; Fahrmann, J.; Irajizad, E.; Rudsari, H.; Liu, Y.; Logan, J.; Murtada, K.; Grandy, J.; Edwards, M.; Ayers, A.; Ahmed, S.; Neelapu, S.; Saini, N.; John, A.; John, T.

2026-06-01 oncology 10.64898/2026.05.28.26354352 medRxiv
Top 6%
0.2%

Background: Severe cytokine release syndrome (CRS) and immune effector cell-associated neurotoxicity syndrome (ICANS) are major dose-limiting toxicities of chimeric antigen receptor (CAR) T-cell therapy. Existing pre-infusion biomarkers offer modest discrimination, motivating non-invasive alternatives. Methods: We prospectively enrolled 26 patients with relapsed/refractory large B-cell lymphoma receiving axicabtagene ciloleucel. Pre-infusion (day -1) exhaled breath samples were analyzed by gas chromatography-mass spectrometry for 40 volatile organic compounds (VOCs). Candidates with univariate AUC > 0.65 for severe (grade >=2) CRS or ICANS were carried forward to sensitivity-maximization-at-given-specificity with LASSO regularization (SMAGS-LASSO), which selected separate panels for each outcome. Model performance was assessed by leave-one-out cross-validation with permutation p-values and Harrell bootstrap optimism correction. Results: The 4-VOC CRS panel (heptanal, benzaldehyde, 2-butanone, ethylbenzene) achieved LOOCV AUC 82.5% (80% sensitivity at 88% specificity) and the 3-VOC ICANS panel (nonanal, allyl methyl sulfide, levomenthol) achieved AUC 86.3% (67% sensitivity at 86% specificity). By tertile, severe CRS occurred in 8/9 (89%) high-risk versus 2/9 (22%) low-risk patients (Cox HR 6.82, 95% CI 1.41-32.9, p=0.017) and severe ICANS occurred in 8/9 (89%) versus 2/9 (22%) (HR 8.28, 95% CI 1.73-39.6, p=0.008). Each 1-SD score increase corresponded to a 3.80-fold higher hazard of severe CRS (p<0.001) and 4.36-fold higher hazard of severe ICANS (p<0.001). In head-to-head comparison, the 3-VOC ICANS panel outperformed the modified Endothelial Activation and Stress Index (mEASIX) (delta-AUC +0.36, DeLong 1-sided p=0.008). The 4-VOC CRS panel had numerically higher AUC than mEASIX (delta-AUC +0.19, p=0.150). Conclusions: Pre-infusion exhaled breath VOC panels stratify CAR T-cell recipients by severity and timing of severe CRS and ICANS, providing a non-invasive complement to existing serum biomarkers. Multi-institutional validation is warranted.
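
The abstract above reports each panel as a sensitivity at a fixed specificity. As a rough illustration of how such an operating point can be read off a set of risk scores, here is a minimal Python sketch. It is not SMAGS-LASSO, which also performs feature selection; the function name, the scores, and the labels are all hypothetical and are not taken from the study.

```python
def sensitivity_at_specificity(scores, labels, min_specificity):
    """Find the lowest cut-off whose specificity meets min_specificity.

    A case is called positive when its score is >= the cut-off.
    Returns (threshold, sensitivity, specificity), or None when no
    observed score reaches the requested specificity.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Raising the cut-off can only raise specificity and lower sensitivity,
    # so the first cut-off that qualifies also has the highest sensitivity.
    for threshold in sorted(set(scores)):
        specificity = sum(s < threshold for s in negatives) / len(negatives)
        if specificity >= min_specificity:
            sensitivity = sum(s >= threshold for s in positives) / len(positives)
            return threshold, sensitivity, specificity
    return None


# Synthetic example: 4 severe cases (label 1) and 4 non-severe cases (label 0).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
operating_point = sensitivity_at_specificity(toy_scores, toy_labels, 0.75)
```

Scanning cut-offs from low to high keeps the sketch short because the first qualifying cut-off is also the one with the highest sensitivity. A cross-validated analysis would apply the same idea to held-out scores only.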

10
Breath volatile profiling reveals a diagnostic signature of MASLD in children

Berna, A. Z.; Panganiban, J.; Liu, Y.; Logan, J.; Russo, P.; Aryal, A.; Hafertepe, K.; Abu-Alreesh, S.; DeBosch, B.; Stoll, J.; John, A. R. O.

2026-05-27 gastroenterology 10.64898/2026.05.26.26353794 medRxiv
Top 7%
0.1%

Background & Aims: Metabolic Dysfunction Associated Steatotic Liver Disease (MASLD) is the leading cause of chronic liver disease in children. However, accurate, noninvasive diagnostic tools remain limited. Current screening methods are invasive or lack sensitivity. Breath-based volatile organic compound (VOC) analysis offers a simple approach with potential for point of care screening. This study aimed to identify and validate breath VOC signatures of pediatric MASLD. Approach & Results: We conducted a prospective IRB-approved cohort study at the Children's Hospital of Philadelphia (CHOP). Children aged between 7 and 20 years with MASLD (n=22), as defined by hepatic steatosis either by liver biopsy or imaging and 1 cardiometabolic risk factor, and a control group without MASLD (n=20) were enrolled. Breath samples were collected using a standardized protocol and analyzed by untargeted comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS). Machine learning and unsupervised clustering were applied to identify discriminatory VOCs and assess heterogeneity. Untargeted GC×GC-MS analysis identified a distinct breath VOC signature in children with MASLD compared with non-MASLD controls. A Random Forest model achieved a sensitivity of 73% and specificity of 65%, with AUC of 0.84. The VOC 2,4-dimethyl-1-heptene demonstrated strong diagnostic performance in the discovery cohort with a sensitivity of 85%, specificity of 77% and an AUC of 0.81. Unsupervised clustering revealed four MASLD subgroups with distinct volatile phenotypes associated with differences in liver enzymes and metabolic parameters. External validation in a second pediatric cohort confirmed reproducible reductions in o/p-xylene in subjects with MASLD. Conclusions: Pediatric MASLD is associated with a reproducible breath VOC signature identified by untargeted GC×GC-MS. These findings support breath analysis as a scalable, noninvasive screening and stratification tool for pediatric MASLD and warrant validation in larger, longitudinal studies.

11
Mediterranean Dietary Approaches to Stop Hypertension Intervention for Neurodegenerative Delay Diet is Associated with Reduced Inflammatory Bowel Disease Related Surgery Risk: A Prospective Cohort Study

Sun, Y.; Jiang, Z.; Dan, L.; Qian, Y.; Wellens, J.; Yao, J.; Li, X.; Wang, X.; Magro, F.; Chen, Y.; Chen, J.

2026-05-30 nutrition 10.64898/2026.05.28.26354274 medRxiv
Top 7%
0.1%

Objectives: The Mediterranean-DASH Intervention for Neurodegenerative Delay (MIND) diet has been associated with the risk of IBD, but its impact on clinical outcomes is uncertain. This study evaluated the association between MIND diet adherence and the risk of IBD-related surgery in a prospective cohort. Methods: This study included 2,288 participants with a diagnosis of Crohn's disease (CD, n=777) or ulcerative colitis (UC, n=1,511) who completed a valid WebQ 24-hour dietary recall from the UK Biobank. Dietary adherence was derived from a 15-component score based on 24-hour dietary recalls. Associations with IBD-related surgery were evaluated using Cox proportional hazards models, with nonlinear trends examined via restricted cubic splines. Effect modification was explored in pre-specified subgroups, and multiple sensitivity analyses were conducted to assess robustness. Results: During 10.9 years of follow-up, 166 incident IBD-related surgery cases occurred. Higher MIND diet adherence was associated with reduced surgical risk. Compared with the lowest tertile of adherence, the highest tertile showed a 36% reduction in surgical risk in IBD (HR 0.64, 95% CI: 0.44-0.94, P = 0.024). Notably, this protective effect was pronounced in patients with CD, exhibiting a clear linear inverse association. In contrast, a reverse J-shaped association was observed in UC, with a steep initial decline in surgical risk followed by a plateau emerging at a MIND score of approximately 5, beyond which further adherence conferred minimal additional benefit. At the component level, higher vegetable consumption and lower intake of butter and fried foods were identified as independent protective factors against surgery. Stronger inverse associations were observed among patients with shorter disease duration and those with complicated disease behavior, including stricturing or penetrating phenotypes (all P interaction < 0.05). Conclusion: Greater MIND diet adherence is associated with reduced IBD-related surgery risk among patients with IBD and CD. These findings support the MIND diet as a feasible dietary strategy to improve IBD prognosis.

12
Multivariate determinants of wearable-measured sleep quality across a large observational cohort: roles of physical activity, gut microbiome, blood analytes, and lifestyle factors.

Cavon, J.; Perez, C.; Quinn-Bohmann, N.; Magis, A. T.; Gibbons, S. M.

2026-05-29 health informatics 10.64898/2026.05.27.26354250 medRxiv
Top 8%
0.1%

Emerging evidence links the gut microbiome to sleep quality, yet measuring sleep at scale remains challenging. Commercial wearables, such as Fitbit, capture objective sleep and activity data in naturalistic settings. We integrated Fitbit data from a large, deeply-phenotyped cohort with paired lifestyle and health questionnaires. Wearable-derived measures aligned well with self-reported sleep, activity, and happiness. We identified dozens of covariate-adjusted associations between Fitbit-derived sleep features, lifestyle factors, and multi-omic data. Among molecular feature sets, the gut microbiome showed the greatest number of associations with sleep quality: butyrate-producing genera were positively associated with sleep and amplified the benefits of physical activity. Oscillospira, in particular, was consistently associated with better sleep. In blood, insulin, omega-3, and cortisol correlated with poorer sleep, whereas lower alcohol intake and mineral supplements correlated with better sleep. These robust, covariate-adjusted findings advance mechanistic understanding of the gut-sleep axis and broader molecular and lifestyle determinants of sleep quality.

13
Development and validation of a multiplexed quantitative PCR assay for clinical detection and surveillance of Oropouche virus

Stachler, E.; McMahon, K.; Gopal, N.; Knoll, H.; Baillargeon, K. R.; Mora, A. C.; Wondrash, H. A.; Sullivan, E. M.; Rush, S.; Gratalo, D.; Ozonoff, A.; Sabeti, P. C.; Springer, M.

2026-05-28 infectious diseases 10.64898/2026.05.26.26354109 medRxiv
Top 11%
0.0%

Background: Oropouche virus (OROV) is an emerging vector-borne virus with rapidly expanding geographic range, increasing case counts, and growing evidence of severe outcomes including neuroinvasive disease and vertical transmission. Because OROV infection presents with nonspecific febrile illness that overlaps clinically with other viruses including dengue, Zika, and chikungunya, accurate molecular diagnostics are essential for patient care and surveillance. Yet existing assays rely on single genomic targets and are vulnerable to detection failure as the virus evolves and reassorts. Methodology/Principal Findings: To support diagnostic capacity, we developed and clinically validated a multiplexed qPCR assay targeting three regions of the OROV S segment, incorporating redundancy to preserve sensitivity across viral diversity while enabling robust clinical interpretation. The multiplex also includes an assay targeting RNaseP as an internal sample control to ensure adequate sample processing. We evaluated assay performance using both historical and contemporary OROV strains and validated the assay on contrived serum, plasma, and cerebrospinal fluid samples, assessing linearity, limit of detection (LOD), accuracy, specificity, precision, and sample stability. The assay met or exceeded all predefined acceptance criteria for clinical testing and achieved an LOD as low as 6 copies per reaction for contemporary outbreak strains. We further implemented a logic-based interpretation matrix that reduced false-positive risk while maintaining sensitivity near the analytical LOD. Conclusions/Significance: Our assay sensitively and specifically detects OROV RNA in serum, plasma, and cerebrospinal fluid while incorporating safeguards against viral evolution and reassortment. The assay has been approved for use by CLIA at Nexus Medical Labs in 49 U.S. states, expanding access to timely OROV diagnostics in the United States and providing a durable framework for molecular detection of reassorting, rapidly evolving viruses as OROV continues to spread into new regions.

14
Beyond Identifier Matching: An Empirical Characterization of Failure Modes in Biomedical Knowledge Graph Integration

Hu, S.; Cheng, H.; Gillenwater, L.; Manpearl, K.; Mandava, A.; Wang, Y.; Pividori, M.; Stranger, B.; Krishnan, A.; Greene, C.; Gao, Y.

2026-05-28 health informatics 10.64898/2026.05.26.26354182 medRxiv
Top 11%
0.0%

Objective. Biomedical knowledge graphs (KGs) such as PrimeKG, Hetionet, UMLS, and PharmGKB are increasingly used as the substrate for downstream machine-learning, retrieval-augmented generation, drug-repurposing, and electronic health record (EHR) augmentation pipelines. The dominant assumption in published work is that integrating two or more such KGs is a tractable engineering step solved by identifier (ID) matching. This paper interrogates that assumption empirically. We quantify how much concept overlap survives realistic alignment, and we characterize the new failure modes introduced by the methods that practitioners reach for when ID matching is insufficient. Materials and Methods. We compared four widely used biomedical KGs (PrimeKG, Hetionet v1.0, the full UMLS Metathesaurus, and PharmGKB) across eleven node types using a tiered alignment pipeline: (1) direct ID matching for nodes sharing a primary vocabulary; (2) cross-ontology bridging using standard mappings (e.g., MONDO-DOID, HPO-UMLS, HPO-UMLS-MeSH for side effects, NCBI Gene-HGNC-UMLS, UBERON-FMA/SNOMEDCT_US/NCI/MeSH for anatomy); (3) ClinicalBERT cosine-similarity grouping at threshold >= 0.98 for over-segmented disease nodes, with a deterministic suffix-stripping canonicalizer; (4) exact name matching for ontology-poor types (anatomy, REACTOME pathways); and (5) embedding-based fuzzy matching with UMLS lookup (SapBERT and ClinicalBERT) for free-text microbiome concepts. We applied the pipeline to a 698-concept gut-microbiome benchmark spanning taxa, pathways, and disease labels, validated grouping decisions against the curated SSSOM mappings released by the MONDO project, and audited the ClinicalBERT consolidation against five clinical-genetics case studies drawn from the literature. Results. Per-type pairwise coverage was strikingly asymmetric. Genes/proteins and the three Gene Ontology categories aligned cleanly across PrimeKG and Hetionet (mutual coverage 94-99%), but disease overlap was sparse: only 0.7% of PrimeKG individual disease nodes mapped to Hetionet, rising to 2.0% after MONDO grouping (versus 78.7% and 18.4% from the Hetionet side). PrimeKG-to-UMLS coverage spanned 100% (effect/phenotype via HPO) down to 20.8% (REACTOME pathways), with drugs at 73.7% and anatomy at 58.8%. PrimeKG-to-PharmGKB drug coverage required up to two bridging hops (DrugBank -> UMLS -> RxNorm/ATC/MeSH). Bigger was not uniformly more complete: on a 698-concept microbiome drug benchmark, Hetionet missed 0 concepts while PrimeKG missed 16. ClinicalBERT-based grouping consolidated 22,205 raw MONDO disease nodes into 17,080 groups but introduced three reproducible failure modes documented in case studies: (i) peer over-merging: for example, all 22 osteogenesis imperfecta subtypes collapsed into a single node despite distinct severity classes; (ii) parent-child collapse: e.g. acute myeloid leukemia merged with myeloid leukemia, erasing the acute/chronic distinction that drives clinical management; and (iii) lexical false positives: neurofibromatosis and schwannomatosis grouped together despite cellular-pathology differences. Discussion. Identifier matching alone is a weak baseline for biomedical KG integration. Cross-ontology bridges and embedding-based consolidation expand coverage but do so at the cost of clinically meaningful resolution, and the resulting failures are systematic rather than random. Reporting only aggregate coverage statistics obscures these losses, which propagate silently into downstream tasks. Conclusion. We provide reusable per-type coverage tables, a taxonomy of three integration failure modes, and concrete recommendations for downstream studies that depend on a unified biomedical KG. We argue that future KG integration work should report per-type coverage and per-cluster confidence rather than aggregate match rates.
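
Step (3) of the pipeline in this abstract groups disease nodes whose embeddings have cosine similarity of at least 0.98. The Python sketch below shows one way such threshold grouping could work. Everything in it is an assumption made for illustration: the three-dimensional toy vectors stand in for ClinicalBERT embeddings, the single-linkage merge via union-find is a guess at how groups are formed, and the suffix-stripping canonicalizer is left out. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(labels, vectors, threshold=0.98):
    """Merge labels whose vectors reach the similarity threshold (single linkage)."""
    parent = list(range(len(labels)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    return sorted(groups.values())


# Toy vectors, invented for this example; they are not real embeddings.
toy_labels = [
    "osteogenesis imperfecta type 1",
    "osteogenesis imperfecta type 3",
    "acute myeloid leukemia",
]
toy_vectors = [[1.0, 0.0, 0.1], [1.0, 0.05, 0.1], [0.0, 1.0, 0.2]]
toy_groups = group_by_similarity(toy_labels, toy_vectors)
```

With these toy vectors the two osteogenesis imperfecta subtypes land in one group, which is the kind of merge the abstract calls peer over-merging. Single linkage also chains: if A is close to B and B is close to C, all three merge even when A and C are not close.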

15
Neutrophil-primed immunopathology in poorly-controlled diabetes worsens matrix destruction in pulmonary tuberculosis

Thong, P. M.; Hu, T. H.; Ooi, J. S. G.; Loh, F. K.; Lee, H.; Bai, C.; Chong, H. T.; Chang, A. J. W.; Choong, C. V.; Galamay, L.; Beh, D. L. L.; Ang, A. X. Y.; Lum, L. H. W.; Yang, S. P.; Lim, A. Y. L.; Mok, S. F.; Vallejo, A. F.; Kao, S. L.; Chan, K. R.; Ong, C. W. M.

2026-05-26 respiratory medicine 10.64898/2026.05.24.26353970 medRxiv
Top 11%
0.0%

Background: Diabetes mellitus (DM) worsens pulmonary tuberculosis (TB) and drives systemic hyper-inflammation, but the underlying mechanisms remain unknown. Neutrophils have key roles in TB immunopathology and lung cavitation. Here, we determine the role of neutrophils in DMTB patients and in driving TB immunopathology. Methods: Sputum and plasma from 30 TB and 30 DMTB patients were analysed for proteases and cytokines using Luminex bead array. Whole blood transcriptomics identified transcriptional differences. Single-cell RNA sequencing characterised neutrophil subsets and dysregulated pathways. Neutrophil function in poorly-controlled DM patients (HbA1c>8%) and healthy controls (HC) was examined following Mycobacterium tuberculosis stimulation, including reactive oxygen species (ROS), neutrophil extracellular traps (NETs), and phagocytosis. Pathways were interrogated using chemical inhibitors, protein array and western blot. Results: Compared to non-diabetic TB patients, poorly-controlled DMTB patients showed up-regulated sputum MMP-8 and MMP-9, associated with increased collagen-destruction and lung cavity formation. Circulating neutrophil count and neutrophil-derived plasma MMP-8 were up-regulated, alongside transcriptional enrichment of extracellular matrix degradation and inflammatory pathways including TNF and RAGE. Single-cell profiling identified reduced cycling neutrophil subset and myelocytes in DMTB, with overall reduced antibacterial and cell-killing signatures. Ex vivo mycobacterial stimulation of DM neutrophils increased ROS and MMP-9 with impaired NETs and delayed phagocytosis. TNFR1, TNFR2, and RAGE were up-regulated. RAGE inhibition with rosiglitazone mitigated Mtb-induced ROS and MMP-8 release. Conclusion: DM worsens neutrophil-driven tissue destruction and inflammation in TB via dysregulated TNF and RAGE-signalling, priming neutrophils towards immunopathology. Targeting RAGE alongside tight glycaemic control may dampen neutrophil hyper-inflammatory responses to limit tissue destruction.

16
Pigeon-Guano-Contaminated Environments in Blantyre, Southern Malawi, are Reservoirs of Medically Important Fungi

Merico, B. J.; Chigwechokha, P.; Alubino, P.; Bandawe, G. P.

2026-05-30 occupational and environmental health 10.64898/2026.05.26.26354139 medRxiv
Top 11%
0.0%

Close to 50% of all bird species are reservoirs of potentially pathogenic fungi, including those listed as priority by the World Health Organization. In Malawi, data on the diversity, pathogenic potential, and ecological avian sources of medically important yeasts are scarce. A descriptive cross-sectional study was conducted in Blantyre, Southern Malawi, to characterise medically important yeasts recovered from environments contaminated with excreta/guano from synanthropic pigeons. A total of 20 samples were collected from 4 peri-urban areas, which yielded 71 yeast isolates. To assess the pathogenic potential of the environmental isolates, we compared their phenotypic virulence traits with those of 21 clinical yeast isolates collected from referral hospital laboratories. Pichia kudriavzevii (39%) and Candida orthopsilosis (30%) were the most commonly isolated species in the pigeon-guano-contaminated environments. Candida parapsilosis sensu stricto (29%) and Candida albicans (24%) constituted most of the clinical yeast isolates. Half of the species isolated in the pigeon-guano-contaminated environments were also identified among the clinical isolates. A majority of the environmental isolates showed virulence traits similar to or stronger than those of the clinical isolates. The findings underscore the critical need for integrated surveillance under the One Health framework, especially in bird-inhabited spaces close to human settlements.

17
Domain-based basal and ambulatory glycemic exposure metrics derived from continuous glucose monitoring: a real-world clinic-based study

Shinde, S. N.; Shinde, R. S.; Bhangaaley, S. Y.

2026-05-26 endocrinology 10.64898/2026.05.24.26353983 medRxiv
Top 12%
0.0%

Background: Consensus continuous glucose monitoring (CGM) metrics, including time in range (TIR), time above range (TAR), time below range (TBR), mean glucose, glucose management indicator, and glycemic variability, are essential for modern glucose assessment. However, these whole-day summaries do not explicitly partition nocturnal basal from daytime ambulatory glycemic burden. Objective: To develop and evaluate a complementary domain-based CGM framework that quantifies basal and daytime ambulatory glycemic exposure across oral glucose tolerance test (OGTT)-derived dysglycemia phenotypes. Methods: In this observational, clinic-based study, 253 individuals underwent OGTT with insulin measurement and CGM. Participants were classified using a prespecified OGTT-derived phenotyping algorithm, implemented through a deterministic rules-based web calculator, and collapsed into five groups: NoDM, Increased insulin resistance, Midzone Glycemia, Prediabetes, and Diabetes. CGM files were uniformly reprocessed by selecting the latest contiguous episode and retaining the most recent 15 calendar days with data. The 24-hour profile was partitioned into nocturnal basal (00:00 to <06:00) and daytime ambulatory (06:00 to <24:00) domains. Derived indices included Area of Basal Glycemia (ABG), Area of Prandial/Daytime Ambulatory Glycemia (APG), incremental ABG (iABG), incremental APG (iAPG), and exploratory deficit indices dABG and dAPG. Results: The final dataset contributed 3,647 analyzable CGM days. APG remained higher than ABG across all groups. Mean ABG/APG increased from 80.45/86.38 mg/dL in NoDM to 111.96/124.70 mg/dL in Diabetes. Mean iABG/iAPG increased from 5.65/6.60 to 34.12/38.91 mg/dL, whereas dABG/dAPG declined as dysglycemia worsened. Conclusions: The ABG/APG framework provides interpretable, domain-resolved CGM burden metrics that separate basal from daytime ambulatory exposure and distinguish total burden from above-threshold excess. 
These indices are proposed as adjunctive metrics to support dysglycemia phenotyping, early risk recognition, and treatment monitoring, but are not intended to replace established consensus CGM metrics or diagnostic criteria. External, prospective validation is required.
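
The domain partition described in this abstract can be sketched in a few lines. This is a minimal illustration only, assuming timestamped readings in mg/dL; the function names `domain_of` and `domain_means` are hypothetical, and a plain per-domain mean is used as a stand-in because the abstract does not give the exact ABG/APG formulas or the thresholds behind the incremental and deficit indices.

```python
from datetime import datetime
from statistics import mean


def domain_of(ts: datetime) -> str:
    """Assign a reading to the nocturnal basal (00:00 to <06:00)
    or daytime ambulatory (06:00 to <24:00) domain."""
    return "basal" if ts.hour < 6 else "ambulatory"


def domain_means(readings):
    """readings: iterable of (datetime, glucose_mg_dl) pairs.

    Returns the mean glucose per domain, or None for an empty domain.
    Assumption: a simple mean stands in for the study's ABG/APG indices.
    """
    buckets = {"basal": [], "ambulatory": []}
    for ts, glucose in readings:
        buckets[domain_of(ts)].append(glucose)
    return {name: (mean(vals) if vals else None) for name, vals in buckets.items()}
```

A reading stamped 05:59 falls in the basal domain and one stamped 06:00 in the ambulatory domain, which matches the half-open intervals quoted in the abstract.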

18
High-dimensional Characterization of Genome-Environment Fitness Landscapes in Klebsiella pneumoniae

Zhou, G.; Williams, G.; Millner, M. T.; AlHirayban, R.; Alosaimi, W.; Fallatah, O.; Hart, A. J.; Malaikah, M.; Iftikhar, S.; Ahmad, H.; Roghanian, M.; Mustonen, V.; AlYami, R.; Banzhaf, M.; Moradigaravand, D.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.28.26354339 medRxiv
Top 13%
0.0%

Background: Bacterial fitness is shaped by interactions between genome variation and environmental context, yet how these interactions determine its predictability and heritability remains unclear. In the clinically important pathogen Klebsiella pneumoniae, a leading cause of hospital-acquired infections, this question is particularly pressing. Despite extensive genomic characterization, we still lack a systematic understanding of how genome-wide variation translates into fitness across diverse environments in K. pneumoniae. Methods: We addressed this gap by profiling a systematic collection of 1,462 clinical K. pneumoniae isolates across 214 diverse environmental and pharmacological stress conditions using high-throughput chemical genomics. Fitness was quantified from colony growth and integrated with whole-genome sequencing data. Genome-wide association analyses identified genetic determinants of fitness, and machine learning models incorporating genomic features were used to predict fitness. Results: Fitness exhibited a strongly environment-dependent genetic architecture, with modest but significant concordance between genetic background and phenotypic variation. Under antibiotic and stress-combination conditions, fitness was driven by discrete, high-effect determinants, including known resistance genes, resulting in stronger signals and improved predictability. In contrast, non-antibiotic environments showed more polygenic and distributed architectures with weaker associations. Genome-wide analyses identified both established and previously uncharacterized genes linked with fitness across conditions. Resistance and virulence determinants exhibited clear context-dependent trade-offs, conferring fitness advantages under selection but imposing costs in non-selective environments. Consistent with this, plasmid carriage showed environment- and genotype-dependent fitness effects, with benefits under antibiotic pressure and measurable costs otherwise. Genomic variant-based models for fitness prediction achieved moderate performance across conditions (mean Spearman correlation ρ = 0.36, 95% CI: 0.18-0.67, for predicted versus observed values in unseen data), with improved accuracy under strong antibiotic selective pressures, and produced well-calibrated prediction intervals with high coverage. Despite a strong effect of population structure on predictions, models captured predictive gene and SNP biomarkers for fitness. Conclusion: These findings highlight that bacterial fitness is an emergent property of genome-environment interactions rather than a fixed attribute of genotype. This work establishes a unified high-dimensional genotype-phenotype framework linking genomic variation to fitness across diverse conditions in a major pathogen, with broader implications for other pathogenic bacterial species.

19
Antibiotic Timing and Survival After Immune Checkpoint Inhibitor Initiation in Patients With Cancer

Zhang, K.; John, D.; Li, W. T.; Hogarth, M.; McKay, R. R.; Ongkeko, W. M.

2026-05-28 oncology 10.64898/2026.05.27.26354193 medRxiv
Top 13%
0.0%

Importance: While gut dysbiosis is known to impair response to immune checkpoint inhibitors (ICIs), the relative clinical impact of antibiotic timing (pre- vs. post-ICI initiation) remains unclear. Objective: To evaluate whether antibiotic timing differentially influences overall survival (OS) in a large, multi-institutional pan-cancer cohort. Design, Setting, and Participants: This retrospective cohort study utilized deidentified electronic health record data from six academic medical centers within the University of California Health system. We included 21,108 adults with any malignancy who received PD-1, PD-L1, or CTLA-4 inhibitors between January 2014 and December 2024. Exposures: Antibiotic exposure windows were categorized as pre-only (-60 to -1 days), post-only (+1 to +60 days), both windows, or none. Main Outcomes and Measures: The primary outcome was OS calculated from the first ICI dose. Multivariable Cox proportional hazards models adjusted for demographics, tumor type, line of therapy, and baseline health indicators (albumin, NLR, and recent hospitalization). Results: Among 21,108 patients, 17.3% had pre-only exposure, 13.3% had post-only exposure, and 60.6% had no exposure. In the multivariable model, post-only exposure (HR, 1.27; 95% CI, 1.20-1.35) and combined pre- and post-exposure (HR, 1.31; 95% CI, 1.23-1.40) were significantly associated with higher mortality. Pre-only exposure was not significantly associated with OS (HR, 1.04; 95% CI, 0.99-1.10). Subgroup analyses by tumor type showed consistent trends across major malignancies, including head and neck (Post HR, 1.46) and renal cell carcinoma (Post HR, 1.26). Conclusions and Relevance: In contrast to some smaller studies, this large-scale analysis indicates that antibiotic exposure after ICI initiation carries a greater risk than exposure prior to treatment. These findings highlight the need for rigorous antibiotic stewardship strategies specifically during the early phases of immunotherapy treatment.
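
The four exposure categories in this abstract follow directly from the stated day windows, and a short sketch makes the rule explicit. It assumes antibiotic dates are expressed as integer day offsets from the first ICI dose (day 0); the function name `exposure_category` is hypothetical, and treating day 0 as outside both windows is an assumption, since the abstract does not say how same-day exposure was handled.

```python
def exposure_category(antibiotic_days):
    """Classify a patient by antibiotic timing relative to the first ICI dose.

    antibiotic_days: iterable of integer day offsets (negative = before ICI).
    Windows follow the abstract: pre = -60..-1, post = +1..+60.
    Assumption: day 0 and days beyond +/-60 count toward neither window.
    """
    days = list(antibiotic_days)
    pre = any(-60 <= d <= -1 for d in days)
    post = any(1 <= d <= 60 for d in days)
    if pre and post:
        return "both"
    if pre:
        return "pre-only"
    if post:
        return "post-only"
    return "none"
```

For example, offsets of -3 and 20 give "both", while a single course at day 90 gives "none".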

20
Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv
Top 14%
0.0%

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded in the electronic medical record that warns clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction prior to the onset of sepsis is used to measure performance) with analyses at a novel prediction level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis 3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86 [IQR 0.85, 0.87]), whereas prediction-level analyses demonstrated lower performance (AUC 0.62 [IQR 0.57, 0.65]). Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
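
The gap between the two evaluation levels can be illustrated with toy data. The sketch below is not the study's method: the helper names (`auc`, `patient_level`, `prediction_level`) and the scores are invented for illustration, and it assumes that every prediction inherits the outcome of its hospitalization, which may differ from how the authors labeled individual predictions.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative
    (ties count as 0.5); equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def patient_level(rows, septic):
    """Collapse each hospitalization to its maximum score."""
    best = {}
    for hid, score in rows:
        best[hid] = max(score, best.get(hid, score))
    ids = sorted(best)
    return [best[h] for h in ids], [int(h in septic) for h in ids]


def prediction_level(rows, septic):
    """Keep every prediction; assumed label = outcome of its hospitalization."""
    return [s for _, s in rows], [int(h in septic) for h, _ in rows]


# Hypothetical (hospitalization_id, score) pairs; only "A" develops sepsis.
rows = [("A", 0.2), ("A", 0.3), ("A", 0.9),
        ("B", 0.4), ("B", 0.5),
        ("C", 0.1), ("C", 0.6)]
septic = {"A"}

patient_auc = auc(*patient_level(rows, septic))
prediction_auc = auc(*prediction_level(rows, septic))
```

On this toy data the patient-level AUC is 1.0 while the prediction-level AUC is 0.5, because the septic patient's early low scores are hidden by the maximum but counted individually at the prediction level.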