Integrated Machine Learning-PanGWAS Reveals Chromosome-Encoded Persistence Networks and Plasmid Plasticity in Recurrent Urinary Tract Infection in Escherichia coli
Rajendran, S.; Nagarajan, S.; MOHAN S., S.
Background: Recurrent urinary tract infections (rUTI) represent a major clinical challenge due to persistent clinical symptoms, repeated antibiotic exposure, and increased risk of multidrug resistance. Furthermore, clinical management of rUTI remains difficult because existing diagnostic and treatment guidelines are largely designed for uncomplicated, acute infections. Although uropathogenic Escherichia coli (UPEC) is the predominant cause of community-acquired UTIs, the pathogen-derived genomic features that may predispose certain E. coli strains to repeatedly establish infection are not fully understood. Methods: To dissect the genetic signals across genomic compartments that distinguish rUTI-associated isolates from those causing sporadic infection, pan-genome analysis was conducted in three frameworks: (i) combined genomes (chromosome + plasmid), (ii) chromosomes only, and (iii) plasmids only. Population structure was evaluated through recombination-aware phylogeny (Gubbins and IQ-TREE), phylogroup distribution, pan-genome openness (Heaps' law), and plasmidome architecture (MOB-suite). Findings: Supervised machine learning models achieved the highest discriminatory performance on the combined genomic dataset (accuracy ~0.98). Integrating feature-selected genes with PanGWAS (Pyseer and Scoary) identified a robust set of recurrence-associated genes, namely cbtA, cbeA, and ldrD, which were consistently detected across the machine learning and association frameworks. Subsequent association rule mining further revealed cooperative gene networks enriched in rUTI isolates, particularly involving toxin-antitoxin modules and metabolic regulators.
Interpretation: Overall, this integrated ML-PanGWAS approach demonstrates that rUTI is a lineage-independent, polygenic phenotype encoded within a combined chromosomal-plasmid genomic context, providing new insights into the bacterial genomic architecture underlying recurrent disease and offering candidate biomarkers for future diagnostic and therapeutic development.
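The Methods mention assessing pan-genome openness with Heaps' law. As a rough illustration of what such a fit can look like, the sketch below assumes the common power-law form P(N) = kappa * N^gamma for pan-genome size P after sampling N genomes, and estimates the parameters by ordinary least squares on log-transformed values. The function name `fit_heaps`, the chosen form of the law, and the input numbers are assumptions made for illustration only; the data are synthetic and are not results from this study, and the authors' actual fitting procedure is not described in the abstract.

```python
import math


def fit_heaps(genome_counts, pangenome_sizes):
    """Fit P(N) = kappa * N**gamma by least squares in log-log space.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pangenome_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the exponent gamma.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    # Intercept of the regression line is log(kappa).
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic accumulation curve generated from a known power law
# (kappa = 5000, gamma = 0.4); these are NOT values from the study.
counts = [1, 2, 4, 8, 16, 32]
sizes = [5000 * n ** 0.4 for n in counts]

kappa, gamma = fit_heaps(counts, sizes)
print(f"kappa = {kappa:.1f}, gamma = {gamma:.3f}")
```

Under this form of the law, an exponent between 0 and 1 is usually read as an open pan-genome (new genes keep appearing as more genomes are sampled), while an exponent near 0 suggests a closed one. In practice the fit is typically made on gene counts averaged over many random orderings of the genomes rather than on a single ordering.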