Integrated Machine Learning-PanGWAS Reveals Chromosome-Encoded Persistence Networks and Plasmid Plasticity in Recurrent Urinary Tract Infection in Escherichia coli

Rajendran, S.; Nagarajan, S.; MOHAN S., S.

2026-05-22 infectious diseases
10.64898/2026.05.20.26353739 medRxiv

Background: Recurrent urinary tract infections (rUTI) represent a major clinical challenge due to persistent symptoms, repeated antibiotic exposure, and increased risk of multidrug resistance. Furthermore, clinical management of rUTI remains challenging, as existing diagnostic and treatment guidelines are largely designed for uncomplicated, acute infections. Although uropathogenic Escherichia coli (UPEC) is the predominant cause of community-acquired UTIs, the pathogen-derived genomic features that may predispose certain E. coli strains to repeatedly establish infection are not fully understood.
Methods: To dissect the genetic signals across genomic compartments that distinguish rUTI-associated isolates from those causing sporadic infection, pan-genome analysis was conducted in three frameworks: (i) combined genomes (chromosome + plasmid), (ii) chromosomes only, and (iii) plasmids only. Population structure was evaluated using recombination-aware phylogeny (Gubbins and IQ-TREE), phylogroup distribution, pan-genome openness (Heaps' law), and plasmidome architecture (MOB-suite).
Findings: Supervised machine learning models achieved their highest discriminatory performance on the combined genomic dataset (accuracy ~0.98). Integrating feature-selected genes with PanGWAS (Pyseer and Scoary) identified a robust set of recurrence-associated genes, namely cbtA, cbeA, and ldrD, which were consistently detected across the machine learning and association frameworks. Subsequent association rule mining further revealed cooperative gene networks enriched in rUTI isolates, particularly involving toxin-antitoxin modules and metabolic regulators.
Interpretation: Overall, this integrated ML-PanGWAS approach demonstrates that rUTI is a lineage-independent, polygenic phenotype encoded within a combined chromosomal-plasmid genomic context, providing new insights into the bacterial genomic architecture underlying recurrent disease and offering candidate biomarkers for future diagnostic and therapeutic development.
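
As an illustration of the pan-genome openness measure mentioned in the Methods, the following is a minimal sketch of a Heaps' law fit. It assumes the common power-law form P(N) = kappa * N^gamma for pan-genome size P after N genomes, where 0 < gamma < 1 is usually read as an open pan-genome. The function name `fit_heaps` and the input values are hypothetical and synthetic; they are not taken from the study, whose exact fitting procedure is not described in the abstract.

```python
import math

def fit_heaps(genome_counts, pan_sizes):
    """Fit P(N) = kappa * N**gamma by least squares in log-log space.

    Hypothetical helper for illustration only; returns (kappa, gamma).
    """
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pan_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the regression line is gamma; intercept is log(kappa).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma

# Synthetic example data (NOT from the study): an exact power law
# with kappa = 4500 and gamma = 0.3.
counts = [1, 2, 4, 8, 16, 32]
sizes = [4500 * n ** 0.3 for n in counts]

kappa, gamma = fit_heaps(counts, sizes)
is_open = 0 < gamma < 1  # open pan-genome under this convention
```

In practice the pan-genome sizes would come from averaged genome-accumulation curves over many random orderings of the isolates, rather than from a single ordering.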

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Microbial Genomics | 204 papers in training set | Top 0.1% | 41.6%
2 | Frontiers in Cellular and Infection Microbiology | 98 papers in training set | Top 0.3% | 7.1%
3 | The Journal of Infectious Diseases | 182 papers in training set | Top 1.0% | 3.8%
50% of probability mass above
4 | mSystems | 361 papers in training set | Top 3% | 3.8%
5 | Scientific Reports | 3102 papers in training set | Top 33% | 3.8%
6 | Frontiers in Microbiology | 375 papers in training set | Top 3% | 2.9%
7 | mBio | 750 papers in training set | Top 5% | 2.9%
8 | Genome Medicine | 154 papers in training set | Top 3% | 2.6%
9 | Nature Communications | 4913 papers in training set | Top 49% | 1.9%
10 | Clinical Infectious Diseases | 231 papers in training set | Top 2% | 1.8%
11 | JCI Insight | 241 papers in training set | Top 4% | 1.6%
12 | PLOS ONE | 4510 papers in training set | Top 57% | 1.4%
13 | Microbiome | 139 papers in training set | Top 2% | 1.3%
14 | The Lancet Microbe | 43 papers in training set | Top 0.8% | 1.3%
15 | Cell Reports Medicine | 140 papers in training set | Top 6% | 1.0%
16 | Journal of Infection | 71 papers in training set | Top 2% | 1.0%
17 | mSphere | 281 papers in training set | Top 5% | 0.9%
18 | Microbiological Research | 19 papers in training set | Top 0.6% | 0.8%
19 | Gut Microbes | 70 papers in training set | Top 1.0% | 0.8%
20 | FEMS Microbes | 14 papers in training set | Top 0.5% | 0.7%
21 | Open Forum Infectious Diseases | 134 papers in training set | Top 3% | 0.7%
22 | eLife | 5422 papers in training set | Top 60% | 0.7%
23 | PLOS Pathogens | 721 papers in training set | Top 9% | 0.7%
24 | BMC Medicine | 163 papers in training set | Top 8% | 0.7%
25 | Microorganisms | 101 papers in training set | Top 3% | 0.5%
26 | Communications Medicine | 85 papers in training set | Top 2% | 0.5%
27 | Clinical Microbiology and Infection | 60 papers in training set | Top 2% | 0.5%
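
The 50% cut-off marked in the list can be checked with a running sum over the predicted probabilities. The sketch below uses the first six percentages from the list; the function name `journals_to_reach` is hypothetical, and how the page itself computes the cut-off is an assumption.

```python
from itertools import accumulate

# Predicted probabilities (%) for the six highest-ranked journals listed above.
probabilities = [41.6, 7.1, 3.8, 3.8, 3.8, 2.9]

def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the cumulative
    probability to reach the threshold, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None

top_k = journals_to_reach(probabilities, 50.0)
```

With these values the first two journals sum to 48.7% and the first three to 52.5%, so the threshold is crossed at rank 3, matching the marker in the list.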