Exploring the extent of uncatalogued genetic variation in antimicrobial resistance gene families in Escherichia coli

Lipworth, S.; Crook, D. W.; Walker, A. S.; Peto, T. E.; Stoesser, N.

2023-03-15 infectious diseases
10.1101/2023.03.14.23287259 medRxiv

Background: Antimicrobial resistance (AMR) in E. coli is a global problem associated with substantial morbidity and mortality. AMR-associated genes are typically annotated based on similarity to variants in a curated reference database, with an implicit assumption that uncatalogued genetic variation within these is phenotypically unimportant. In this study we evaluated the potential for discovering new AMR-associated gene families and characterising variation within existing ones to improve genotype-to-susceptibility-phenotype prediction in E. coli.

Methods: We assembled a global dataset of 9001 E. coli sequences, of which 8586 had linked antibiotic susceptibility data. Raw reads were assembled using Shovill and AMR genes were extracted using the NCBI AMRFinder tool. Mash was used to calculate the similarity between extracted genes using Jaccard distances. We empirically reclustered extracted gene sequences into AMR-associated gene families (70% match) and alleles (ARGs, 100% match).

Results: The performance of the AMRFinder database for genotype-to-phenotype predictions using strict 100% identity and coverage thresholds did not meet FDA thresholds for any of the eight antibiotics evaluated. Relaxing filters to default settings improved sensitivity at a cost to specificity. For all antibiotics, a small number of genes explained most resistance, although a proportion could not be explained by known ARGs; this ranged from 75.1% for co-amoxiclav to 3.4% for ciprofloxacin. Only 17,177/36,637 (47%) of ARGs detected had a 100% identity and coverage match in the AMRFinder database. After empirically reclassifying genes at 100% nucleotide sequence identity, we identified 1292 unique ARGs, of which 158 (12%) were present ≥10 times, 374 (29%) were present 2-9 times and 760 (59%) only once. Simulated accumulation curves revealed that discovery of new (100%-match) ARGs present more than once in the dataset plateaued relatively quickly, whereas new singleton ARGs were discovered even after many thousands of isolates had been included. We identified a strong correlation (Spearman coefficient 0.76 [95% CI 0.72-0.79, p<0.001]) between the number of times an ARG was observed in Oxfordshire and the number of times it was seen internationally, with ARGs that were observed 7 times in Oxfordshire always being found elsewhere. Finally, using the example of blaTEM-1, we demonstrated that uncatalogued variation, including synonymous variation, is associated with potentially important phenotypic differences (e.g. two common, uncatalogued blaTEM-1 alleles with only synonymous mutations compared to the known reference were associated with reduced resistance to co-amoxiclav [aOR 0.57, 95% CI 0.34-0.93, p=0.03] and piperacillin-tazobactam [aOR 0.54, 95% CI 0.32-0.87, p=0.01]).

Conclusions: Overall we highlight substantial uncatalogued genetic variation with respect to known ARGs, although a relatively small proportion of these alleles are repeatedly observed in a large international dataset, suggesting strong selection pressures. The current approach of using fuzzy matching for ARG detection, ignoring the unknown effects of uncatalogued variation, is unlikely to be acceptable for future clinical deployment. The association of synonymous mutations with potentially important phenotypic differences suggests that relying solely on amino acid-based gene detection to predict resistance is unlikely to be sufficient. Finally, the inability to explain all resistance using existing knowledge highlights the importance of new target gene discovery.
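The frequency breakdown of alleles (≥10 times, 2-9 times, once) and the accumulation curves mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `frequency_bins` and `accumulation_curve`, the allele labels, and the toy isolates are all hypothetical, and alleles are assumed to have already been assigned by exact (100% identity) sequence match.

```python
import random
from collections import Counter


def frequency_bins(allele_counts):
    """Bin alleles by observation count: >=10 times, 2-9 times, or once."""
    bins = {">=10": 0, "2-9": 0, "1": 0}
    for n in allele_counts.values():
        if n >= 10:
            bins[">=10"] += 1
        elif n >= 2:
            bins["2-9"] += 1
        else:
            bins["1"] += 1
    return bins


def accumulation_curve(isolates, seed=0):
    """Cumulative number of distinct alleles as isolates are added in a random order.

    `isolates` is a list of sets of allele identifiers (one set per isolate).
    Averaging over many seeds would give a smoothed simulated curve.
    """
    rng = random.Random(seed)
    order = list(isolates)
    rng.shuffle(order)
    seen = set()
    curve = []
    for alleles in order:
        seen.update(alleles)
        curve.append(len(seen))
    return curve


# Toy, made-up data: four isolates carrying hypothetical alleles a1..a4.
isolates = [{"a1", "a2"}, {"a1"}, {"a1", "a3"}, {"a2", "a4"}]
allele_counts = Counter(a for iso in isolates for a in iso)

bins = frequency_bins(allele_counts)
curve = accumulation_curve(isolates)
print(bins)
print(curve)
```

In a curve like this, alleles seen repeatedly tend to be found early, while singletons keep adding to the total as more isolates are included, which is the pattern the abstract reports.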

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Journal of Antimicrobial Chemotherapy; 43 papers in training set; Top 0.1%; 22.4%
2. JAC-Antimicrobial Resistance; 13 papers in training set; Top 0.1%; 14.6%
3. The Lancet Microbe; 43 papers in training set; Top 0.1%; 8.4%
4. Antimicrobial Agents and Chemotherapy; 167 papers in training set; Top 0.5%; 4.3%
5. Microbial Genomics; 204 papers in training set; Top 0.5%; 4.1%
50% of probability mass above
6. Scientific Reports; 3102 papers in training set; Top 31%; 3.9%
7. Genome Medicine; 154 papers in training set; Top 2%; 3.6%
8. International Journal of Antimicrobial Agents; 15 papers in training set; Top 0.1%; 3.0%
9. PLOS ONE; 4510 papers in training set; Top 47%; 2.3%
10. BMC Infectious Diseases; 118 papers in training set; Top 2%; 2.1%
11. The Journal of Infectious Diseases; 182 papers in training set; Top 2%; 1.8%
12. Nature Communications; 4913 papers in training set; Top 51%; 1.7%
13. Clinical Microbiology and Infection; 60 papers in training set; Top 0.6%; 1.7%
14. Clinical Infectious Diseases; 231 papers in training set; Top 3%; 1.7%
15. Open Forum Infectious Diseases; 134 papers in training set; Top 2%; 1.3%
16. Antibiotics; 32 papers in training set; Top 0.9%; 1.3%
17. PLOS Medicine; 98 papers in training set; Top 3%; 1.1%
18. Journal of Global Antimicrobial Resistance; 15 papers in training set; Top 0.5%; 1.1%
19. Journal of Medical Microbiology; 20 papers in training set; Top 0.5%; 0.9%
20. Wellcome Open Research; 57 papers in training set; Top 2%; 0.9%
21. Infection Control & Hospital Epidemiology; 17 papers in training set; Top 0.4%; 0.9%
22. Journal of Clinical Microbiology; 120 papers in training set; Top 1%; 0.9%
23. PLOS Biology; 408 papers in training set; Top 17%; 0.9%
24. mBio; 750 papers in training set; Top 11%; 0.7%
25. Antimicrobial Resistance & Infection Control; 10 papers in training set; Top 0.3%; 0.7%
26. PLOS Computational Biology; 1633 papers in training set; Top 25%; 0.7%
27. Communications Medicine; 85 papers in training set; Top 1%; 0.7%
28. Journal of Infection; 71 papers in training set; Top 3%; 0.7%
29. Eurosurveillance; 80 papers in training set; Top 2%; 0.7%
30. mSphere; 281 papers in training set; Top 7%; 0.6%
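
The 50% cutoff marked in the list can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration: the helper name `journals_to_reach` is hypothetical, and because the listed percentages are rounded, the running total is approximate.

```python
def journals_to_reach(probabilities, target=50.0):
    """Return (rank, cumulative mass) at the first rank where the running
    total of probabilities (in percent) reaches the target."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return rank, total
    return None, total


# Probabilities (percent) of the six top-ranked journals, copied from the list.
top_listed = [22.4, 14.6, 8.4, 4.3, 4.1, 3.9]
rank, mass = journals_to_reach(top_listed)
print(rank, round(mass, 1))
```

With these values the first four journals sum to just under 50%, so the fifth is the first rank at which the threshold is reached.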