
Evaluating Genomic Surveillance Methods for Shigella sonnei in a High-Income Setting

Wei, K. C.; Chong, C. E.; Batisti Biffignandi, G.; Mason, L. C. E.; Morrison, R.; Jenkins, C.; Baker, K. S.

2026-05-12 public and global health
10.64898/2026.05.08.26352707 medRxiv

Shigella sonnei is a human-adapted enteric pathogen with a very low infectious dose and increasing antimicrobial resistance. In high-income settings, transmission is multimodal, including sporadic cases and outbreaks associated with food and travel, as well as sustained transmission within sexual networks of men who have sex with men (MSM). Whole-genome sequencing (WGS) now underpins national shigellosis surveillance in the United Kingdom. Consistent, communicable genotyping is therefore essential for case linkage and trend detection across heterogeneous transmission modes. Here, we evaluate the performance of WGS genotyping approaches for resolving outbreaks of S. sonnei shigellosis, with particular attention to differential performance in dense sexual transmission, where highly clonal MSM-associated sublineages pose distinct clustering challenges. Specifically, we compare the performance of the current-practice approach (10-SNP distance clustering based on the SNP address [t10]), allele-based methods (EnteroBase cgMLST/HierCC [HC5]), a pathogen-specific genotyping scheme (sonneityper), and two k-mer-based approaches (PopPUNK and KPop) on a bona fide UK surveillance dataset (n = 3,639 isolates collected between 2016 and 2022), and stratify analyses by demographics (i.e. presumptive MSM [pMSM] versus non-pMSM). Comparison metrics indicate that the t10 clustering method groups isolates more broadly than HC5, and that k-mer-based methods may capture genetic variation independent of SNP- or allele-based approaches. Clusters derived from k-mer-based methods offer similar resolution to HC5 and reflect different demographics, but had unconvincing utility for this pathogen.
These findings suggest a transmission context-aware surveillance workflow for shigellosis in high-income settings: anchor routine communication on a portable allele-based backbone and augment with more granular, complementary methods (e.g., k-mer-based micro-partitioning or phylogenetic analysis) in comparatively low genomic-density regions of population structure (e.g., pMSM transmission lineages) to stabilise clusters and reduce artefactual chaining.
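
The "artefactual chaining" mentioned above can be illustrated with a minimal sketch of threshold-based clustering. It assumes single-linkage grouping at a 10-SNP cutoff, which the abstract does not spell out; the isolate names, the distance values, and the function `single_linkage_clusters` are hypothetical and are not taken from the study or from any surveillance tool.

```python
from itertools import combinations


def single_linkage_clusters(distances, names, threshold=10):
    """Group isolates so that any pair within `threshold` SNPs shares a cluster.

    Assumption: single-linkage, i.e. one close pair is enough to merge two groups.
    """
    parent = {name: name for name in names}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if distances[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise SNP distances between four isolates.
names = ["A", "B", "C", "D"]
distances = {
    frozenset(("A", "B")): 8,
    frozenset(("B", "C")): 9,
    frozenset(("A", "C")): 17,
    frozenset(("A", "D")): 60,
    frozenset(("B", "D")): 55,
    frozenset(("C", "D")): 58,
}

clusters = single_linkage_clusters(distances, names, threshold=10)
tight_clusters = single_linkage_clusters(distances, names, threshold=5)
print(clusters)
print(tight_clusters)
```

In this toy example, isolates A and C are 17 SNPs apart yet end up in the same cluster because each is within 10 SNPs of B. In a highly clonal lineage, many such intermediate isolates could link cases that are not directly related.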

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 20% | 9.8%
2. Journal of Clinical Microbiology | 120 papers in training set | Top 0.3% | 9.8%
3. The Lancet Microbe | 43 papers in training set | Top 0.1% | 8.9%
4. eLife | 5422 papers in training set | Top 9% | 8.2%
5. The Journal of Infectious Diseases | 182 papers in training set | Top 0.6% | 6.2%
6. Genome Medicine | 154 papers in training set | Top 1% | 4.7%
7. Clinical Infectious Diseases | 231 papers in training set | Top 1% | 4.7%
50% of probability mass above
8. Microbial Genomics | 204 papers in training set | Top 0.6% | 3.9%
9. Nature Microbiology | 133 papers in training set | Top 1% | 3.6%
10. Emerging Infectious Diseases | 103 papers in training set | Top 0.6% | 3.5%
11. Journal of Infection | 71 papers in training set | Top 0.7% | 2.5%
12. Scientific Reports | 3102 papers in training set | Top 54% | 1.8%
13. Cell Reports Medicine | 140 papers in training set | Top 3% | 1.8%
14. mBio | 750 papers in training set | Top 8% | 1.7%
15. Wellcome Open Research | 57 papers in training set | Top 1.0% | 1.6%
16. Nature Medicine | 117 papers in training set | Top 2% | 1.6%
17. mSphere | 281 papers in training set | Top 4% | 1.4%
18. mSystems | 361 papers in training set | Top 6% | 1.3%
19. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 4% | 1.3%
20. PLOS ONE | 4510 papers in training set | Top 59% | 1.3%
21. Journal of The Royal Society Interface | 189 papers in training set | Top 3% | 1.2%
22. BMC Medicine | 163 papers in training set | Top 5% | 1.2%
23. PLOS Neglected Tropical Diseases | 378 papers in training set | Top 4% | 0.9%
24. BMC Genomics | 328 papers in training set | Top 5% | 0.9%
25. PLOS Computational Biology | 1633 papers in training set | Top 22% | 0.9%
26. The Lancet Infectious Diseases | 71 papers in training set | Top 3% | 0.8%
27. The Lancet Regional Health - Western Pacific | 15 papers in training set | Top 0.3% | 0.7%
28. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 48% | 0.6%
29. Eurosurveillance | 80 papers in training set | Top 2% | 0.6%
30. Communications Biology | 886 papers in training set | Top 30% | 0.6%
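
The "50% of probability mass" cutoff in the list above can be checked with a short cumulative sum. This is a sketch only: the page does not say how the cutoff is computed, so it assumes the cutoff is the smallest number of top-ranked journals whose predicted probabilities sum to at least 50%. The function name `journals_covering` is hypothetical, and only the first ten probabilities from the list are used.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1 to 10, copied from the list above.
probabilities = [9.8, 9.8, 8.9, 8.2, 6.2, 4.7, 4.7, 3.9, 3.6, 3.5]


def journals_covering(probabilities, target=50.0):
    """Return the smallest rank at which the running total reaches `target`.

    Returns None if the supplied probabilities never reach the target.
    """
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= target:
            return rank
    return None


top_k = journals_covering(probabilities)
print(top_k)
```

Under that assumption the running total first passes 50% at rank 7, which agrees with the page's statement that the top 7 journals account for 50% of the predicted probability mass.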