
Epidemiology of Legionella: Genome-bAsed Typing (el_gato) - a new bioinformatic tool for identifying sequence-based types of Legionella pneumophila from whole genome sequencing data

Collins, A. J.; Mashruwala, D.; Chivukula, V.; Kozak-Muiznieks, N. A.; Rishishwar, L.; Norris, E. T.; Willby, M. J.; Hamlin, J.; Overholt, W. A.

2026-03-23 bioinformatics
10.64898/2026.03.20.713011 bioRxiv

Sequence-based typing (SBT) via Sanger sequencing has been the standard for describing Legionella pneumophila relationships for two decades. SBT involves sequencing seven loci, identifying alleles using the United Kingdom Health Security Agency (UKHSA) database, and inferring the corresponding sequence type (ST). While similar SBT approaches for other organisms can be easily adapted to whole genome sequencing (WGS), L. pneumophila presents two known challenges for this adaptation: multiple copies of one locus (mompS) and extensive heterogeneity in a second locus (neuA/neuAh). Although several computational methods have been proposed to address these issues, a WGS-based replacement with equal resolution to traditional SBT has remained elusive. To address this gap, we developed el_gato (Epidemiology of Legionella: Genome-bAsed Typing; https://github.com/CDCgov/el_gato), which offers several advantages over existing methods: (1) a novel approach for resolving multiple mompS alleles identified in the same isolate, (2) the ability to capture diverse neuA/neuAh alleles, (3) fast runtime with an average of 27.7 seconds per sample, (4) easy installation via Bioconda or Docker, and (5) an updated database as of March 2025. el_gato works with either paired-end short reads or genome assemblies, performing more accurately with paired-end short reads of at least 250 base pairs (bp) in length. We compared el_gato against two other in silico SBT tools ("mompS", hereafter referred to as the mompS tool, and "legsta") using a dataset of 441 isolates with STs previously determined by Sanger-based sequencing. el_gato correctly identified the ST for 98.9% of the test isolates, compared to 95.2% for the mompS tool and 42.2% for legsta, a significant improvement in ST identification over both the mompS tool (adjusted p = 1.06e-3) and legsta (adjusted p = 4.24e-55). Furthermore, el_gato's determination of ST was not significantly different from Sanger sequencing (adjusted p = 0.442). In summary, el_gato significantly improves in silico SBT and, given its growing adoption, is poised to support the public health community.
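
The final SBT step described in the abstract (seven allele numbers mapped to one ST) can be illustrated with a minimal sketch. This is not el_gato's code. The abstract names only mompS and neuA/neuAh; the remaining locus names below are the standard L. pneumophila SBT loci and are not taken from the abstract. The profile table, the ST labels, and the function name `infer_st` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Mapping, Optional, Tuple

# Locus order assumed for this sketch (standard L. pneumophila SBT loci).
LOCI: Tuple[str, ...] = ("flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA")

# Hypothetical profile table: allele-number tuple -> ST label.
# These entries are toy values, not records from the UKHSA database.
TOY_PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 4, 3, 1, 1, 1, 1): "ST-A",
    (2, 3, 9, 10, 2, 1, 6): "ST-B",
}


def infer_st(
    alleles: Mapping[str, int],
    profiles: Mapping[Tuple[int, ...], str] = TOY_PROFILES,
) -> Optional[str]:
    """Return the ST label for a complete allele profile.

    Returns None if any locus is missing or the profile is not in the table.
    """
    try:
        key = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        # At least one of the seven loci has no allele call.
        return None
    return profiles.get(key)


if __name__ == "__main__":
    calls = {"flaA": 1, "pilE": 4, "asd": 3, "mip": 1,
             "mompS": 1, "proA": 1, "neuA": 1}
    print(infer_st(calls))
```

The lookup itself is trivial; the difficulty the abstract describes lies upstream, in obtaining a single correct allele call for mompS and neuA/neuAh from WGS data before this step.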

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Microbial Genomics: 204 papers in training set, Top 0.1%, 37.5%
2. Journal of Clinical Microbiology: 120 papers in training set, Top 0.2%, 14.6%
(50% of probability mass above)
3. Genome Medicine: 154 papers in training set, Top 0.6%, 8.3%
4. Scientific Reports: 3102 papers in training set, Top 31%, 3.9%
5. The Lancet Microbe: 43 papers in training set, Top 0.2%, 3.6%
6. PLOS ONE: 4510 papers in training set, Top 40%, 3.6%
7. mSphere: 281 papers in training set, Top 2%, 3.6%
8. PLOS Neglected Tropical Diseases: 378 papers in training set, Top 3%, 2.1%
9. Microbiology Spectrum: 435 papers in training set, Top 2%, 2.1%
10. Genome Research: 409 papers in training set, Top 2%, 1.7%
11. Nature Communications: 4913 papers in training set, Top 55%, 1.3%
12. Bioinformatics: 1061 papers in training set, Top 8%, 1.2%
13. BMC Genomics: 328 papers in training set, Top 4%, 0.9%
14. The Journal of Infectious Diseases: 182 papers in training set, Top 4%, 0.9%
15. BMC Bioinformatics: 383 papers in training set, Top 6%, 0.9%
16. Clinical Infectious Diseases: 231 papers in training set, Top 4%, 0.8%
17. Frontiers in Cellular and Infection Microbiology: 98 papers in training set, Top 5%, 0.8%
18. Microbiology Resource Announcements: 22 papers in training set, Top 0.9%, 0.7%
19. Peer Community Journal: 254 papers in training set, Top 4%, 0.7%
20. mSystems: 361 papers in training set, Top 8%, 0.7%
21. PLOS Computational Biology: 1633 papers in training set, Top 27%, 0.6%
22. Frontiers in Microbiology: 375 papers in training set, Top 10%, 0.6%