Population analysis and host-disease associations of Shiga toxin-producing Escherichia coli from various sources across eleven European countries using whole genome sequencing

Tozzoli, R.; Schadron, T.; Knijn, A.; De Sabato, L.; Morabito, S.; Montalbano Di Filippo, M.; Fiskebeck, E.; Johannessen, G.; Antony-Samy, J. K.; Good, L.; Soderlund, R.; van Hoek, A.; Mughini Gras, L.; Franz, E.; Wieczorek, K.; Scavia, G.; Moro, O.; Chiani, P.; Michelacci, V.; Burgess, C. M.; Duffy, G.; Rodgers, J.; Kirchner, M.; Pista, A.; Silveira, L.; Amaro, A.; Clemente, L.; Chattaway, M. A.; Jenkins, C.; Dallman, T.; Schjorring, S.; Scheutz, F.; Byrne, B.; Gutierrez, M.; Lopez-Chavarrias, V.; Ugarte-Ruiz, M.; Brandal, L.; Naseer, U.; Kolackova, I.; Zomer, A. L.; Wagenaar, J. A.; Pires, S

2026-04-28 genomics
10.64898/2026.04.27.721056 bioRxiv

Shiga toxin-producing Escherichia coli (STEC) are important foodborne pathogens that can cause severe disease in humans. In the DiSCoVeR project (https://onehealthejp.eu/jrp-discover/), a STEC inventory from human and non-human sources in 11 European countries was set up, and ≥ 3500 strains were sequenced for comparative genomic analysis. We used this dataset to assess STEC population structure and to investigate potential associations between genomic features, host reservoirs and symptoms. Most STEC isolates analysed by Whole Genome Sequencing (WGS) in this study were collected between 2010 and 2020. An ad hoc pipeline was deployed for harmonised characterization of the STEC in the database, providing serotyping, stx gene subtyping, 7-locus MLST, virulotyping and cgMLST. The results were analysed by Principal Coordinates Analysis (PCoA) in relation to isolation source to assess clustering of STEC subpopulations. When human STEC data were analysed, the PCoA revealed three distinct human STEC subpopulations (STEC_1, STEC_2 and STEC_3), which were further analysed for associations between genomic features, symptoms and variance. The non-human STEC showed a more dispersed distribution, except for one subpopulation with genes linked to specific host species, and some virulence profiles overlapping with the STEC_1 population. In conclusion, our analysis identified distinct STEC subpopulations from human cases, each characterized by specific genetic features and associated with varying proportions of severe disease outcomes. These findings provide novel insights supporting the risk assessment of STEC.

Impact statement
This study is based on the establishment of a One Health STEC genome database that includes sequences from isolates of different sources. Most of the isolates were collected in the ten-year span 2010-2020, in 11 different countries, during surveillance and monitoring activities or specific surveys and research projects. The final dataset included the whole genome sequences of 3,418 STEC isolates, mainly from human cases of infection. The metadata included the host symptoms, where available, for human STEC strains and the animal source from which the non-human strains had been isolated. We set up a pipeline for the harmonized analysis of STEC WGS data, called Discover, made available through the ARIES webserver or GitHub. The analysis allowed a deep characterization of STEC strains circulating in Europe. We used this resource to assess STEC population structure and to investigate, by PCoA, potential associations between genomic features, host reservoirs and the various symptoms associated with STEC infection. This analysis highlighted the presence of subpopulations of human STEC associated with specific features. We provide new information useful for risk characterization, as well as a large genome database and associated metadata compiled from STEC strains, representing a valuable resource for the scientific community and enabling further investigations into STEC diversity, evolution, source attribution and public health relevance.

Data summary
The authors confirm all supporting data, including sequence data accession numbers, code and protocols, have been provided within the article or through supplementary data files. One supplementary method and five supplementary tables are available with the online version of this article.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Microbial Genomics | 204 papers in training set | Top 0.1% | 32.9%
2. PLOS Neglected Tropical Diseases | 378 papers in training set | Top 0.8% | 12.4%
3. Scientific Reports | 3102 papers in training set | Top 24% | 4.8%
(50% of probability mass above)
4. Genomics | 60 papers in training set | Top 0.2% | 3.9%
5. Genome Medicine | 154 papers in training set | Top 2% | 3.6%
6. Frontiers in Cellular and Infection Microbiology | 98 papers in training set | Top 1% | 3.6%
7. Frontiers in Genetics | 197 papers in training set | Top 3% | 2.9%
8. BMC Biology | 248 papers in training set | Top 0.9% | 1.9%
9. Microorganisms | 101 papers in training set | Top 0.6% | 1.9%
10. PLOS Computational Biology | 1633 papers in training set | Top 15% | 1.8%
11. International Journal of Food Microbiology | 11 papers in training set | Top 0.2% | 1.7%
12. Nature Communications | 4913 papers in training set | Top 53% | 1.7%
13. mSystems | 361 papers in training set | Top 5% | 1.7%
14. Epidemiology and Infection | 84 papers in training set | Top 2% | 1.7%
15. Peer Community Journal | 254 papers in training set | Top 2% | 1.5%
16. PLOS ONE | 4510 papers in training set | Top 60% | 1.2%
17. Viruses | 318 papers in training set | Top 4% | 0.9%
18. Transboundary and Emerging Diseases | 34 papers in training set | Top 0.6% | 0.9%
19. BMC Genomics | 328 papers in training set | Top 4% | 0.9%
20. Frontiers in Microbiology | 375 papers in training set | Top 9% | 0.7%
21. International Journal of Infectious Diseases | 126 papers in training set | Top 3% | 0.7%
22. GigaScience | 172 papers in training set | Top 3% | 0.7%
23. The Journal of Infectious Diseases | 182 papers in training set | Top 5% | 0.7%
24. Evolutionary Applications | 91 papers in training set | Top 1% | 0.6%
25. Animal Microbiome | 26 papers in training set | Top 0.4% | 0.6%