Towards a holistic epidemiology of Streptococcus agalactiae using the BakRep repository

Fenske, L.; Schwengers, O.; Goesmann, A.

2026-03-03 genomics
10.64898/2026.03.02.709001 bioRxiv

Streptococcus agalactiae is a versatile multi-host pathogen that can cause major neonatal disease in humans, as well as mastitis in dairy animals. Its ability to infect a wide range of hosts is largely driven by its high genomic plasticity and the acquisition of distinct accessory genes. The global population of S. agalactiae is characterized by multiple capsular serotypes and clonal complexes that differ in their propensity to cause invasive disease: the hypervirulent CC17 (often serotype III) is associated with neonatal meningitis, whereas CC1/CC19/CC23 are more often colonizing lineages. Although the species is widely studied, most research is limited to particular regions or single outbreak events, offering only fragmented snapshots instead of a comprehensive global picture. To move beyond region- or outbreak-limited studies, this work analyzed 37,970 S. agalactiae genomes from BakRep, integrating serotypes, MLST, AMR genes, lineage-specific genes, and descriptive metadata to map current trends and identify potential gaps in public data. The dataset largely matched the known population structure, with serotypes III, Ia and V being most common and with stable serotype/clonal complex lineages (e.g. III-2/CC17, Ia/CC23, CC1/V), while also showing rising serotype diversity. Lineages differed in their accessory-gene profiles, with III-2/CC17 being enriched for virulence and adhesion genes, while other groups showed either greater genomic plasticity (mobile/phage genes) or niche specialization. AMR was widespread, with very high tetracycline resistance (>80%), frequent MLSB resistance determinants, and emerging aminoglycoside resistance in some genomes. Overall, however, it became evident that the associated metadata contained substantial gaps. Missing or incomplete information limits biological interpretation, underscoring that rigorously curated, structured metadata is essential for maximizing the value of ongoing sequencing efforts.

Matching journals

The top 3 journals together account for just over 50% of the predicted probability mass.

1 | Microbial Genomics | 204 papers in training set | Top 0.1% | 38.0%
2 | mSystems | 361 papers in training set | Top 0.8% | 9.2%
3 | Nature Communications | 4913 papers in training set | Top 26% | 6.9%
50% of probability mass above
4 | Genome Medicine | 154 papers in training set | Top 1% | 4.9%
5 | Frontiers in Microbiology | 375 papers in training set | Top 2% | 4.0%
6 | Frontiers in Cellular and Infection Microbiology | 98 papers in training set | Top 1% | 3.7%
7 | Scientific Reports | 3102 papers in training set | Top 42% | 2.9%
8 | Nucleic Acids Research | 1128 papers in training set | Top 12% | 1.5%
9 | Peer Community Journal | 254 papers in training set | Top 2% | 1.5%
10 | BMC Genomics | 328 papers in training set | Top 3% | 1.5%
11 | Genomics | 60 papers in training set | Top 2% | 1.2%
12 | eLife | 5422 papers in training set | Top 53% | 0.9%
13 | mBio | 750 papers in training set | Top 11% | 0.8%
14 | Microbiome | 139 papers in training set | Top 3% | 0.8%
15 | Communications Biology | 886 papers in training set | Top 23% | 0.8%
16 | PLOS ONE | 4510 papers in training set | Top 68% | 0.8%
17 | Frontiers in Genetics | 197 papers in training set | Top 10% | 0.8%
18 | Animal Microbiome | 26 papers in training set | Top 0.3% | 0.8%
19 | Microbiology Spectrum | 435 papers in training set | Top 6% | 0.7%
20 | Evolutionary Applications | 91 papers in training set | Top 1% | 0.7%
21 | Nature Microbiology | 133 papers in training set | Top 5% | 0.7%
22 | International Journal of Food Microbiology | 11 papers in training set | Top 0.7% | 0.6%
23 | BMC Biology | 248 papers in training set | Top 6% | 0.6%
24 | Microorganisms | 101 papers in training set | Top 3% | 0.5%
25 | Antibiotics | 32 papers in training set | Top 2% | 0.5%
26 | PLOS Computational Biology | 1633 papers in training set | Top 29% | 0.5%
27 | Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 7% | 0.5%
28 | GigaScience | 172 papers in training set | Top 4% | 0.5%
29 | Epidemiology and Infection | 84 papers in training set | Top 4% | 0.5%
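
The 50% cutoff in the list above can be checked with a running sum over the ranked probabilities. The following is a minimal sketch, not the site's own method: the function name journals_to_reach is hypothetical, and the values are the six highest percentages copied from the list. With these numbers the running sum is 38.0, then 47.2, then 54.1, so the third journal is the first to carry the total past 50%.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the six top-ranked journals,
# copied from the list above; the remaining entries are omitted for brevity.
probabilities = [38.0, 9.2, 6.9, 4.9, 4.0, 3.7]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running sum
    to reach the threshold, or None if the supplied entries never reach it.

    Hypothetical helper for illustration only.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities))  # prints 3
```

Passing a different threshold (for example 60.0) shows how many further journals would be needed to cover a larger share of the predicted mass.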