SCCmecExtractor: A tool for extracting Staphylococcal Cassette Chromosome elements from Whole Genome Sequences

MacFadyen, A. C.

2026-03-31 microbiology
10.64898/2026.03.31.715619 bioRxiv

Staphylococcal cassette chromosome (SCC) elements are mobile genetic elements that integrate at the rlmH gene and are predominantly responsible for methicillin resistance in staphylococci. Although SCCmec typing tools exist, none can extract the element sequence itself or explicitly classify SCC elements that lack methicillin resistance genes. Here we present SCCmecExtractor, a lightweight Python toolkit that identifies SCC element boundaries through degenerate attachment site (att) pattern matching, extracts complete elements from whole-genome assemblies, and characterises their mec and ccr gene content. Benchmarking on 7,297 genomes spanning 70 species across Staphylococcus and Mammaliicoccus demonstrated 100% typing concordance with the sccmec tool [1] on 1,454 S. aureus genomes. The tool extracted 1,562 SCC elements from 1,454 S. aureus, 5,295 non-aureus Staphylococcus, and 548 Mammaliicoccus genomes, achieving effective extraction rates (excluding assembly-limited genomes and those lacking valid ccr pairs) of 87.3% for S. aureus, 58.8% for non-aureus Staphylococcus, and 61.9% for Mammaliicoccus. Notably, 616 of the 1,562 extracted elements (39.4%) were non-mec SCC elements lacking methicillin resistance genes, a class of mobile element often overlooked. Non-mec SCC prevalence increased from 12.2% in S. aureus to 55.6% in non-aureus Staphylococcus and 76.0% in Mammaliicoccus, revealing a substantial reservoir of SCC diversity beyond methicillin resistance. SCCmecExtractor is freely available via PyPI, Docker, and Singularity under an MIT licence.

Impact Statement

Staphylococcal cassette chromosome (SCC) elements are mobile genetic elements responsible for methicillin resistance in staphylococci and are central to methicillin-resistant Staphylococcus aureus (MRSA) epidemiology. Existing tools focus on typing SCCmec from assemblies but cannot extract the element itself, limiting our ability to comprehensively monitor and examine these elements. SCCmecExtractor is a lightweight, portable tool that detects the attachment sites that SCC elements require to integrate into the genome, extracts the SCC element whether or not it carries a mec gene, and characterises its gene content. Applied across 7,297 genomes spanning two genera, we demonstrate that non-mec SCC elements are the dominant SCC class outside S. aureus, a finding enabled by systematic extraction and classification of SCC elements regardless of mec gene content. SCCmecExtractor provides the research community with an accessible, biologically grounded, confidence-first approach to SCC element analysis across all staphylococci and mammaliicocci.

Data Summary

The code for this pipeline is available at https://github.com/AlisonMacFadyen/SCCmecExtractor, with a Docker image available at https://hub.docker.com/repository/docker/alisonmacfadyen/sccmecextractor and a PyPI package at https://pypi.org/project/sccmecextractor/. All reference databases are bundled with the tool. Benchmarking genome accessions: 1,454 S. aureus, 5,295 non-aureus Staphylococcus, and 548 Mammaliicoccus genomes from NCBI. A complete list of genome accessions is provided as supplementary data (Supplementary Table S1). Extracted SCC elements can be obtained from Zenodo (DOI: 10.5281/zenodo.19355206).
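To make the idea of degenerate att pattern matching concrete, here is a minimal Python sketch of one way such a search could work. It is an illustration only, not SCCmecExtractor's implementation: the pattern `GARGCNTAT`, the toy sequence, and the function name `find_sites` are all hypothetical, and the sketch searches only the forward strand. It expands IUPAC ambiguity codes into a regular expression, locates every match, and slices out the region bounded by the first and last site.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) for every forward-strand match of a degenerate pattern."""
    body = "".join(IUPAC[base] for base in pattern.upper())
    # A lookahead with a capture group reports overlapping matches too.
    regex = re.compile(f"(?=({body}))")
    return [(m.start(1), m.end(1)) for m in regex.finditer(sequence.upper())]


# Hypothetical degenerate pattern and toy sequence (NOT the tool's real att motifs).
HYPOTHETICAL_ATT = "GARGCNTAT"
toy_genome = "TTTT" + "GAAGCGTAT" + "ACGTACGTAC" + "GAGGCATAT" + "CCCC"

sites = find_sites(toy_genome, HYPOTHETICAL_ATT)

# Treat the first and last sites as element boundaries and slice between them.
element = toy_genome[sites[0][0]:sites[-1][1]] if len(sites) >= 2 else ""
```

In this toy case both flanking copies match the pattern despite differing at the degenerate positions, which is the point of using ambiguity codes rather than exact string search. A real extractor would also need to handle the reverse strand, contig breaks, and validation against ccr gene content, none of which is attempted here.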

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                            Papers in training set  Percentile  Probability
1     Microbial Genomics                                 204                     Top 0.1%    18.7%
2     Nature Communications                              4913                    Top 9%      14.8%
3     Genome Medicine                                    154                     Top 0.3%    12.6%
4     Philosophical Transactions of the Royal Society B  51                      Top 1%      4.0%
----- 50% of probability mass above -----
5     Bioinformatics                                     1061                    Top 5%      3.6%
6     Nucleic Acids Research                             1128                    Top 6%      3.6%
7     mSphere                                            281                     Top 2%      2.8%
8     Nature Microbiology                                133                     Top 2%      2.1%
9     Genome Biology                                     555                     Top 3%      2.1%
10    GigaScience                                        172                     Top 1%      1.8%
11    Genome Research                                    409                     Top 2%      1.8%
12    Scientific Reports                                 3102                    Top 55%     1.8%
13    mSystems                                           361                     Top 5%      1.7%
14    eLife                                              5422                    Top 45%     1.5%
15    Proceedings of the National Academy of Sciences    2130                    Top 36%     1.3%
16    BMC Genomics                                       328                     Top 3%      1.3%
17    PLOS Biology                                       408                     Top 13%     1.2%
18    PLOS Computational Biology                         1633                    Top 19%     1.2%
19    mBio                                               750                     Top 9%      1.2%
20    Microbiome                                         139                     Top 3%      0.9%
21    Bioinformatics Advances                            184                     Top 4%      0.9%
22    The Lancet Infectious Diseases                     71                      Top 3%      0.8%
23    Access Microbiology                                22                      Top 0.6%    0.8%
24    Nature                                             575                     Top 15%     0.8%
25    The Lancet Microbe                                 43                      Top 1%      0.8%
26    PLOS ONE                                           4510                    Top 69%     0.7%
27    Journal of Clinical Microbiology                   120                     Top 2%      0.7%
28    Frontiers in Microbiology                          375                     Top 10%     0.6%
29    Microbiology Spectrum                              435                     Top 6%      0.6%
30    Microbiology                                       57                      Top 2%      0.5%
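
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running total over the listed probabilities. The short Python sketch below does this for the first eight entries of the list above; the function name `ranks_to_reach` is illustrative and is not part of any tool described here.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-8, copied from the list above.
probabilities = [18.7, 14.8, 12.6, 4.0, 3.6, 3.6, 2.8, 2.1]


def ranks_to_reach(probs, threshold):
    """Return the 1-based rank at which the running total first reaches threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None  # threshold never reached with the values supplied


cutoff_rank = ranks_to_reach(probabilities, 50.0)
```

The running total after ranks 1 to 4 is 18.7 + 14.8 + 12.6 + 4.0 = 50.1%, so the cutoff falls at rank 4, in agreement with the marker in the list.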