Ratatoskr: A tool for automated retrieval of taxonomic type strain sequences and metadata
Turkington, C.; Bastiaanssen, F.; Nezam-Abadi, N.; Shkoporov, A. N.; Hill, C.
Bacterial taxonomic type strains anchor species names to physical and genomic reference material, making them essential for reproducible and comparable prokaryotic research. Although reference strains are often well characterised through curated metadata, nomenclature histories, and sequence records, no single database holds up-to-date information on all of these aspects, so the information remains fragmented. Gathering the complete set of information for a type strain is further complicated by nomenclatural inconsistencies between sources, which arise from the often numerous synonyms that can describe a single strain. As a result, collecting type strain data for taxonomic proposals and emendations can be an onerous task requiring extensive manual curation. To address this issue, we introduce Ratatoskr, a Python-based tool that automates the retrieval of sequences and metadata for bacterial taxonomic type strains. Ratatoskr does this by collecting the latest type strain information from the List of Prokaryotic names with Standing in Nomenclature (LPSN) and using it to query the BacDive and NCBI databases. By applying known taxonomic synonym information, Ratatoskr resolves cross-database inconsistencies and streamlines the retrieval process. We show that Ratatoskr can obtain metadata and sequence data for bacterial type strains within seconds to minutes, depending on the number of members within the requested taxon. By automating this retrieval, Ratatoskr provides fast, accurate, and readily shareable starting points for studies that rely on taxonomic type strains and their data, such as new taxonomic proposals or emendations.
Data summary
Ratatoskr was developed using Python 3 and is freely available at https://github.com/Fabian-Bastiaanssen/Ratatoskr under a GPL-3.0 licence.
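The synonym-based reconciliation described in the abstract can be illustrated with a minimal sketch. This is not Ratatoskr's actual code or interface: the function names, the record layout, and the in-memory "sources" below are all hypothetical, and no database is queried. It only shows the general idea of mapping each source's name onto one accepted name before merging records.

```python
# Illustrative sketch only; names and data structures are assumptions,
# not taken from the Ratatoskr code base.

# Hypothetical synonym table: every known name maps to the accepted name.
SYNONYMS = {
    "Clostridium difficile": "Clostridioides difficile",
    "Clostridioides difficile": "Clostridioides difficile",
}


def canonical_name(name, synonyms):
    """Return the accepted name, or the whitespace-normalised input if unknown."""
    cleaned = " ".join(name.split())
    return synonyms.get(cleaned, cleaned)


def merge_by_canonical_name(sources, synonyms):
    """Group records from several sources under one accepted name."""
    merged = {}
    for source_label, records in sources.items():
        for record in records:
            key = canonical_name(record["name"], synonyms)
            entry = merged.setdefault(key, {"names_seen": set()})
            entry["names_seen"].add(record["name"])
            entry[source_label] = record
    return merged


# Placeholder records standing in for responses from two databases.
sources = {
    "metadata_source": [
        {"name": "Clostridioides difficile", "metadata": "PLACEHOLDER"},
    ],
    "sequence_source": [
        {"name": "Clostridium  difficile", "accession": "PLACEHOLDER"},
    ],
}

merged = merge_by_canonical_name(sources, SYNONYMS)
for accepted, entry in merged.items():
    print(accepted, sorted(k for k in entry if k != "names_seen"))
```

Keying on the accepted name means a record filed under an older synonym in one source still lands next to the record filed under the current name in another.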
Matching journals
The top 3 journals account for 50% of the predicted probability mass.