Phylogenomic Taxonomic Analysis of Ralstonia solanacearum Strains causing Bacterial Wilt Disease in Northeastern Argentina.
Obregon, V.; Shin, G. Y.; Galdeano, E.; Escobar, R.; Lattar, T.; Ibanez, J. M.; Amadio, A.; Irazoqui, J. M.; Santiago, G. M.; Eberhardt, M. F.; Gochez, A. M.; Lowe-Power, T.
The Ralstonia solanacearum species complex (RSSC) is a genetically diverse group of plant pathogens, yet genomic data from South America remain limited. Here, we characterize 13 RSSC strains isolated from tomato, pepper, and eggplant in northeastern Argentina. Phylogenetic analysis of the egl marker gene assigned these strains to phylotype IIA and suggested two closely related lineages. Complete genomes (5.63-5.76 Mb) were generated for four representative strains, yielding high-quality, closed assemblies (99.94% completeness with f_Burkholderiaceae CheckM markers) with the canonical bipartite architecture. Phylogenetic analyses of the egl marker and of 49 conserved bacterial genes, together with average nucleotide identity (ANI) analysis, consistently assigned one lineage to sequevar IIA-50, which formed a coherent, monophyletic group. In contrast, although egl analysis suggested that the second lineage was related to one sequevar IIA-38 reference strain, genomic analysis did not support this assignment. Further, the genomic analysis revealed a substantial genomic distance between the genomes of two sequevar 38 representative strains, supporting the conclusion that sequevar 38 itself is not monophyletic and instead appears paraphyletic. These findings highlight the limitations of single-locus classification and support genome-informed refinement of RSSC sub-phylotype taxonomy.

Outcome statement

Although bacterial wilt disease has long been present in Argentina, reports of it had not yet been published in the international literature. This study provides complete genome sequences for four Ralstonia solanacearum strains from Northern Argentina and places them within a global phylogenomic framework. The Argentine strains cluster into two closely related phylotype IIA lineages, indicating that bacterial wilt in this regional dataset is associated with genetically similar populations.
To communicate clearly which strains are present in Northern Argentina, we attempted to classify the lineages within the long-standing sequence variant (sequevar) system for naming R. solanacearum species complex (RSSC) strains. One lineage was confidently assigned to IIA-50, with genomic support confirming the phylogenetic analysis of the classical genetic marker egl. However, newly available genomes for sequevar reference strains revealed a problem: two distantly related strains are currently recognized as references for the same sequevar, IIA-38. Overall, these results support the need for genome-informed refinement of sub-phylotype classification and expand the genomic representation of South American RSSC populations.

Data summary

Complete genome assemblies and raw reads for INTABV18, INTABV29, INTABV624 and INTABV2657 are deposited at NCBI under BioProject PRJNA1407867. The curated dataset of public RSSC genomes is available through a KBase narrative (https://narrative.kbase.us/narrative/189849) to users who register a free KBase account. The narrative is described in a living bioRxiv preprint [1]. Supplemental files, including Figure S1, rectangular versions of all trees (Figures 2, 3, and S1), and Supplementary Tables S1-S4, are available on Zenodo (doi.org/10.5281/zenodo.19502890).

Figure S1. A maximum-likelihood phylogenetic tree inferred from 471 bp of endoglucanase (egl) gene sequences assigned the Argentine strains to phylotype II sequevars 38 and 50.
The tree was constructed using PhyML v3.0 under the GTR nucleotide substitution model with gamma-distributed rate heterogeneity (shape parameter = 0.33), as selected by the SMART model selection procedure implemented in PhyML (Lefort et al., 2017). The egl sequences from Argentine strains are highlighted in blue, and the GenBank accession numbers for both the egl nucleotide sequence and the whole-genome assembly are shown in parentheses. Reference egl sequences representing sequevars IIA-38 (CFBP6801 and CIP120) and IIA-50 (T1-UY and ACH1076) are shown in bold and marked with yellow circles. A searchable PDF of this tree in rectangular format is available on Zenodo (doi.org/10.5281/zenodo.19502890).

Figure 2. A maximum-likelihood phylogenetic tree inferred from 710 bp of endoglucanase (egl) gene sequences assigned the Argentine strains to phylotype II sequevars 38 and 50. The tree was constructed using PhyML v3.0 under the GTR+R nucleotide substitution model, as selected by the SMART model selection procedure (Lefort et al., 2017). The egl sequences from four Argentine strains (INTABV18, INTABV29, INTABV624, and INTABV2657) are shown in bold and highlighted in blue. Reference egl sequences representing sequevars IIA-38 (CFBP6801 and CIP120) and IIA-50 (T1-UY and ACH1076) are also shown in bold and marked with yellow circles. Two USA strains identified as IIA-38 (UCD576 and RS124) are shown in bold. A searchable PDF of this tree in rectangular format is available on Zenodo (doi.org/10.5281/zenodo.19502890).
Figure 3. An approximate maximum-likelihood phylogeny based on a concatenated alignment of 49 conserved genes places four Argentine genomes (INTABV18, INTABV29, INTABV624 and INTABV2657) within the phylotype IIA clade. The tree was constructed using the SpeciesTreeBuilder v0.1.4 application on the KBase platform, incorporating the four Argentine genomes into a reference dataset of 825 genomes representing the known global diversity of the RSSC. The tree was visualized and annotated using iTOL v7.4.2. Argentine genomes are shown in bold and highlighted in blue, and egl reference strains for sequevars IIA-38 (CIP120 and CFBP6801) and IIA-50 (T1-UY) are shown in bold and marked with yellow circles. Branches with approximate likelihood-ratio support values above 70% are colored in blue. A searchable PDF of this tree in rectangular format is available on Zenodo (doi.org/10.5281/zenodo.19502890).
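
The monophyly question raised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow: the tree below is a toy example, every tip name in it is a hypothetical placeholder rather than a strain from the study, and the helper functions are illustrative. On a rooted tree, a group of tips is monophyletic when the smallest clade containing all of them contains no other tips. The check only reports whether a group is monophyletic; it does not distinguish paraphyly from polyphyly.

```python
# Toy rooted tree written as nested tuples; leaves are strings.
# All tip names are hypothetical placeholders, not strains from the study.
TOY_TREE = (
    (("sv38_ref_A", "query_1"), "other_1"),
    (("sv38_ref_B", "other_2"), "other_3"),
)


def tips(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= tips(child)
    return found


def smallest_clade(node, group):
    """Return the tips of the smallest clade that contains every member of group."""
    children = [] if isinstance(node, str) else node
    for child in children:
        if group <= tips(child):
            # The whole group sits inside this child, so descend into it.
            return smallest_clade(child, group)
    # No single child holds the whole group: this node is the smallest clade.
    return tips(node)


def is_monophyletic(tree, group):
    """True if the smallest clade containing the group holds no other tips."""
    group = set(group)
    if not group <= tips(tree):
        raise ValueError("group contains tips that are not in the tree")
    return smallest_clade(tree, group) == group


if __name__ == "__main__":
    # The two hypothetical reference tips sit in different clades of the toy tree.
    print(is_monophyletic(TOY_TREE, {"sv38_ref_A", "sv38_ref_B"}))
    # A reference tip and its sister tip form a clade on their own.
    print(is_monophyletic(TOY_TREE, {"sv38_ref_A", "query_1"}))
```

In the toy tree, the two placeholder reference tips fall in separate clades, so any clade that contains both also contains unrelated tips. That is the same pattern the abstract describes for the two sequevar 38 representative genomes.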