Taxonomic profilers and their influence on metagenomic diversity analyses
Rondeau-Leclaire, J.; Blanchet, G.; Jacques, P.-E.; Laforest-Lapointe, I.
Estimating taxonomic profiles is a central task in microbiome research. Several bioinformatic tools have been developed for this purpose, differing in algorithmic strategy, reference database flexibility, sensitivity parameters, and the type of abundance they estimate. As a result, taxonomic profiles carry an unwanted methodological signal whose driving characteristics remain understudied. While benchmarks have evaluated the performance of some of these tools, they rely on simulated data; little work has been done to compare them using real metagenomes in the presence of noise and uncharacterised diversity. Overall, the impact of taxonomic profiler choice and parameterisation on scientific conclusions remains poorly understood. Here, we first provide a much-needed characterisation of four taxonomic profilers to help researchers better understand the available bioinformatic tools and inform their methodological choices. We then leverage 1,211 shotgun metagenomes from eight datasets to compare these taxonomic profilers across 13 methodological designs. Based on diversity indices, we found substantial variability in estimated taxonomic composition depending on methodological features such as reference database and algorithmic strategy. Alpha diversity and its analysis varied substantially with tool choice (particularly among k-mer-based tools) and with reference database. Beta diversity was sensitive to both database and parameter choices, yet this variability barely affected statistical inference. This work raises awareness of the causes of variability in metagenome analysis attributable to choices in taxonomic profiling methodology. Our findings highlight the sensitivity of taxonomic diversity analyses to these choices and the importance of assessing the robustness of results to the choice of tool, parameters, and reference database.
Crucially, differences in sample diversity across methodologies are symptomatic of differences in estimated taxonomic composition, which can affect any analysis based on taxonomic abundances. Overall, this study underscores the importance of tool selection and parameterisation, and of conducting sensitivity analyses to support robust and reliable scientific conclusions.
Matching journals
The top 6 journals account for 50% of the predicted probability mass.