Taxonomic profilers and their influence on metagenomic diversity analyses

Rondeau-Leclaire, J.; Blanchet, G.; Jacques, P.-E.; Laforest-Lapointe, I.

2026-05-30 bioinformatics
10.64898/2026.05.27.727884 bioRxiv

Estimating taxonomic profiles is a central task in microbiome research. Several bioinformatic tools have been developed for this purpose, differing in algorithmic strategy, reference database flexibility, sensitivity parameters, and the type of abundance they estimate. As a result, taxonomic profiles carry an unwanted methodological signal whose driving characteristics remain understudied. While benchmarks have evaluated the performance of some of these tools, they rely on simulated data; little work has been done to compare them using real metagenomes in the presence of noise and uncharacterised diversity. Overall, the impact of taxonomic profiler choice and parameterisation on scientific conclusions remains poorly understood. Here, we first provide a much-needed characterisation of four taxonomic profilers to help researchers better understand the available bioinformatic tools and inform their methodological choices. Then, we leverage 1,211 shotgun metagenomes from eight datasets to compare these taxonomic profilers across 13 methodological designs. Based on diversity indices, we found substantial variability in estimated taxonomic composition depending on methodological features such as reference database and algorithmic strategy. Alpha diversity and its analysis varied substantially with tool choice (particularly among k-mer-based tools) and with reference database. Beta diversity was sensitive to both database and parameter choices, yet this variability barely affected statistical inference. This work raises awareness of the causes of variability in metagenome analysis attributable to choices in taxonomic profiling methodology. Our findings highlight the sensitivity of taxonomic diversity analyses to these choices and the importance of assessing the robustness of results to the choice of tool, parameters, and reference database.
Crucially, differences in sample diversity across methodologies are symptomatic of differences in estimated taxonomic composition, which can affect any analysis based on taxonomic abundances. Overall, this study underscores the importance of tool selection and parameterisation, and of conducting sensitivity analyses to support robust and reliable scientific conclusions.
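
To illustrate the kind of sensitivity check the abstract recommends, the sketch below computes one alpha diversity index (Shannon) and one beta diversity measure (Bray-Curtis) for the same sample as estimated by two profilers. It is a minimal illustration, not the authors' pipeline: the profiler names, taxon names, and abundance values are hypothetical, and the abstract does not state which indices the study used.

```python
import math

def shannon(profile):
    """Shannon index (natural log) of a taxon -> abundance mapping."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile has no positive abundance")
    # Zero-abundance taxa contribute nothing to the index.
    return -sum((a / total) * math.log(a / total)
                for a in profile.values() if a > 0)

def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    if den <= 0:
        raise ValueError("both profiles are empty")
    return num / den

# Hypothetical relative abundances for ONE sample, as estimated by two
# hypothetical profilers (names and values are illustrative only).
profiles = {
    "profiler_A": {"taxon_1": 0.5, "taxon_2": 0.5},
    "profiler_B": {"taxon_1": 0.5, "taxon_2": 0.25, "taxon_3": 0.25},
}

# Alpha diversity of the same sample under each methodology.
alpha = {name: shannon(p) for name, p in profiles.items()}

# Dissimilarity between the two estimates of the same sample; a value
# above zero reflects methodological signal rather than biology.
between_methods = bray_curtis(profiles["profiler_A"], profiles["profiler_B"])

for name, value in alpha.items():
    print(f"{name}: Shannon = {value:.3f}")
print(f"Bray-Curtis between methods = {between_methods:.3f}")
```

In a real analysis the same comparison would be repeated across all samples and methodological designs, and the downstream statistical test rerun for each, to see whether the conclusion changes.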

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | PLOS Computational Biology | 1633 | Top 2% | 14.0%
2 | mSystems | 361 | Top 0.5% | 12.0%
3 | PeerJ | 261 | Top 0.4% | 7.0%
4 | BMC Bioinformatics | 383 | Top 2% | 6.2%
5 | Microbiome | 139 | Top 0.6% | 6.2%
6 | PLOS ONE | 4510 | Top 32% | 4.7%
(50% of probability mass above)
7 | Microbial Genomics | 204 | Top 0.5% | 4.2%
8 | Frontiers in Microbiology | 375 | Top 3% | 3.5%
9 | Briefings in Bioinformatics | 326 | Top 2% | 3.5%
10 | Scientific Reports | 3102 | Top 47% | 2.4%
11 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.0%
12 | Bioinformatics | 1061 | Top 7% | 2.0%
13 | BMC Genomics | 328 | Top 2% | 1.8%
14 | mSphere | 281 | Top 3% | 1.7%
15 | GigaScience | 172 | Top 1% | 1.7%
16 | NAR Genomics and Bioinformatics | 214 | Top 2% | 1.7%
17 | Environmental Microbiome | 26 | Top 0.2% | 1.6%
18 | Frontiers in Bioinformatics | 45 | Top 0.3% | 1.4%
19 | Molecular Ecology Resources | 161 | Top 0.7% | 1.4%
20 | Peer Community Journal | 254 | Top 2% | 1.4%
21 | Microorganisms | 101 | Top 1.0% | 1.4%
22 | Methods in Ecology and Evolution | 160 | Top 2% | 1.3%
23 | F1000Research | 79 | Top 3% | 0.9%
24 | BMC Microbiology | 35 | Top 1% | 0.8%
25 | Microbiology Spectrum | 435 | Top 6% | 0.7%
26 | Journal of Proteome Research | 215 | Top 3% | 0.6%