On the relationship between protist metabarcoding and protist metagenome-assembled genomes

Zavadska, D.; Henry, N.; Auladell, A.; Berney, C.; Richter, D. J.

2023-10-10 bioinformatics

10.1101/2023.10.09.561583 bioRxiv


ABSTRACT: The two most commonly used approaches to study the composition of environmental protist communities are marker gene metabarcoding and whole genome analysis through metagenomics. Raw metabarcoding data are usually processed into Operational Taxonomic Units (OTUs) or amplicon sequence variants (ASVs) through clustering or denoising approaches, respectively. Analogous approaches have been developed to assemble metagenomic sequence reads into metagenome-assembled genomes (MAGs). Understanding the correspondence between the data produced by these two approaches can help to integrate information between the datasets and to explain how metabarcoding OTUs and MAGs are related to the underlying biological entities they are hypothesised to represent. Due to the nature of their construction, MAGs do not contain the most commonly used barcoding loci, meaning that sequence homology approaches cannot be used to match OTUs and MAGs. We attempted to match V9 metabarcoding OTUs from the 18S rRNA gene (V9 OTUs) and MAGs from the Tara Oceans expedition (2009-2013) based on the correspondence of their relative abundances across the same set of samples. We evaluated the performance of several methods for detecting correspondence between features in these two compositional datasets and developed a series of controls to filter artefacts of data structure and processing. After selecting the best-performing correspondence metrics, ranking the V9 OTU/MAG matches by their proportionality/correlation coefficients and applying a set of selection criteria, we identified candidate matches between V9 OTUs and MAGs. In a subset of cases, V9 OTUs and MAGs could be successfully matched with one another with a one-to-one correspondence, implying that they likely represent the same underlying biological entity.
More generally, the matches we observed could be classified into four scenarios: Scenario I - one V9 OTU matches more than one MAG; Scenario II - more than one V9 OTU matches more than one MAG; Scenario III - more than one V9 OTU matches one MAG; Scenario IV - one V9 OTU matches one MAG. These diverse scenarios for V9 OTU-MAG matches illustrate the complex nature of the OTU/MAG relationship. Notably, we found some instances in which different OTU-MAG matches from the same taxonomic group were not classified in the same scenario, with all four scenarios possible even within the same taxonomic group, illustrating that factors beyond taxonomic lineage influence the relationship between OTUs and MAGs. Overall, each scenario produces a different interpretation of V9 OTUs, MAGs and how they compare in terms of the genomic and ecological diversity that they represent.
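
The four scenarios can be made concrete by treating candidate matches as edges of a bipartite graph between V9 OTUs and MAGs, then counting how many OTUs and MAGs fall into each connected group. The sketch below is only an illustration under that assumption. Grouping by connected component is not stated in the abstract as the authors' procedure, and the function name `classify_matches` and the identifiers in the example are hypothetical.

```python
from collections import defaultdict


def classify_matches(pairs):
    """Label each connected group of (otu, mag) matches with a scenario I-IV.

    Assumption for illustration: matches sharing an OTU or a MAG belong
    to the same group (connected component of a bipartite graph).
    """
    # Tag node ids by kind so an OTU and a MAG with the same name cannot collide.
    graph = defaultdict(set)
    for otu, mag in pairs:
        graph[("otu", otu)].add(("mag", mag))
        graph[("mag", mag)].add(("otu", otu))

    seen = set()
    results = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component

        otus = sorted(name for kind, name in component if kind == "otu")
        mags = sorted(name for kind, name in component if kind == "mag")
        if len(otus) == 1 and len(mags) > 1:
            scenario = "I"    # one V9 OTU, more than one MAG
        elif len(otus) > 1 and len(mags) > 1:
            scenario = "II"   # more than one V9 OTU, more than one MAG
        elif len(otus) > 1 and len(mags) == 1:
            scenario = "III"  # more than one V9 OTU, one MAG
        else:
            scenario = "IV"   # one V9 OTU, one MAG
        results.append((scenario, otus, mags))
    return results


# Hypothetical candidate matches, e.g. pairs that passed the selection criteria.
example_pairs = [
    ("otu1", "magA"), ("otu1", "magB"),                    # Scenario I
    ("otu2", "magC"), ("otu3", "magC"),                    # Scenario III
    ("otu4", "magD"),                                      # Scenario IV
    ("otu5", "magE"), ("otu5", "magF"), ("otu6", "magF"),  # Scenario II
]
for scenario, otus, mags in classify_matches(example_pairs):
    print(scenario, otus, mags)
```

The input here is a plain list of matched pairs, so the sketch is independent of whichever proportionality or correlation metric was used to select them.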

