
On the relationship between protist metabarcoding and protist metagenome-assembled genomes

Zavadska, D.; Henry, N.; Auladell, A.; Berney, C.; Richter, D. J.

2023-10-10 bioinformatics
10.1101/2023.10.09.561583 bioRxiv

Abstract: The two most commonly used approaches to study the composition of environmental protist communities are marker gene metabarcoding and whole genome analysis through metagenomics. Raw metabarcoding data are usually processed into Operational Taxonomic Units (OTUs) or amplicon sequence variants (ASVs) through clustering or denoising approaches, respectively. Analogous approaches have been developed to assemble metagenomic sequence reads into metagenome-assembled genomes (MAGs). Understanding the correspondence between the data produced by these two approaches can help to integrate information between the datasets and to explain how metabarcoding OTUs and MAGs are related to the underlying biological entities they are hypothesised to represent. Due to the nature of their construction, MAGs do not contain the most commonly used barcoding loci, meaning that sequence homology approaches cannot be used to match OTUs and MAGs. We attempted to match V9 metabarcoding OTUs from the 18S rRNA gene (V9 OTUs) and MAGs from the Tara Oceans expedition (2009-2013) based on the correspondence of their relative abundances across the same set of samples. We evaluated the performance of several methods for detecting correspondence between features in these two compositional datasets and developed a series of controls to filter artefacts of data structure and processing. After selecting the best-performing correspondence metrics, ranking the V9 OTU/MAG matches by their proportionality/correlation coefficients and applying a set of selection criteria, we identified candidate matches between V9 OTUs and MAGs. In a subset of cases, V9 OTUs and MAGs could be successfully matched with one another with a one-to-one correspondence, implying that they likely represent the same underlying biological entity.
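As an illustration of abundance-based matching, the following is a minimal sketch of one proportionality metric (the rho coefficient computed on centred log-ratio transformed counts). It is not the authors' pipeline: the abstract does not state which metrics were retained, and the function names, the pseudocount of 1 and the choice to transform each dataset separately are assumptions made only for this example.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform; rows are samples, columns are features.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def rho(x, y):
    """Proportionality coefficient rho for two CLR-transformed vectors.

    rho = 1 - var(x - y) / (var(x) + var(y)); values lie in [-1, 1].
    """
    return 1.0 - np.var(x - y, ddof=1) / (np.var(x, ddof=1) + np.var(y, ddof=1))


def cross_rho(otu_counts, mag_counts, pseudocount=1.0):
    """Return an (n_otus, n_mags) matrix of rho values.

    Both tables must share the same samples in the same row order.
    Each table is CLR-transformed on its own (an assumption of this sketch).
    """
    otu_clr = clr(otu_counts, pseudocount)
    mag_clr = clr(mag_counts, pseudocount)
    n_otus, n_mags = otu_clr.shape[1], mag_clr.shape[1]
    out = np.empty((n_otus, n_mags))
    for i in range(n_otus):
        for j in range(n_mags):
            out[i, j] = rho(otu_clr[:, i], mag_clr[:, j])
    return out


# Toy data: 4 samples, 3 hypothetical OTUs and 2 hypothetical MAGs.
toy_otus = [[10, 200, 30], [50, 20, 300], [5, 100, 60], [80, 40, 10]]
toy_mags = [[20, 7], [90, 3], [12, 40], [150, 9]]
rho_matrix = cross_rho(toy_otus, toy_mags)
```

Candidate pairs could then be ranked by their entry in `rho_matrix`, with any threshold and filtering controls chosen separately.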
More generally, the matches we observed could be classified into four scenarios: Scenario I, one V9 OTU matches more than one MAG; Scenario II, more than one V9 OTU matches more than one MAG; Scenario III, more than one V9 OTU matches one MAG; Scenario IV, one V9 OTU matches one MAG. These diverse scenarios for V9 OTU-MAG matches illustrate the complex nature of the OTU/MAG relationship. Notably, different OTU-MAG matches from the same taxonomic group were not always classified in the same scenario, and all four scenarios were possible even within a single taxonomic group, illustrating that factors beyond taxonomic lineage influence the relationship between OTUs and MAGs. Overall, each scenario produces a different interpretation of V9 OTUs, MAGs and how they compare in terms of the genomic and ecological diversity that they represent.
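The four scenarios can be illustrated with a short sketch that groups accepted matches into connected clusters and labels each cluster by how many OTUs and MAGs it contains. Treating matches as edges of a bipartite graph is an assumption of this example, not a procedure stated in the abstract, and all identifiers below are hypothetical.

```python
from collections import defaultdict


def scenario_label(n_otus, n_mags):
    """Map cluster sizes onto the four scenarios named in the abstract."""
    if n_otus == 1 and n_mags > 1:
        return "I"    # one V9 OTU matches more than one MAG
    if n_otus > 1 and n_mags > 1:
        return "II"   # more than one V9 OTU matches more than one MAG
    if n_otus > 1 and n_mags == 1:
        return "III"  # more than one V9 OTU matches one MAG
    return "IV"       # one V9 OTU matches one MAG


def classify_scenarios(matches):
    """Group (otu_id, mag_id) pairs into connected clusters and label them.

    Returns a list of (sorted OTU ids, sorted MAG ids, scenario) tuples.
    """
    adjacency = defaultdict(set)
    for otu, mag in matches:
        adjacency[("otu", otu)].add(("mag", mag))
        adjacency[("mag", mag)].add(("otu", otu))

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects every node reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        otus = sorted(name for kind, name in component if kind == "otu")
        mags = sorted(name for kind, name in component if kind == "mag")
        clusters.append((otus, mags, scenario_label(len(otus), len(mags))))
    return clusters


# Hypothetical matches covering all four scenarios.
toy_matches = [
    ("otu1", "magA"), ("otu1", "magB"),                    # Scenario I
    ("otu2", "magC"), ("otu3", "magC"),                    # Scenario III
    ("otu4", "magD"),                                      # Scenario IV
    ("otu5", "magE"), ("otu5", "magF"), ("otu6", "magF"),  # Scenario II
]
clusters = classify_scenarios(toy_matches)
```

Under this grouping, a chain such as otu5-magE, otu5-magF, otu6-magF counts as a single Scenario II cluster even though otu6 and magE are not directly matched; a stricter definition would need a different rule.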

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. mSystems, 361 papers in training set, Top 0.9%, 8.5%
2. Frontiers in Microbiology, 375 papers in training set, Top 0.7%, 8.5%
3. Microbiome, 139 papers in training set, Top 0.3%, 8.4%
4. PLOS ONE, 4510 papers in training set, Top 27%, 6.4%
5. Scientific Reports, 3102 papers in training set, Top 27%, 4.3%
6. Environmental Microbiome, 26 papers in training set, Top 0.1%, 4.0%
7. Molecular Ecology Resources, 161 papers in training set, Top 0.3%, 4.0%
8. PeerJ, 261 papers in training set, Top 2%, 4.0%
9. PLOS Computational Biology, 1633 papers in training set, Top 9%, 3.7%

50% of probability mass above

10. Metabarcoding and Metagenomics, 12 papers in training set, Top 0.1%, 3.6%
11. Nature Communications, 4913 papers in training set, Top 42%, 3.3%
12. Environmental DNA, 49 papers in training set, Top 0.1%, 3.1%
13. BMC Bioinformatics, 383 papers in training set, Top 3%, 3.1%
14. mSphere, 281 papers in training set, Top 2%, 2.1%
15. Peer Community Journal, 254 papers in training set, Top 1%, 2.1%
16. Microorganisms, 101 papers in training set, Top 0.8%, 1.7%
17. Genome Biology, 555 papers in training set, Top 4%, 1.7%
18. Microbial Genomics, 204 papers in training set, Top 1%, 1.7%
19. Ecological Informatics, 29 papers in training set, Top 0.5%, 1.2%
20. Ecological Indicators, 20 papers in training set, Top 0.4%, 1.0%
21. Limnology and Oceanography: Methods, 11 papers in training set, Top 0.3%, 0.9%
22. Ecology and Evolution, 232 papers in training set, Top 3%, 0.9%
23. Microbiology Spectrum, 435 papers in training set, Top 4%, 0.9%
24. Environmental Microbiology, 119 papers in training set, Top 3%, 0.9%
25. eLife, 5422 papers in training set, Top 55%, 0.8%
26. iScience, 1063 papers in training set, Top 32%, 0.8%
27. NAR Genomics and Bioinformatics, 214 papers in training set, Top 4%, 0.8%
28. Bioinformatics, 1061 papers in training set, Top 10%, 0.7%
29. BMC Genomics, 328 papers in training set, Top 6%, 0.7%
30. Frontiers in Bioinformatics, 45 papers in training set, Top 1%, 0.7%