SPrOUT: A computational and targeted sequencing approach for mixed plant DNA identification with Angiosperms353
Hu, N.; Bullock, M. R.; Jackson, C.; Miller, C.; Hunter, E.; Huff, C.; Chen, Y.; Handy, S.; Johnson, M.
Premise: The identification of plant species from mixed samples is crucial in various fields, including ecological surveys, conservation efforts, and food and dietary supplement safety. Traditional methods are limited by the high cost of DNA sequencing, inefficient computational workflows, and incomplete sequence databases.

Methods and Results: This study introduces a novel approach that uses the Angiosperms353 target sequencing kit for efficient taxonomic identification of angiosperm DNA in mixed samples. Our method assembles short paired-end reads for each mixed sample. Using Angiosperms353 gene sets from 871 species, we apply phylogenetic inference and categorize the variance in phylogenetic distance across genes to identify which taxa are present in mixed plant samples. The pipeline reaches 98.1% to 99.6% accuracy and 92.9% to 100% precision for identifying unknown taxa in in silico mixes, and 90.7% accuracy and 98.0% precision for mock supplement mixtures. We explored the parameter cutoffs of the pipeline to offer an empirical range for different applications.

Conclusions: Angiosperms353 target sequencing combined with HybPiper assembly proved effective in sorting mixed plant DNA samples. Our method offers a framework for scientific and practical applications in plant species identification in both single and mixed samples.
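To make the idea of calling taxa from per-gene phylogenetic distances more concrete, the following is a minimal Python sketch. It is not the SPrOUT implementation: the decision rule (mean and variance cutoffs), the function names `call_taxa` and `accuracy_precision`, the cutoff values, and the toy data are all hypothetical and chosen only for illustration. It also shows how accuracy and precision can be computed from presence/absence calls over a reference taxon set.

```python
from statistics import mean, pvariance


def call_taxa(distances, max_mean=0.05, max_var=0.01, min_genes=3):
    """Return the set of taxa called present.

    distances maps a reference taxon to a list of per-gene phylogenetic
    distances between that taxon and the closest sequence assembled from
    the mixed sample. All cutoffs are hypothetical, not from the paper.
    """
    present = set()
    for taxon, per_gene in distances.items():
        if len(per_gene) < min_genes:
            continue  # too few recovered genes to judge this taxon
        # Assumed rule: a present taxon is close on average and
        # consistently close across genes (low variance).
        if mean(per_gene) <= max_mean and pvariance(per_gene) <= max_var:
            present.add(taxon)
    return present


def accuracy_precision(called, truth, all_taxa):
    """Accuracy and precision of presence calls over a reference set."""
    tp = len(called & truth)   # called present, truly present
    fp = len(called - truth)   # called present, truly absent
    fn = len(truth - called)   # missed taxa
    tn = len(all_taxa) - tp - fp - fn
    accuracy = (tp + tn) / len(all_taxa)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, precision


# Toy data (invented): four reference taxa, two truly in the mix.
distances = {
    "Taxon_A": [0.01, 0.02, 0.015, 0.012],
    "Taxon_B": [0.30, 0.02, 0.45, 0.10],
    "Taxon_C": [0.02, 0.03],
    "Taxon_D": [0.03, 0.04, 0.035],
}
truth = {"Taxon_A", "Taxon_C"}

called = call_taxa(distances)
acc, prec = accuracy_precision(called, truth, set(distances))
print(sorted(called), acc, prec)
```

In this toy case Taxon_C is missed because too few genes were recovered, and Taxon_D is a false positive, which illustrates why cutoffs such as `min_genes` and `max_mean` trade accuracy against precision and need to be tuned per application.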