
SPrOUT: A computational and targeted sequencing approach for mixed plant DNA identification with Angiosperms353

Hu, N.; Bullock, M. R.; Jackson, C.; Miller, C.; Hunter, E.; Huff, C.; Chen, Y.; Handy, S.; Johnson, M.

2026-02-23 bioinformatics
10.64898/2026.02.20.707031 bioRxiv

Premise: The identification of plant species from mixed samples is crucial in various fields, including ecological surveys, conservation efforts, and food and dietary supplement safety. Traditional methods face challenges due to the high costs of DNA sequencing, inefficiencies in computational workflows, and incomplete sequence databases.

Methods and Results: This study introduces a novel approach using the Angiosperms353 target sequencing kit for efficient taxonomic identification of angiosperm DNA in mixed samples. Our method assembles short paired-end reads for each mixed sample. Using gene sets of Angiosperms353 from 871 species, we apply phylogenetic inference to categorize the variance in phylogenetic distance across genes to identify the presence of taxa in mixed plant samples. The pipeline reaches 98.1 to 99.6% accuracy and 92.9 to 100% precision for identifying unknown taxa in in silico mixes, and 90.7% accuracy and 98.0% precision for mock supplement mixtures. We explored the parameter cutoffs of the pipeline to offer an empirical range for different applications.

Conclusions: The combination of Angiosperms353 and HybPiper assembly proved effective in sorting mixed plant DNA samples. Our method offers a framework for scientific and practical applications in plant species identification in both single and mixed samples.
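To make the accuracy and precision figures in the abstract concrete, here is a minimal sketch of how such scores are commonly computed from presence/absence calls. It assumes the standard confusion-matrix definitions over a fixed set of candidate taxa; the abstract does not state how the pipeline itself scores results, and the function name and taxon labels are hypothetical.

```python
def accuracy_precision(true_taxa, called_taxa, candidate_taxa):
    """Score presence/absence calls over a fixed candidate set.

    Assumption: standard definitions, not necessarily the paper's scoring.
    """
    true_taxa, called_taxa = set(true_taxa), set(called_taxa)
    tp = fp = tn = fn = 0
    for taxon in candidate_taxa:
        present = taxon in true_taxa    # taxon really is in the mix
        called = taxon in called_taxa   # pipeline reported the taxon
        if present and called:
            tp += 1
        elif called:
            fp += 1
        elif present:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, precision


# Hypothetical example: five candidate taxa, two truly present, two called.
candidates = ["taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_E"]
acc, prec = accuracy_precision(
    true_taxa={"taxon_A", "taxon_B"},
    called_taxa={"taxon_A", "taxon_C"},
    candidate_taxa=candidates,
)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Under these definitions, accuracy rewards correct absences as well as correct presences, so it can stay high even when a few taxa are missed, whereas precision only reflects how many reported taxa were truly present.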

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. PLOS ONE: 4510 papers in training set; Top 10%; 18.2%
2. BMC Bioinformatics: 383 papers in training set; Top 0.7%; 12.1%
3. Molecular Ecology Resources: 161 papers in training set; Top 0.1%; 9.9%
4. Briefings in Bioinformatics: 326 papers in training set; Top 0.6%; 8.2%
5. Scientific Reports: 3102 papers in training set; Top 25%; 4.8%
50% of probability mass above
6. Methods in Ecology and Evolution: 160 papers in training set; Top 0.9%; 3.5%
7. PeerJ: 261 papers in training set; Top 4%; 2.7%
8. BMC Genomics: 328 papers in training set; Top 2%; 2.1%
9. GigaScience: 172 papers in training set; Top 1.0%; 2.1%
10. Frontiers in Plant Science: 240 papers in training set; Top 3%; 2.0%
11. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 3%; 2.0%
12. Horticulture Research: 43 papers in training set; Top 0.8%; 2.0%
13. Gigabyte: 60 papers in training set; Top 0.6%; 1.8%
14. Environmental DNA: 49 papers in training set; Top 0.2%; 1.7%
15. Genome Biology: 555 papers in training set; Top 4%; 1.7%
16. Bioinformatics Advances: 184 papers in training set; Top 3%; 1.7%
17. NAR Genomics and Bioinformatics: 214 papers in training set; Top 3%; 1.2%
18. Plant Methods: 39 papers in training set; Top 0.5%; 1.2%
19. Genes: 126 papers in training set; Top 2%; 1.2%
20. Bioinformatics: 1061 papers in training set; Top 8%; 1.1%
21. BioTechniques: 24 papers in training set; Top 0.2%; 0.9%
22. Database: 51 papers in training set; Top 0.9%; 0.8%
23. Journal of Computational Biology: 37 papers in training set; Top 0.5%; 0.8%
24. Heliyon: 146 papers in training set; Top 7%; 0.7%
25. Scientific Data: 174 papers in training set; Top 3%; 0.7%
26. Journal of Genetics and Genomics: 36 papers in training set; Top 2%; 0.7%
27. Plant Communications: 35 papers in training set; Top 2%; 0.7%
28. Analytical Biochemistry: 26 papers in training set; Top 0.3%; 0.6%
29. PLOS Computational Biology: 1633 papers in training set; Top 28%; 0.6%
30. PLOS Neglected Tropical Diseases: 378 papers in training set; Top 6%; 0.6%
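
The 50% cutoff marked in the ranking above can be checked with a running sum of the listed probabilities. The sketch below uses the first six percentages from the list; the function name is hypothetical, and the threshold is assumed to mean "cumulative probability of at least 50%".

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1 to 6, copied from the list above.
probs = [18.2, 12.1, 9.9, 8.2, 4.8, 3.5]


def journals_to_reach(probabilities, threshold=50.0):
    """Return (rank, cumulative %) at the first rank whose running sum
    meets the threshold, or None if the threshold is never reached."""
    for rank, cumulative in enumerate(accumulate(probabilities), start=1):
        if cumulative >= threshold:
            return rank, cumulative
    return None


rank, cumulative = journals_to_reach(probs)
print(f"top {rank} journals cover {cumulative:.1f}% of the probability mass")
```

The top four journals sum to 48.4%, so the fifth is the first to push the running total past 50% (to 53.2%), which agrees with the statement that the top 5 journals account for 50% of the predicted probability mass.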