
MycorrhizaTracer: A Bioinformatic Pipeline for Fungi and Plant Classification of Sanger DNA Sequences

Brekke, T. D.; Weeks, T.; Barber, R. A.; Thomson, I.; Gooda, R.; Gargiulo, R.; Delhaye, G.; Andrew, C.; Kowal, J.; Bidartondo, M.; Martinez-Suz, L.

2026-04-27 bioinformatics
10.64898/2026.04.23.720352 bioRxiv

Processing Sanger DNA sequences remains a routine yet technically demanding step in many biodiversity and ecological studies, particularly when barcoding large numbers of environmental samples. Manual inspection and editing of trace files, DNA sequence alignment, and classification against taxonomic reference databases are time-consuming, inconsistent, and error-prone. These challenges are compounded in studies involving degraded samples, in-house DNA sequencing, or under-described taxa, and when investigators have limited access to computational tools. We present MycorrhizaTracer, an open-source, fully automated pipeline for processing and taxonomically classifying large batches of Sanger sequencing chromatograms. We optimized it for fungal and plant taxa, but it can be adapted across the tree of life. The pipeline performs quality trimming, consensus generation from bidirectional reads, taxonomic classification via BLAST, clustering, optional salvaging of low-quality sequences, and functional annotation of fungal taxa. Designed for scalability and ease of use, MycorrhizaTracer can process thousands of DNA chromatograms within hours without a high-performance computing (HPC) cluster. Features such as gene region-specific taxonomic filtering and sequence-based clustering of unclassified reads help ensure accuracy and ecological relevance. By streamlining trace-to-taxon workflows, MycorrhizaTracer reduces the burden of manual curation, supports reproducibility, and enables efficient recovery of biodiversity data from Sanger sequences, particularly in field-based or resource-limited research contexts.
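The abstract names quality trimming and consensus generation from bidirectional reads but does not describe how either is done. The Python sketch below shows one common way to do both; it is an illustration, not MycorrhizaTracer's code. Assumptions: trimming uses Mott's modified trimming algorithm (a standard choice in Sanger tools, swapped in here for illustration), Phred quality scores have already been extracted from the trace files, and the forward read and the reverse-complemented reverse read have already been aligned column by column with `-` gaps. The names `mott_trim`, `reverse_complement`, and `consensus` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating trimming and two-read consensus;
# not MycorrhizaTracer's actual implementation.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def mott_trim(qualities: List[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """Mott's modified trimming: score each base as cutoff - P(error),
    with P(error) = 10 ** (-Q / 10), and return the half-open interval
    (start, end) of the maximum-scoring run of bases."""
    best_start, best_end, best_score = 0, 0, 0.0
    run_start, run_score = 0, 0.0
    for i, q in enumerate(qualities):
        run_score += cutoff - 10 ** (-q / 10)
        if run_score < 0:
            # Running sum went negative: restart the candidate run after this base.
            run_score, run_start = 0.0, i + 1
        elif run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_start, best_end


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; characters other than A, C, G, T, N
    (in either case) pass through unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus(fwd: str, fwd_q: List[int], rev: str, rev_q: List[int]) -> str:
    """Merge two reads already aligned column by column (gaps as '-').
    Agreeing bases are kept; a base opposite a gap is kept; on a mismatch
    the higher-quality base wins, and quality ties become 'N'."""
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("reads and qualities must be aligned to the same length")
    out = []
    for a, qa, b, qb in zip(fwd, fwd_q, rev, rev_q):
        if a == b:
            out.append(a)
        elif a == "-":
            out.append(b)
        elif b == "-":
            out.append(a)
        elif qa != qb:
            out.append(a if qa > qb else b)
        else:
            out.append("N")
    # Drop columns where both reads had a gap.
    return "".join(out).replace("-", "")


# Toy values for illustration only (not real chromatogram data).
start, end = mott_trim([5, 8, 30, 35, 40, 38, 36, 12, 4])  # -> (2, 7)

fwd, fwd_q = "ACGATGCA", [40, 40, 40, 10, 40, 40, 40, 40]
rev_raw, rev_raw_q = "TGCAACGT", [40] * 8  # reverse read as sequenced
rev, rev_q = reverse_complement(rev_raw), rev_raw_q[::-1]
print(consensus(fwd, fwd_q, rev, rev_q))  # ACGTTGCA
```

Trimming each read before alignment discards its low-confidence ends. Resolving mismatches by Phred quality, rather than always trusting one strand, lets the better-called read win at each position; in the toy example the forward read's low-quality `A` (Q10) is replaced by the reverse read's `T` (Q40). A real pipeline would also need the pairwise alignment step, which this sketch leaves out.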

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank. Journal (papers in training set): Top %, predicted probability

1. Genome Biology (555): Top 0.1%, 14.1%
2. Nature Communications (4913): Top 19%, 9.9%
3. BMC Bioinformatics (383): Top 1%, 6.7%
4. PLOS ONE (4510): Top 29%, 6.2%
5. Bioinformatics (1061): Top 4%, 6.2%
6. Nature Biotechnology (147): Top 2%, 4.8%
7. Nucleic Acids Research (1128): Top 4%, 4.8%
50% of probability mass above
8. Molecular Ecology Resources (161): Top 0.3%, 4.3%
9. Methods in Ecology and Evolution (160): Top 0.8%, 3.6%
10. NAR Genomics and Bioinformatics (214): Top 0.8%, 3.5%
11. Microbiome (139): Top 1%, 2.7%
12. Nature Methods (336): Top 3%, 2.6%
13. Briefings in Bioinformatics (326): Top 3%, 2.0%
14. Scientific Reports (3102): Top 54%, 1.9%
15. Microbial Genomics (204): Top 1%, 1.7%
16. Genome Medicine (154): Top 4%, 1.7%
17. Cell Reports Methods (141): Top 2%, 1.7%
18. GigaScience (172): Top 1%, 1.7%
19. Bioinformatics Advances (184): Top 3%, 1.7%
20. Nature Protocols (30): Top 0.1%, 1.7%
21. Genome Research (409): Top 3%, 1.2%
22. BMC Genomics (328): Top 4%, 1.2%
23. Communications Biology (886): Top 15%, 1.2%
24. Scientific Data (174): Top 2%, 0.9%
25. Frontiers in Microbiology (375): Top 9%, 0.7%
26. PLOS Computational Biology (1633): Top 25%, 0.7%
27. mSphere (281): Top 7%, 0.6%
28. Frontiers in Bioinformatics (45): Top 1%, 0.6%
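The cutoff stated at the top of this list (the top 7 journals accounting for 50% of the predicted probability mass) is a cumulative sum over the ranked probabilities. A short Python sketch reproduces it using the percentages listed above; the helper name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate
from typing import List, Optional

# Predicted probabilities (%) in rank order, copied from the list above.
PROBS = [14.1, 9.9, 6.7, 6.2, 6.2, 4.8, 4.8, 4.3, 3.6, 3.5, 2.7, 2.6, 2.0, 1.9,
         1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.2, 1.2, 1.2, 0.9, 0.7, 0.7, 0.6, 0.6]


def journals_to_reach(probs: List[float], threshold: float = 50.0) -> Optional[int]:
    """Smallest k such that the k largest probabilities sum to >= threshold,
    or None if the whole list falls short."""
    ranked = sorted(probs, reverse=True)
    for k, total in enumerate(accumulate(ranked), start=1):
        if total >= threshold:
            return k
    return None


print(journals_to_reach(PROBS))  # 7
```

Because the displayed percentages are rounded, the seven listed values sum to about 52.7% (the first six reach only about 47.9%), so "50%" here marks the first rank at which the cumulative mass passes one half.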