
Machine Learning-Enhanced Nanopore ITS Analysis: Evaluating CPU-GPU Pipelines for High-Accuracy Fungal Taxonomic Resolution

Albuja, D. S.; Maldonado, P. S.; Zambrano, P. E.; Olmos, J. R.; Vera, E. R.

2026-04-07 bioinformatics
10.64898/2026.04.06.716835 bioRxiv

Accurate fungal species identification is critical for microbial ecology, food safety, and plant pathology, but it is hindered by the limits of morphological identification and by genomic complexity. Molecular markers such as the ITS region, combined with Oxford Nanopore long-read sequencing, offer a robust solution, albeit one limited by error rates in homopolymeric regions and by a heavy dependence on advanced computational resources (GPUs) to achieve high accuracy. To address this technological gap, this study benchmarks two bioinformatics workflows on a multiplexed dataset of complex fungal communities: a CPU-based workflow optimized using a Bayesian machine learning engine, and a GPU-accelerated workflow incorporating "super high accuracy" (SUP) models and refinement with neural networks. The results establish a scalable framework for evaluating the impact of computational architecture on final taxonomic resolution. GPU processing maximizes data retention and species-level accuracy by correcting systematic errors. Conversely, automated hyperparameter optimization in CPU environments stabilizes sequence clustering and achieves high taxonomic concordance at the genus level. This validates the feasibility of performing ITS metabarcoding analysis on resource-constrained infrastructure and provides the scientific community with a reproducible protocol that balances taxonomic precision against hardware availability.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | BMC Bioinformatics | 383 | Top 0.4% | 17.0% |
| 2 | GigaScience | 172 | Top 0.1% | 8.0% |
| 3 | NAR Genomics and Bioinformatics | 214 | Top 0.3% | 6.2% |
| 4 | Microbial Genomics | 204 | Top 0.5% | 4.2% |
| 5 | PLOS Computational Biology | 1633 | Top 10% | 3.5% |
| 6 | Microbiome | 139 | Top 1% | 3.5% |
| 7 | Briefings in Bioinformatics | 326 | Top 2% | 3.5% |
| 8 | Computational and Structural Biotechnology Journal | 216 | Top 2% | 3.2% |
| 9 | Frontiers in Microbiology | 375 | Top 3% | 3.0% |
| | 50% of probability mass above | | | |
| 10 | Molecular Ecology Resources | 161 | Top 0.4% | 2.8% |
| 11 | Scientific Reports | 3102 | Top 45% | 2.7% |
| 12 | Nucleic Acids Research | 1128 | Top 7% | 2.7% |
| 13 | Bioinformatics | 1061 | Top 6% | 2.3% |
| 14 | Genome Biology | 555 | Top 4% | 2.0% |
| 15 | Nature Communications | 4913 | Top 49% | 1.8% |
| 16 | Advanced Science | 249 | Top 10% | 1.8% |
| 17 | Bioinformatics Advances | 184 | Top 3% | 1.7% |
| 18 | mSphere | 281 | Top 3% | 1.7% |
| 19 | Cell Reports Methods | 141 | Top 3% | 1.6% |
| 20 | PLOS ONE | 4510 | Top 55% | 1.6% |
| 21 | Journal of Chemical Information and Modeling | 207 | Top 2% | 1.6% |
| 22 | mSystems | 361 | Top 5% | 1.6% |
| 23 | Frontiers in Bioinformatics | 45 | Top 0.5% | 1.2% |
| 24 | BMC Genomics | 328 | Top 4% | 1.1% |
| 25 | Communications Biology | 886 | Top 18% | 0.9% |
| 26 | Nature Biotechnology | 147 | Top 7% | 0.9% |
| 27 | PeerJ | 261 | Top 13% | 0.9% |
| 28 | Frontiers in Plant Science | 240 | Top 5% | 0.8% |
| 29 | Methods in Ecology and Evolution | 160 | Top 2% | 0.7% |
| 30 | Patterns | 70 | Top 3% | 0.7% |
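
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The following is a minimal sketch, not the site's actual method: the function name `journals_to_reach` and the variable `probabilities` are hypothetical, and the values are simply the first twelve predicted probabilities copied from the listing.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-12, copied from the listing.
probabilities = [17.0, 8.0, 6.2, 4.2, 3.5, 3.5, 3.5, 3.2, 3.0, 2.8, 2.7, 2.7]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    sum of probs to reach threshold, or None if it is never reached."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities))
```

With the values listed, the running sum is about 49.1% after rank 8 and about 52.1% after rank 9, so the function returns 9, consistent with the stated cutoff.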