Metaxa: A Transformer-Based Deep Learning Model for Taxonomic Classification of Long Nanopore Reads

Friganovic, K.; Stanojevic, D.; Chen, P.-S. B.; Sikic, M.

2026-04-23 bioinformatics
10.64898/2026.04.20.719780 bioRxiv

A significant fraction of microbial diversity remains unclassified, hindering our understanding of microbial roles in health and ecosystems. State-of-the-art methods like Kraken 2 perform well for taxa that are present in the database. However, their accuracy drops significantly when classifying taxa that are not included. While deep learning has advanced many fields, its applications in metagenomics remain limited, and its full potential has yet to be realized. Here, we present Metaxa, a transformer-based deep learning model designed for the taxonomic classification of long-read Nanopore sequences. Metaxa leverages the sequential context of Nanopore reads, enabling robust classification beyond fixed k-mer profiles. Our results show that Metaxa matches Kraken 2 on in-sample data at both the species and genus levels, and significantly outperforms both Kraken 2 and MetageNN at the genus level on out-of-sample datasets where the species genome is absent from the reference database but a different species from the same genus is present. Furthermore, Metaxa demonstrates strong generalization across different Nanopore chemistries (R9.4.1 and R10.4.1). This work highlights the potential of deep learning models to improve metagenomic classification accuracy, especially in complex or underexplored environments where traditional tools fall short.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Microbiome: 139 papers in training set, Top 0.1%, 17.2%
2. Bioinformatics: 1061 papers in training set, Top 4%, 6.3%
3. Genome Biology: 555 papers in training set, Top 1%, 6.3%
4. Briefings in Bioinformatics: 326 papers in training set, Top 0.9%, 6.3%
5. PLOS Computational Biology: 1633 papers in training set, Top 9%, 3.9%
6. Nature Communications: 4913 papers in training set, Top 38%, 3.8%
7. BMC Bioinformatics: 383 papers in training set, Top 3%, 3.5%
8. mSystems: 361 papers in training set, Top 3%, 3.5%

50% of probability mass above

9. NAR Genomics and Bioinformatics: 214 papers in training set, Top 0.9%, 3.0%
10. PLOS ONE: 4510 papers in training set, Top 44%, 2.7%
11. Advanced Science: 249 papers in training set, Top 8%, 2.6%
12. Bioinformatics Advances: 184 papers in training set, Top 2%, 2.6%
13. Scientific Reports: 3102 papers in training set, Top 46%, 2.6%
14. GigaScience: 172 papers in training set, Top 0.9%, 2.3%
15. Nucleic Acids Research: 1128 papers in training set, Top 9%, 2.0%
16. Cell Reports Methods: 141 papers in training set, Top 2%, 1.9%
17. Microbial Genomics: 204 papers in training set, Top 1%, 1.8%
18. Frontiers in Microbiology: 375 papers in training set, Top 5%, 1.7%
19. Nature Biotechnology: 147 papers in training set, Top 5%, 1.7%
20. Nature Machine Intelligence: 61 papers in training set, Top 2%, 1.7%
21. BMC Genomics: 328 papers in training set, Top 3%, 1.5%
22. Genome Medicine: 154 papers in training set, Top 6%, 1.2%
23. iScience: 1063 papers in training set, Top 22%, 1.2%
24. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 7%, 1.1%
25. mSphere: 281 papers in training set, Top 5%, 0.9%
26. Genome Research: 409 papers in training set, Top 4%, 0.9%
27. Microbiology Spectrum: 435 papers in training set, Top 6%, 0.7%
28. Water Research: 74 papers in training set, Top 1%, 0.7%
29. Cell Systems: 167 papers in training set, Top 13%, 0.7%
30. Patterns: 70 papers in training set, Top 3%, 0.7%