NanoLabel: A fast and accurate real-time nanopore signal classifier

Mahajan, D.; Jain, C.; Kashyap, N.

2026-05-06 genomics
10.64898/2026.05.03.722500 bioRxiv
Oxford Nanopore Technologies' adaptive sampling capability promises to reduce sequencing cost and turnaround time. At its core, adaptive sampling is a real-time classification problem that distinguishes reads originating from regions of interest. Direct signal-based classification approaches bypass the computational bottleneck of basecalling and can eliminate the need for powerful GPUs. However, operating directly on noisy raw signals remains challenging in real-time settings, where classification decisions must be made quickly. In this work, we propose NanoLabel, a new method for real-time classification of nanopore signals. We build NanoLabel on top of the signal-based read mapping tool RawHash2. We accelerate the classification workflow by mapping reads using only the target regions as the reference. To further improve accuracy, we train a lightweight classifier on mapping-derived features and introduce a data augmentation strategy to construct sufficiently large and class-balanced training datasets. We evaluate NanoLabel using publicly available real sequencing datasets from three human genomes (HG001, HG002, and HG005), assuming a cancer gene panel as the target. Compared to directly mapping reads with RawHash2, we demonstrate an 80x improvement in classification time and an improvement of 0.10 to 0.25 in the F1 score.
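To give a sense of scale for the reported F1 gain, the following minimal sketch computes F1 from confusion counts for a target versus non-target read classifier. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function name `f1_score` is an assumption of this sketch rather than part of NanoLabel.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid dividing by zero
    precision = tp / (tp + fp)  # fraction of reads called "target" that really are
    recall = tp / (tp + fn)     # fraction of true target reads that were called
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (illustration only, not results from the paper).
baseline_f1 = f1_score(tp=60, fp=40, fn=40)
improved_f1 = f1_score(tp=80, fp=20, fn=20)
gain = improved_f1 - baseline_f1
print(f"baseline={baseline_f1:.2f} improved={improved_f1:.2f} gain={gain:.2f}")
```

With these made-up counts, the gain falls inside the 0.10 to 0.25 range quoted in the abstract, which corresponds to a substantial reduction in both false positives and false negatives.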

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Nature Biotechnology: 147 papers in training set; Top 0.1%; predicted probability 22.8%
2. Genome Research: 409 papers in training set; Top 0.1%; predicted probability 14.5%
3. Genome Biology: 555 papers in training set; Top 0.2%; predicted probability 12.8%
(50% of probability mass above)
4. Nature Methods: 336 papers in training set; Top 0.9%; predicted probability 10.6%
5. Bioinformatics: 1061 papers in training set; Top 3%; predicted probability 9.3%
6. Nature Communications: 4913 papers in training set; Top 28%; predicted probability 6.4%
7. Nature Computational Science: 50 papers in training set; Top 0.2%; predicted probability 3.1%
8. Nature: 575 papers in training set; Top 10%; predicted probability 1.8%
9. Scientific Reports: 3102 papers in training set; Top 55%; predicted probability 1.8%
10. Science: 429 papers in training set; Top 17%; predicted probability 1.2%
11. BMC Bioinformatics: 383 papers in training set; Top 6%; predicted probability 0.9%
12. Genome Medicine: 154 papers in training set; Top 7%; predicted probability 0.8%
13. Cell Reports Methods: 141 papers in training set; Top 5%; predicted probability 0.8%
14. Nature Machine Intelligence: 61 papers in training set; Top 3%; predicted probability 0.8%
15. Nature Genetics: 240 papers in training set; Top 7%; predicted probability 0.8%
16. Nucleic Acids Research: 1128 papers in training set; Top 17%; predicted probability 0.8%
17. PLOS ONE: 4510 papers in training set; Top 67%; predicted probability 0.8%
18. The American Journal of Human Genetics: 206 papers in training set; Top 4%; predicted probability 0.8%
19. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; predicted probability 0.7%
20. Communications Biology: 886 papers in training set; Top 28%; predicted probability 0.7%
21. BMC Genomics: 328 papers in training set; Top 7%; predicted probability 0.7%
22. Cell: 370 papers in training set; Top 19%; predicted probability 0.7%
23. Briefings in Bioinformatics: 326 papers in training set; Top 8%; predicted probability 0.5%
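
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below uses only the first five values from the list; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the five highest-ranked journals,
# copied from the list above.
probs = [22.8, 14.5, 12.8, 10.6, 9.3]

# Running total of probability mass by rank.
cumulative = list(accumulate(probs))

# Smallest number of top-ranked journals whose combined mass reaches 50%.
top_k = next(rank for rank, mass in enumerate(cumulative, start=1) if mass >= 50.0)
print(top_k, round(cumulative[top_k - 1], 1))
```

The running total crosses 50% at the third journal, consistent with the marker placed between ranks 3 and 4.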