Back

New Space-Time Tradeoffs for Subset Rank and k-mer Lookup

Diseth, A. C.; Puglisi, S. J.

2026-03-18 bioinformatics
10.64898/2026.03.16.712042 bioRxiv
Show abstract

Given a sequence S of subsets of symbols drawn from an alphabet of size{sigma} , a subset rank query srank(i, c) asks for the number of subsets before the ith subset that contain the symbol c. It was recently shown (Alanko et al., Proc. SIAM ACDA, 2023) that subset rank queries on the spectral Burrows-Wheeler lead to efficient k-mer lookup queries, an essential and widespread task in genomic sequence analysis. In this paper we design faster subset rank data structures that use small space--less than 3 bits per k-mer. Our experiments show that this translates to new Pareto optimal SBWT-based k-mer lookup structures at the low-memory end of the space-time spectrum.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1
Bioinformatics
1061 papers in training set
Top 1%
22.8%
2
Genome Research
409 papers in training set
Top 0.1%
17.7%
3
Cell Systems
167 papers in training set
Top 0.7%
12.7%
50% of probability mass above
4
Nature Biotechnology
147 papers in training set
Top 1%
6.9%
5
Algorithms for Molecular Biology
15 papers in training set
Top 0.1%
6.4%
6
Nature Communications
4913 papers in training set
Top 32%
4.9%
7
Nature Methods
336 papers in training set
Top 2%
4.0%
8
Genetics
225 papers in training set
Top 2%
2.1%
9
Nature Genetics
240 papers in training set
Top 4%
1.7%
10
PLOS Computational Biology
1633 papers in training set
Top 16%
1.7%
11
PLOS ONE
4510 papers in training set
Top 58%
1.3%
12
Bioinformatics Advances
184 papers in training set
Top 3%
1.3%
13
BMC Bioinformatics
383 papers in training set
Top 5%
1.2%
14
Proceedings of the National Academy of Sciences
2130 papers in training set
Top 41%
0.9%
15
Genome Biology
555 papers in training set
Top 6%
0.9%
16
Nucleic Acids Research
1128 papers in training set
Top 16%
0.8%
17
iScience
1063 papers in training set
Top 29%
0.8%
18
Scientific Reports
3102 papers in training set
Top 76%
0.7%
19
Communications Biology
886 papers in training set
Top 25%
0.7%
20
Molecular Biology and Evolution
488 papers in training set
Top 4%
0.7%
21
Journal of Computational Biology
37 papers in training set
Top 0.7%
0.7%
22
Nature Computational Science
50 papers in training set
Top 2%
0.5%