New Space-Time Tradeoffs for Subset Rank and k-mer Lookup
Diseth, A. C.; Puglisi, S. J.
Given a sequence S of subsets of symbols drawn from an alphabet of size σ, a subset rank query srank(i, c) asks for the number of subsets before the ith subset that contain the symbol c. It was recently shown (Alanko et al., Proc. SIAM ACDA, 2023) that subset rank queries on the spectral Burrows-Wheeler transform (SBWT) lead to efficient k-mer lookup queries, an essential and widespread task in genomic sequence analysis. In this paper we design faster subset rank data structures that use little space: less than 3 bits per k-mer. Our experiments show that this translates to new Pareto-optimal SBWT-based k-mer lookup structures at the low-memory end of the space-time spectrum.
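To make the query definition concrete, the following is a minimal sketch of srank(i, c) using plain prefix counts. It is not the data structure proposed in the paper and makes no attempt at the space bounds discussed above; it only illustrates the query semantics. The class name `NaiveSubsetRank`, the DNA alphabet "ACGT", the 0-based indexing of subsets, and the sample sequence are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set


class NaiveSubsetRank:
    """Illustrative (hypothetical) subset rank index built from prefix counts.

    Uses O(n * sigma) integers, far more than a succinct structure would.
    """

    def __init__(self, subsets: Sequence[Set[str]], alphabet: str = "ACGT") -> None:
        self.n = len(subsets)
        # prefix[c][j] = number of subsets among the first j that contain c
        self.prefix: Dict[str, List[int]] = {c: [0] * (self.n + 1) for c in alphabet}
        for j, subset in enumerate(subsets):
            for c in alphabet:
                self.prefix[c][j + 1] = self.prefix[c][j] + (1 if c in subset else 0)

    def srank(self, i: int, c: str) -> int:
        """Count the subsets strictly before position i (0-based) that contain c."""
        if not 0 <= i <= self.n:
            raise IndexError("i must satisfy 0 <= i <= n")
        return self.prefix[c][i]


# Assumed example sequence of subsets over {A, C, G, T}; empty subsets are allowed.
S = [{"A", "C"}, set(), {"C"}, {"A", "G", "T"}, {"C", "T"}]
index = NaiveSubsetRank(S)
print(index.srank(3, "C"))  # subsets 0 and 2 contain C
```

Each query here is a single array lookup, at the cost of storing one counter per symbol per position. A compact structure trades some of that query speed for space, which is the tradeoff the abstract refers to.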