
Powerful read processing with matchbox

Schuster, J.; Zeglinski, K.; Xiao, L. C.; Voulgaris, O.; Rivera, S. M.; Vervoort, S. J.; Ritchie, M. E.; Gouil, Q.; Clark, M. B.

2026-02-03 bioinformatics

10.1101/2025.11.09.685711 bioRxiv


The wide variety of protocols and applications for DNA and RNA sequencing makes flexible read processing an important step in sequence analysis. Beyond trimming and demultiplexing, custom read-level processing is commonly required for data exploration, QC and analysis. Existing tools are often task-specific and do not generalise to new bioinformatic problems. Thus, there is a need for a tool flexible enough to handle the full variety of read processing tasks, and fast and scalable enough to retain high performance on growing sequencing datasets. We introduce matchbox, a read processor that enables fluent manipulation and analysis of FASTA/FASTQ/SAM/BAM files. With a lightweight scripting language designed around error-tolerant pattern-matching, users can write their own matchbox scripts to tackle a wide variety of bioinformatic problems, and incorporate them into existing pipelines and workflows. We demonstrate matchbox's versatility in a number of contexts: demultiplexing long-read scRNA-seq data with 10X or SPLiT-seq barcodes; restranding RNA-seq reads; assessing CRISPR editing efficiency; and haplotyping macrosatellite repeat regions. matchbox achieves a computational performance comparable to existing tools, while addressing a broader range of bioinformatic needs, representing a new state-of-the-art in sequence processing. matchbox is implemented in Rust and available open-source at https://github.com/jakob-schuster/matchbox.
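To make the idea of error-tolerant pattern matching concrete, the following Python sketch locates an adapter in a read while allowing a small number of substitutions, then returns the sequence that follows it. This is not matchbox syntax and not the authors' algorithm: the function names (`hamming`, `find_approx`, `trim_after`), the mismatch threshold, and the example sequences are all hypothetical, and the sketch handles substitutions only, not insertions or deletions.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_approx(read: str, pattern: str, max_mismatches: int = 1) -> Optional[int]:
    """Return the first offset where `pattern` matches `read` with at most
    `max_mismatches` substitutions, or None if no offset qualifies."""
    width = len(pattern)
    for start in range(len(read) - width + 1):
        if hamming(read[start:start + width], pattern) <= max_mismatches:
            return start
    return None


def trim_after(read: str, adapter: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the sequence downstream of the adapter, or None if the
    adapter is not found within the mismatch budget."""
    start = find_approx(read, adapter, max_mismatches)
    if start is None:
        return None
    return read[start + len(adapter):]


# Hypothetical read: the adapter ACGTACGT appears with one substitution (A -> T).
example_read = "GGACGTTCGTTTTAAA"
example_adapter = "ACGTACGT"
trimmed = trim_after(example_read, example_adapter, max_mismatches=1)
```

An exact search (`max_mismatches=0`) would miss the adapter in this read, which is the kind of sequencing-error case that tolerant matching is meant to absorb.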

