FastDedup - A fast and memory-efficient tool for read deduplication

Ribes, R.; Mandier, C.; Baniel, A.

2026-05-04 bioinformatics
10.64898/2026.04.29.721745 bioRxiv
PCR duplicate removal is a critical first step in high-throughput sequencing pipelines, yet existing tools struggle with speed, memory, or correctness at modern dataset scales. We present FastDedup, a Rust-based FASTX deduplicator that transforms each read or read pair to a compact xxh3 hash fingerprint, drastically reducing memory usage and binding most of the execution time to disk I/O. Benchmarked against six competing tools on synthetic human WGS datasets up to 300 million reads, FastDedup consistently leads on paired-end data, running more than 10 times faster than fastp. It also outperforms all tools on uncompressed single-end data, deduplicating a million reads in a second. We additionally report correctness failures in prinseq++ and clumpify. FastDedup is available under the MIT License via GitHub, Bioconda, and Cargo.
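
The fingerprint idea described in the abstract can be illustrated with a minimal sketch. This is not FastDedup's code: the tool is written in Rust and uses xxh3, whereas the sketch below is Python and substitutes a 64-bit BLAKE2b digest from the standard library as a stand-in hash. The function names `fingerprint` and `dedup`, the 64-bit fingerprint width, and the separator byte between mates are all assumptions made for illustration.

```python
import hashlib


def fingerprint(*seqs: str) -> int:
    """Return a 64-bit fingerprint of a read or read pair.

    Assumption: BLAKE2b truncated to 8 bytes stands in for xxh3 here.
    """
    h = hashlib.blake2b(digest_size=8)
    for s in seqs:
        h.update(s.encode("ascii"))
        # Separator byte so that ("AC", "G") and ("A", "CG") hash differently.
        h.update(b"\x00")
    return int.from_bytes(h.digest(), "little")


def dedup(records):
    """Yield the first occurrence of each record.

    Each record is a tuple of sequences: one element for a single-end
    read, two for a read pair. Only fingerprints are kept in memory,
    not the sequences themselves.
    """
    seen = set()
    for rec in records:
        fp = fingerprint(*rec)
        if fp not in seen:
            seen.add(fp)
            yield rec


single = [("ACGT",), ("ACGT",), ("TTGA",)]
paired = [("AC", "G"), ("A", "CG"), ("AC", "G")]
unique_single = list(dedup(single))
unique_paired = list(dedup(paired))
```

Storing a fixed-size fingerprint per record instead of the full sequence is what keeps memory low; the trade-off is that two distinct reads with colliding fingerprints would be treated as duplicates, with a probability that depends on the fingerprint width and the number of reads.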

Matching journals

The top 4 journals account for 50% of the predicted probability mass.