LRMD: Reference-Free Misassembly Detection Based on Multiple Features from Long-Read Alignments
Wang, J.; Nie, F.; Shi, X.
Genome assembly is a cornerstone of genomics research, and detecting misassemblies is crucial for downstream analyses. Reference-free misassembly detection methods rely on read alignments, so they do not require a high-quality reference genome and are therefore more broadly applicable. However, existing methods do not make effective use of the alignment data, which leaves them with low sensitivity for detecting misassemblies. We introduce LRMD, a novel reference-free tool for misassembly detection. LRMD integrates depth, clipping, and read pileup information derived from long-read-to-assembly alignments to significantly enhance sensitivity in identifying misassemblies. Experimental evaluations on both simulated and real datasets demonstrate that LRMD consistently outperforms existing tools in terms of sensitivity and F1-score. Notably, its results are the closest to those of the reference-based evaluation tool QUAST. As an evaluation tool, LRMD also outputs metrics such as base quality, assembly size, and contig N50, among others. LRMD is publicly available at http://github.com/sxfss/LRMD.
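To make two of the quantities named in the abstract concrete, the following is a minimal illustrative sketch, not LRMD's actual implementation. `contig_n50` uses the standard definition of N50. `flag_windows` is a hypothetical toy that combines only two of the signals mentioned (depth and clipping); its function name, window layout, and thresholds (`low`, `high`, `clip_frac`) are assumptions chosen for illustration and are not taken from the tool.

```python
from statistics import median


def contig_n50(lengths):
    """Standard N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def flag_windows(depth, clipped, low=0.5, high=1.5, clip_frac=0.3):
    """Hypothetical toy detector (NOT LRMD's algorithm).

    depth[i]   : mean read depth in window i
    clipped[i] : number of clipped reads in window i
    A window is flagged when its depth deviates strongly from the
    median depth, or when clipped reads make up a large share of it.
    All thresholds are assumed values for illustration only.
    """
    typical = median(depth)
    flagged = []
    for i, (d, c) in enumerate(zip(depth, clipped)):
        abnormal_depth = d < low * typical or d > high * typical
        many_clips = d > 0 and c / d >= clip_frac
        if abnormal_depth or many_clips:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    print(contig_n50([100, 80, 60, 40, 20]))
    print(flag_windows([30, 31, 29, 10, 30, 62], [1, 0, 2, 6, 12, 3]))
```

In this toy input, one window has a depth drop, one has a high share of clipped reads, and one has a depth spike. A real detector would derive these per-window values from the long-read-to-assembly alignments and would also use pileup information, which this sketch omits.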
Matching journals
The top 4 journals account for 50% of the predicted probability mass.