LRMD: Reference-Free Misassembly Detection Based on Multiple Features from Long-Read Alignments

Wang, J.; Nie, F.; Shi, X.

2025-11-08 genomics
10.1101/2025.11.07.686952 bioRxiv
Genome assembly serves as the cornerstone of genomics research, and the detection of misassemblies plays a crucial role in downstream analyses. Reference-free methods for misassembly detection, which rely on read alignments, circumvent the need for a high-quality reference genome and are therefore more broadly applicable. However, existing methods struggle to use alignment data effectively, which leads to a noticeable deficiency in sensitivity for detecting misassemblies. We introduce LRMD, a novel reference-free tool for misassembly detection. LRMD integrates depth, clipping, and read pileup information derived from long-read-to-assembly alignments to significantly enhance sensitivity in identifying misassemblies. Experimental evaluations on both simulated and real datasets demonstrate that LRMD consistently outperforms existing tools in terms of sensitivity and F1-score. Notably, its results are closest to those of the reference-based evaluation tool QUAST. As an evaluation tool, LRMD also outputs metrics such as base quality, assembly size, contig N50, and others. LRMD is publicly available at http://github.com/sxfss/LRMD.
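
To make two of the ideas in the abstract concrete, here is a minimal Python sketch. It is not LRMD's implementation: the function names, the per-window input lists, and both thresholds (`low_depth_ratio`, `min_clip_fraction`) are hypothetical choices made only for illustration. The first function computes contig N50 under its standard definition; the second shows one simple way that depth and clipping signals could be combined to flag suspicious windows.

```python
from statistics import median
from typing import List


def contig_n50(lengths: List[int]) -> int:
    """Return N50: the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches
    half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def flag_windows(
    depth: List[int],
    clipped: List[int],
    low_depth_ratio: float = 0.5,    # hypothetical threshold
    min_clip_fraction: float = 0.3,  # hypothetical threshold
) -> List[int]:
    """Return indices of windows that look suspicious because read depth
    is far below the median, or because a large fraction of the reads
    covering the window are clipped there."""
    if not depth:
        return []
    typical = median(depth)
    flagged = []
    for i, (d, c) in enumerate(zip(depth, clipped)):
        low_depth = d < low_depth_ratio * typical
        # Guard against division by zero in uncovered windows.
        many_clips = d > 0 and c / d >= min_clip_fraction
        if low_depth or many_clips:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    print(contig_n50([100, 80, 60, 40, 20]))
    print(flag_windows(depth=[30, 31, 29, 8, 30], clipped=[1, 0, 2, 1, 15]))
```

In practice the per-window depth and clipping counts would be derived from a long-read-to-assembly alignment file; that step is omitted here to keep the sketch self-contained.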

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Genome Research | 409 papers in training set | Top 0.1% | 22.0%
2 | Genome Biology | 555 papers in training set | Top 0.1% | 17.1%
3 | Bioinformatics | 1061 papers in training set | Top 3% | 8.2%
4 | Nature Biotechnology | 147 papers in training set | Top 2% | 4.7%
50% of probability mass above
5 | Nature Communications | 4913 papers in training set | Top 34% | 4.7%
6 | Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.5%
7 | Nature Methods | 336 papers in training set | Top 3% | 3.5%
8 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 3.0%
9 | PLOS Computational Biology | 1633 papers in training set | Top 12% | 2.7%
10 | Nature Genetics | 240 papers in training set | Top 4% | 2.0%
11 | Nucleic Acids Research | 1128 papers in training set | Top 10% | 1.8%
12 | BMC Genomics | 328 papers in training set | Top 2% | 1.8%
13 | BMC Bioinformatics | 383 papers in training set | Top 4% | 1.8%
14 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.7%
15 | Nature Computational Science | 50 papers in training set | Top 0.7% | 1.7%
16 | Cell Systems | 167 papers in training set | Top 8% | 1.6%
17 | PLOS ONE | 4510 papers in training set | Top 61% | 1.2%
18 | GigaScience | 172 papers in training set | Top 2% | 0.9%
19 | IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.5% | 0.9%
20 | Cell Genomics | 162 papers in training set | Top 6% | 0.9%
21 | Genome Medicine | 154 papers in training set | Top 8% | 0.8%
22 | Bioinformatics Advances | 184 papers in training set | Top 4% | 0.8%
23 | Scientific Reports | 3102 papers in training set | Top 77% | 0.7%
24 | Science | 429 papers in training set | Top 22% | 0.6%
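
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below is an assumption about how such a cutoff might be computed, not the page's actual method; the function name `smallest_top_k` is hypothetical, and the input is the first six probabilities from the ranking above.

```python
from typing import List


def smallest_top_k(probabilities: List[float], threshold: float = 50.0) -> int:
    """Return the smallest k such that the k highest probabilities
    sum to at least `threshold` (both expressed in percent).
    Returns 0 if the threshold is never reached."""
    running = 0.0
    for k, p in enumerate(sorted(probabilities, reverse=True), start=1):
        running += p
        if running >= threshold:
            return k
    return 0


if __name__ == "__main__":
    # Probabilities (in percent) of the six highest-ranked journals.
    top_six = [22.0, 17.1, 8.2, 4.7, 4.7, 3.5]
    print(smallest_top_k(top_six))  # -> 4
```

The first three journals sum to 47.3%, just short of the threshold, and adding the fourth brings the total to 52.0%, which is why the cutoff falls after rank 4.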