The effect of removing repeat-induced overlaps in de novo assembly
Shiarli Hossein Zade, R.; Abeel, T.
Show abstract
Determining accurate genotypes is important for associating phenotypes to genotypes. De novo genome assembly is a critical step to determine the complete genotype for species for which no reference exists yet. The main challenge of de novo eukaryote genome assembly, particularly plant genomes, are repetitive DNA sequences within their genomes. The introduction of third generation sequencing and corresponding long reads has promised to resolve repeat-related problems. While there have been notable improvements, reads originating from these repeats are still creating errors because they introduce false overlaps in the assembly graph. This study focuses on analyzing the effect of repeats on de novo assembly and improving performance of existing de novo assembly algorithms by removing repeat-induced overlaps. First, we show the possible improvements in de novo assembly with removing repeat-induced overlaps. Then we propose several methods for detecting and removing repeat-induced overlaps and evaluate their performance on several simulated datasets.
Matching journals
The top 8 journals account for 50% of the predicted probability mass.