
Genome assembly variation and its implications for gene discovery in nematode species

Mariene, G. M.; Wasmuth, J. D.

2024-02-29 genomics
10.1101/2024.02.26.582167 bioRxiv

Genome assemblers are a critical component of genome science, but the choice of assembly software and protocols can be daunting. Here, we investigate genome assembly variation and its implications for gene discovery across three nematode species (Caenorhabditis bovis, Haemonchus contortus, and Heligmosomoides bakeri), highlighting the critical interplay between assembly choice and downstream genomic analysis. Selecting popular genome assemblers, we generated multiple assemblies for each species, analyzing their structure, completeness, and effect on gene family analysis. Our findings demonstrate that assembly variations can significantly affect gene family composition, with notable differences in critical gene families like cyp, gst, ugt, and nhr. Despite broadly similar performance using various assembly metrics, comparisons of assemblies within a single species revealed underlying structural rearrangements and inconsistencies in gene content. This emphasizes the imperative for continuous refinement of genomic resources. Our findings advocate for a cautious and informed approach to genome assembly and annotation to ensure reliable and insightful genomic interpretations.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. BMC Genomics; 328 papers in training set; Top 0.1%; 27.7%
2. Microbial Genomics; 204 papers in training set; Top 0.2%; 10.1%
3. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 0.2%; 8.4%
4. GigaScience; 172 papers in training set; Top 0.2%; 6.4%
50% of probability mass above
5. Gigabyte; 60 papers in training set; Top 0.2%; 4.3%
6. Frontiers in Genetics; 197 papers in training set; Top 2%; 3.6%
7. PLOS Computational Biology; 1633 papers in training set; Top 12%; 2.6%
8. BMC Biology; 248 papers in training set; Top 0.6%; 2.4%
9. Scientific Reports; 3102 papers in training set; Top 50%; 2.1%
10. Molecular Ecology Resources; 161 papers in training set; Top 0.5%; 1.9%
11. BMC Bioinformatics; 383 papers in training set; Top 4%; 1.7%
12. NAR Genomics and Bioinformatics; 214 papers in training set; Top 2%; 1.7%
13. PLOS ONE; 4510 papers in training set; Top 54%; 1.7%
14. Genomics; 60 papers in training set; Top 1%; 1.5%
15. Genome Biology; 555 papers in training set; Top 5%; 1.5%
16. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 0.5%; 1.3%
17. PeerJ; 261 papers in training set; Top 9%; 1.3%
18. Genome Biology and Evolution; 280 papers in training set; Top 1%; 1.3%
19. Journal of Heredity; 35 papers in training set; Top 0.2%; 0.9%
20. Genetics; 225 papers in training set; Top 4%; 0.8%
21. F1000Research; 79 papers in training set; Top 4%; 0.8%
22. Nucleic Acids Research; 1128 papers in training set; Top 16%; 0.8%
23. Philosophical Transactions of the Royal Society B; 51 papers in training set; Top 5%; 0.8%
24. The Plant Genome; 53 papers in training set; Top 0.6%; 0.7%
25. G3; 33 papers in training set; Top 0.6%; 0.6%
26. Cell Genomics; 162 papers in training set; Top 8%; 0.6%
27. Communications Biology; 886 papers in training set; Top 29%; 0.6%
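
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The following is a minimal sketch, not the site's own method: the helper name `journals_to_reach` is hypothetical, and the input is only the first six probabilities copied from the list above.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the six top-ranked journals,
# copied from the list above in rank order.
probabilities = [27.7, 10.1, 8.4, 6.4, 4.3, 3.6]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the
    cumulative probability to reach the threshold (hypothetical helper)."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None  # threshold not reached with the entries supplied


print(journals_to_reach(probabilities))  # prints 4
```

The running totals are roughly 27.7, 37.8, 46.2, and 52.6, so the 50% mark is first crossed at the fourth journal.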