
Statistical analysis of number of genes and chromosome lengths of different microbial species

Ng, W.

2022-08-16 genomics
10.1101/2022.08.13.503871 bioRxiv

Genome architecture concerns the organisation of genes on a chromosome, and has important implications for the fidelity with which genes are encoded on the chromosome and for how that information is read by DNA polymerase and RNA polymerase. This facet of genomics received attention in the early era of genomics, but has received less in contemporary genomics as the focus shifts to structural and functional genomics, with the goal of annotating the function of each gene in the genome. This work sought to uncover relationships between the number of genes and chromosome length in a variety of bacterial and archaeal species, as a preamble to understanding the prevalence and importance of repetitive sequences in the genomes of prokaryotic species. Aggregate results across the ensemble of prokaryotic species profiled revealed a positive linear correlation between number of genes and chromosome length. Upon dissection into the Bacteria and Archaea domains, this linear relationship still holds for Bacteria but starts to break down in Archaea. This suggests that repetitive sequences are more important to archaeal species, which generally have a smaller genome (1.8 to 2.8 Mbp) and fewer genes (1500 to 2400) than bacterial species. In comparison, the bacterial genome is larger (4 to 5.6 Mbp) and encodes more genes (3600 to 5100). Overall, the results highlight that bacterial genomes are efficiently encoded, with few repetitive sequences. This, however, is not true for archaeal genomes, which provides another line of evidence supporting the notion that archaea are ancestral to eukaryotic cells, which, like the archaea, also house large repetitive sequences.
Graphical abstract
[Figure 1: graphical abstract, image not reproduced]

Short description: Statistical analysis across an ensemble of 59 microbial species revealed a strong linear correlation between number of genes and chromosome length. This suggests that prokaryotic genomes are highly compact with genes, and do not carry significant amounts of repeats unlike the case in eukaryotic organisms. The result holds significant implications for our understanding of genome evolution and compaction in prokaryotic organisms, and what drove their accession as foundational species of many ecosystems.

Subject areas: genomics, molecular biology, evolutionary biology, bioinformatics, systems biology
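The linear correlation described in the abstract can be illustrated with a minimal sketch that computes a Pearson correlation coefficient and a least-squares line for gene count against chromosome length. This is not the author's analysis code: the function name `pearson_and_fit` and the four data points are hypothetical placeholders chosen only to show the calculation, and they are not values from the preprint.

```python
from math import sqrt


def pearson_and_fit(lengths_mbp, gene_counts):
    """Return (r, slope, intercept) for gene count vs. chromosome length.

    r is the Pearson correlation coefficient; slope and intercept describe
    the ordinary least-squares line gene_count = slope * length + intercept.
    """
    n = len(lengths_mbp)
    if n != len(gene_counts) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(lengths_mbp) / n
    mean_y = sum(gene_counts) / n

    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in lengths_mbp)
    syy = sum((y - mean_y) ** 2 for y in gene_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(lengths_mbp, gene_counts))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")

    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical placeholder values (NOT data from the preprint):
# chromosome lengths in Mbp and the matching gene counts.
example_lengths = [2.0, 3.0, 4.0, 5.0]
example_genes = [1800, 2700, 3600, 4500]

r, slope, intercept = pearson_and_fit(example_lengths, example_genes)
print(f"r = {r:.3f}, slope = {slope:.1f} genes per Mbp, intercept = {intercept:.1f}")
```

With real per-species data, an r close to 1 would correspond to the strong linear relationship reported for Bacteria, and a noticeably lower r to the weaker relationship reported for Archaea. The slope can be read as an approximate gene density in genes per Mbp.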

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Journal of Molecular Evolution: 21 papers in training set, Top 0.1%, 9.0%
2. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 0.6%, 6.3%
3. Genome Biology and Evolution: 280 papers in training set, Top 0.2%, 6.3%
4. Frontiers in Microbiology: 375 papers in training set, Top 1%, 6.3%
5. PLOS Computational Biology: 1633 papers in training set, Top 6%, 6.3%
6. GigaScience: 172 papers in training set, Top 0.3%, 4.2%
7. Frontiers in Genetics: 197 papers in training set, Top 2%, 3.5%
8. F1000Research: 79 papers in training set, Top 0.6%, 3.5%
9. PeerJ: 261 papers in training set, Top 3%, 3.0%
10. Bioinformatics: 1061 papers in training set, Top 6%, 2.7%

50% of probability mass above

11. Microbial Genomics: 204 papers in training set, Top 0.8%, 2.7%
12. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 0.9%, 2.6%
13. BMC Genomics: 328 papers in training set, Top 1%, 2.3%
14. mSystems: 361 papers in training set, Top 4%, 1.9%
15. Genomics: 60 papers in training set, Top 0.8%, 1.9%
16. Scientific Reports: 3102 papers in training set, Top 56%, 1.8%
17. PLOS ONE: 4510 papers in training set, Top 52%, 1.8%
18. BMC Biology: 248 papers in training set, Top 1%, 1.7%
19. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.7%
20. Genes: 126 papers in training set, Top 1%, 1.5%
21. Mobile DNA: 27 papers in training set, Top 0.1%, 1.5%
22. Nucleic Acids Research: 1128 papers in training set, Top 12%, 1.5%
23. Genome Biology: 555 papers in training set, Top 5%, 1.5%
24. Plant Direct: 81 papers in training set, Top 1%, 1.3%
25. Life Science Alliance: 263 papers in training set, Top 0.7%, 1.2%
26. Frontiers in Plant Science: 240 papers in training set, Top 4%, 1.2%
27. BMC Bioinformatics: 383 papers in training set, Top 6%, 0.9%
28. Microorganisms: 101 papers in training set, Top 2%, 0.7%
29. The Plant Journal: 197 papers in training set, Top 3%, 0.7%
30. Frontiers in Molecular Biosciences: 100 papers in training set, Top 6%, 0.7%
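
The statement that the top 10 journals account for 50% of the predicted probability mass can be checked against the percentages listed above with a short cumulative sum. This is a minimal sketch: the helper name `journals_to_reach` is hypothetical, and the listed percentages are rounded to one decimal place, so the running total is approximate and may differ slightly from whatever unrounded values the site uses.

```python
def journals_to_reach(probabilities_pct, threshold_pct=50.0):
    """Return (rank, cumulative_pct) at the first rank whose running total
    reaches threshold_pct, or (None, total) if it is never reached."""
    running = 0.0
    for rank, p in enumerate(probabilities_pct, start=1):
        running += p
        if running >= threshold_pct:
            return rank, running
    return None, running


# Predicted probabilities (in percent) for ranks 1 to 30, copied from the list above.
listed = [
    9.0, 6.3, 6.3, 6.3, 6.3, 4.2, 3.5, 3.5, 3.0, 2.7,
    2.7, 2.6, 2.3, 1.9, 1.9, 1.8, 1.8, 1.7, 1.7, 1.5,
    1.5, 1.5, 1.5, 1.3, 1.2, 1.2, 0.9, 0.7, 0.7, 0.7,
]

rank, mass = journals_to_reach(listed)
print(f"{mass:.1f}% of the probability mass is reached at rank {rank}")
```

On the rounded values, the running total is 48.4% after rank 9 and 51.1% after rank 10, which is consistent with the 50% cutoff falling at the tenth journal.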