
The linear correlation between genome size and the size of the non-transcribing region

Chen, Z.-R.

2024-09-22 genomics
10.1101/2024.09.19.613789 bioRxiv

Background: Genome sizes vary widely across organisms (the C-value paradox). Genomes contain non-transcribing regions that encode neither proteins nor RNA molecules. Several hypotheses have been proposed for the function of these regions. One holds that they are unannotated functional regions; another views them as genomic isolation zones that reduce mutations in coding regions.

Method: Using annotation files for 63,866 species genomes in the NCBI RefSeq database, statistical analyses were conducted on transcribing regions (including regions annotated as genes and transcribed pseudogenes), non-transcribing regions, protein-coding regions (coding sequences, CDS), and genome sizes.

Results: Across species, the size of the non-transcribing region shows a significant linear relationship with overall genome size, with proportional coefficients that vary among phyla (realms, for viruses). As genome size increases, the proportion of non-transcribing sequence gradually rises and eventually approaches a limiting linear proportion, resembling one arm of a hyperbola. Eukaryotes show a high linear correlation, highest in Streptophyta and lowest in Apicomplexa. In eukaryotes, the size of the coding region increases with genome size, but the rate of increase diminishes (the coding proportion decreases). In non-eukaryotes, the size of the coding region remains linearly related to genome size.

Conclusion: The size of the non-transcribing region may be subject to a strict quantitative control mechanism: genome size and non-transcribing genome size increase proportionally as the transcribing genome expands, indicating a strict balance between expansion and energy conservation. The proportion of non-transcribing genome in eukaryotes is conserved (although its sequences are not), and the presence of a non-transcribing genome has significant implications for the evolution or survival of species. I therefore propose a new hypothesis: the non-transcribing genome is a space for generating new genes from scratch, and the differing proportional coefficients among phyla are due to their different positions in energy transfer.

[Graphic abstract: Figure 1]
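The Method compares transcribing and non-transcribing lengths and relates them to genome size. Below is a minimal Python sketch of that bookkeeping under stated assumptions, not the preprint's actual pipeline. It assumes transcribing features (for example genes and transcribed pseudogenes) have already been extracted from an annotation file as per-sequence intervals. It takes the non-transcribing length to be the genome size minus the union of those intervals, so overlapping genes are not double-counted, and it summarizes the relationship with an ordinary least-squares line and Pearson's r. The function names `merged_length`, `non_transcribing_length`, and `ols_fit` are hypothetical, annotation parsing is omitted, and the toy genomes are made up.

```python
import math
from typing import Iterable, Mapping, Sequence

# Hypothetical helpers for illustration only; not the preprint's pipeline.
# Intervals are 0-based, half-open [start, end); a GFF3 feature with 1-based
# inclusive coordinates (start, end) would map to (start - 1, end).


def merged_length(intervals: Iterable[tuple[int, int]]) -> int:
    """Total bases covered by the intervals, counting overlapping stretches once."""
    total = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # Gap before this interval: close the current run and open a new one.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlaps or abuts the current run: extend it.
            run_end = max(run_end, end)
    if run_end is not None:
        total += run_end - run_start
    return total


def non_transcribing_length(
    seq_lengths: Mapping[str, int],
    transcribed: Mapping[str, Sequence[tuple[int, int]]],
) -> int:
    """Genome size minus the union of transcribed intervals, summed over sequences."""
    genome_size = sum(seq_lengths.values())
    covered = sum(merged_length(transcribed.get(seq, [])) for seq in seq_lengths)
    return genome_size - covered


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept, plus Pearson's r.

    Needs at least two distinct x values and non-constant y values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Made-up toy genomes (not data from the preprint):
    # (sequence lengths, transcribed intervals per sequence).
    toy = [
        ({"chr1": 1_000}, {"chr1": [(0, 300), (250, 400), (600, 700)]}),
        ({"chr1": 4_000, "chr2": 2_000}, {"chr1": [(100, 900)], "chr2": [(0, 500)]}),
        ({"chr1": 10_000}, {"chr1": [(0, 1_500), (5_000, 6_000)]}),
    ]
    sizes = [sum(lengths.values()) for lengths, _ in toy]
    non_tx = [non_transcribing_length(lengths, tx) for lengths, tx in toy]
    slope, intercept, r = ols_fit(sizes, non_tx)
    print(f"slope={slope:.3f} intercept={intercept:.1f} r={r:.3f}")
```

Merging overlapping intervals before subtracting is the key design choice here: summing raw feature lengths would overcount nested or overlapping genes and understate the non-transcribing fraction.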

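The Results describe the non-transcribing proportion rising toward a limit along a curve resembling one arm of a hyperbola. One possible reading, offered as a worked sketch rather than the preprint's own derivation, is a linear fit with a nonzero intercept. Assume, hypothetically, that non-transcribing size $N$ relates to genome size $G$ by $N = aG + b$ with $0 < a < 1$ and $b < 0$, over the range where $N > 0$. The symbols $a$ and $b$ are illustrative and are not fitted values from the preprint.

```latex
N(G) = aG + b
\quad\Longrightarrow\quad
\frac{N(G)}{G} = a + \frac{b}{G},
\qquad
\frac{d}{dG}\!\left(\frac{N(G)}{G}\right) = -\frac{b}{G^{2}} > 0 \ \text{ for } b < 0,
\qquad
\lim_{G \to \infty} \frac{N(G)}{G} = a .
```

Under this reading the proportion $N/G$ is a rectangular hyperbola in $G$ that rises monotonically and levels off at the slope $a$, which would play the role of the limiting proportion. The proportional coefficients reported to differ among phyla would then correspond to different values of $a$. If $b$ were positive, the proportion would instead fall toward $a$.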
Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Journal of Molecular Evolution: 21 papers in training set, Top 0.1%, 12.6%
2. PLOS ONE: 4510 papers in training set, Top 17%, 10.5%
3. PeerJ: 261 papers in training set, Top 0.2%, 9.2%
4. F1000Research: 79 papers in training set, Top 0.1%, 6.9%
5. Frontiers in Genetics: 197 papers in training set, Top 1%, 4.9%
6. Genes: 126 papers in training set, Top 0.1%, 4.9%
7. Molecular Genetics and Genomics: 11 papers in training set, Top 0.1%, 4.0%
50% of probability mass above
8. Gene: 41 papers in training set, Top 0.3%, 3.6%
9. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 2%, 3.6%
10. Genome Biology and Evolution: 280 papers in training set, Top 0.6%, 2.6%
11. Scientific Reports: 3102 papers in training set, Top 49%, 2.1%
12. BMC Bioinformatics: 383 papers in training set, Top 4%, 2.1%
13. BMC Genomics: 328 papers in training set, Top 2%, 2.1%
14. Computational Biology and Chemistry: 23 papers in training set, Top 0.1%, 1.7%
15. PLOS Computational Biology: 1633 papers in training set, Top 19%, 1.2%
16. Genomics: 60 papers in training set, Top 2%, 1.2%
17. GigaScience: 172 papers in training set, Top 2%, 1.2%
18. Mitochondrion: 11 papers in training set, Top 0.1%, 1.2%
19. BMC Biology: 248 papers in training set, Top 2%, 1.2%
20. Biology: 43 papers in training set, Top 1%, 1.2%
21. International Journal of Molecular Sciences: 453 papers in training set, Top 10%, 1.2%
22. Frontiers in Cell and Developmental Biology: 218 papers in training set, Top 6%, 1.1%
23. Heliyon: 146 papers in training set, Top 6%, 0.8%
24. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 44%, 0.8%
25. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 2%, 0.8%
26. Gene Reports: 13 papers in training set, Top 0.7%, 0.8%
27. Ecology and Evolution: 232 papers in training set, Top 4%, 0.8%
28. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 0.8%, 0.7%
29. Philosophical Transactions of the Royal Society B: 51 papers in training set, Top 6%, 0.7%
30. Journal of Structural Biology: 58 papers in training set, Top 2%, 0.7%