Genomic indicators of gene function: A systematic assessment of the human genome

Cooper, H. B.; Rojas Lopez, K. E.; Schiavinato, D.; Black, M. A.; Gardner, P. P.

2026-04-09 genomics
10.64898/2026.04.08.717348 bioRxiv
Proteins and non-coding RNAs are functional products of the genome that are central to crucial cellular processes. With recent technological advances, researchers can sequence genomes in the thousands and probe numerous genomic activities across many species and conditions. Such studies have identified thousands of potential proteins, RNAs and associated activities. However, there are conflicting interpretations of the results, and therefore of which regions of the genome are "functional". Here we investigate the relative strengths of associations between coding and non-coding gene functionality and genomic features, by comparing reliably annotated functional genes to non-genic regions of the genome. We find that the strongest and most consistent associations between functional genes and genomic features are with transcriptional activity and evolutionary conservation. We also evaluated sequence-based statistics, genomic repeats, epigenetic and population variation data. Other features strongly associated with function include histone marks, chromatin accessibility, genomic copy-number, and sequence alignment statistics such as coding potential and covariation. We also identify potential issues with SNP annotations in short non-coding RNAs, as some highly conserved ncRNAs have significantly higher than expected SNP densities. Our results demonstrate the importance of evolutionary conservation and transcriptional activity for indicating protein-coding and non-coding gene function. Both should be taken into consideration when differentiating between functional sequences and biological or experimental noise.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Frontiers in Genetics: 197 papers in training set, Top 0.1%, 18.2%
2. Scientific Reports: 3102 papers in training set, Top 20%, 6.2%
3. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 0.4%, 6.2%
4. BMC Genomics: 328 papers in training set, Top 0.3%, 6.2%
5. Genes: 126 papers in training set, Top 0.1%, 4.2%
6. PLOS ONE: 4510 papers in training set, Top 35%, 4.2%
7. Genome Biology and Evolution: 280 papers in training set, Top 0.4%, 3.9%
8. Journal of Molecular Evolution: 21 papers in training set, Top 0.1%, 3.8%

50% of probability mass above

9. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 2%, 3.5%
10. NAR Genomics and Bioinformatics: 214 papers in training set, Top 0.8%, 3.5%
11. Bioinformatics: 1061 papers in training set, Top 6%, 3.5%
12. Genetic Epidemiology: 46 papers in training set, Top 0.3%, 2.5%
13. PeerJ: 261 papers in training set, Top 4%, 2.3%
14. Genetics: 225 papers in training set, Top 2%, 2.0%
15. BMC Bioinformatics: 383 papers in training set, Top 4%, 2.0%
16. F1000Research: 79 papers in training set, Top 1%, 2.0%
17. Nucleic Acids Research: 1128 papers in training set, Top 11%, 1.7%
18. Genome Research: 409 papers in training set, Top 2%, 1.7%
19. Genomics: 60 papers in training set, Top 1%, 1.7%
20. Molecular Biology and Evolution: 488 papers in training set, Top 3%, 1.6%
21. Philosophical Transactions of the Royal Society B: 51 papers in training set, Top 4%, 1.4%
22. PLOS Computational Biology: 1633 papers in training set, Top 19%, 1.3%
23. PLOS Genetics: 756 papers in training set, Top 13%, 0.9%
24. Genome Biology: 555 papers in training set, Top 7%, 0.8%
25. Bioinformatics Advances: 184 papers in training set, Top 5%, 0.8%
26. Molecular Genetics and Genomics: 11 papers in training set, Top 0.5%, 0.7%
27. International Journal of Molecular Sciences: 453 papers in training set, Top 18%, 0.6%
28. Nature Communications: 4913 papers in training set, Top 66%, 0.6%
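
The 50% cutoff in the ranking above can be checked with a running sum over the listed probabilities. The sketch below is only an illustration: the function name `journals_needed` is hypothetical, and the inputs are the rounded percentages shown for the ten top-ranked journals, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the ten top-ranked journals,
# copied from the ranking above; these are rounded display values.
probabilities = [18.2, 6.2, 6.2, 6.2, 4.2, 4.2, 3.9, 3.8, 3.5, 3.5]


def journals_needed(probs, threshold):
    """Return how many top-ranked entries are needed for the running
    sum of probabilities to reach the threshold, or None if it never does."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


print(journals_needed(probabilities, 50.0))
```

With these values the running sum is about 49.1% after seven journals and about 52.9% after eight, which agrees with the statement that the top 8 journals account for 50% of the predicted probability mass.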