Correlation Between Information Entropy and Functions of Gene Sequences in the Evolutionary Context: A New Way to Construct Gene Regulatory Networks from Sequence

Pan, L.; Chen, M.; Tanik, M.

2026-04-07 bioinformatics
10.64898/2026.04.03.714856 bioRxiv

The information encoded in DNA sequences can be rigorously quantified using Shannon entropy and related measures. When placed in an evolutionary context, this quantification offers a principled yet underexplored route to constructing gene regulatory networks (GRNs) directly from sequence data. While most GRN inference methods rely exclusively on gene expression profiles, the regulatory code is ultimately written in the DNA sequence itself. Here we review the mathematical foundations of information theory as applied to gene sequences, survey existing computational methods for GRN inference, with emphasis on information-theoretic and sequence-based approaches, and examine how evolutionary conservation constrains sequence entropy to preserve biological function. We then propose a four-layer integrative framework that combines per-position Shannon entropy profiles, evolutionary conservation scoring via Jensen-Shannon divergence, expression-based mutual information and transfer entropy, and DNA foundation model embeddings to construct GRNs from sequence. Through worked examples on the Escherichia coli SOS regulatory sub-network, we demonstrate how conservation-weighted mutual information improves edge discrimination and how transfer entropy resolves regulatory directionality. The framework generates testable predictions: edges supported by low-entropy regulatory regions should show higher experimental validation rates, and cross-species entropy profile conservation should predict GRN topology conservation. This work bridges three scales of biological information (nucleotide-level entropy, evolutionary constraint patterns, and network-level regulatory logic), establishing information entropy as the natural mathematical language for sequence-to-network regulatory inference.
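
The first two layers named in the abstract, per-position Shannon entropy and conservation scoring via Jensen-Shannon divergence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy alignment, and the choice of a uniform background distribution are all assumptions made for illustration, and gaps or ambiguous bases are simply ignored.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, gaps and N are skipped


def column_distribution(column):
    """Relative base frequencies for one alignment column."""
    counts = Counter(b.upper() for b in column if b.upper() in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No usable bases in this column: fall back to a uniform distribution.
        return [1.0 / len(ALPHABET)] * len(ALPHABET)
    return [counts[b] / total for b in ALPHABET]


def shannon_entropy(p):
    """Shannon entropy in bits; terms with zero probability contribute 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2, in bits."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2


def entropy_profile(alignment):
    """Per-position entropy for equal-length aligned sequences."""
    return [shannon_entropy(column_distribution(col)) for col in zip(*alignment)]


def conservation_profile(alignment, background=None):
    """Per-position JSD against a background (uniform by assumption)."""
    if background is None:
        background = [1.0 / len(ALPHABET)] * len(ALPHABET)
    return [
        jensen_shannon_divergence(column_distribution(col), background)
        for col in zip(*alignment)
    ]


# Hypothetical toy alignment, not data from the paper.
toy_alignment = ["ACGT", "ACGA", "ACTC", "ACAG"]
entropies = entropy_profile(toy_alignment)
conservation = conservation_profile(toy_alignment)
```

In this toy case the two invariant columns have zero entropy and the highest divergence from the uniform background, while the fully variable last column has the maximum entropy of 2 bits and zero divergence. A real analysis would likely use an empirical background and a pseudocount, neither of which the abstract specifies.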

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set, Top 0.8%, 22.3%
2. Bioinformatics: 1061 papers in training set, Top 3%, 8.3%
3. NAR Genomics and Bioinformatics: 214 papers in training set, Top 0.2%, 6.3%
4. Cell Systems: 167 papers in training set, Top 2%, 6.3%
5. Nucleic Acids Research: 1128 papers in training set, Top 3%, 6.3%
6. BMC Bioinformatics: 383 papers in training set, Top 2%, 3.9%

50% of probability mass above

7. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 1%, 3.9%
8. Bioinformatics Advances: 184 papers in training set, Top 1%, 3.6%
9. Frontiers in Genetics: 197 papers in training set, Top 3%, 2.7%
10. Scientific Reports: 3102 papers in training set, Top 50%, 2.1%
11. eLife: 5422 papers in training set, Top 36%, 2.1%
12. PLOS ONE: 4510 papers in training set, Top 49%, 2.1%
13. Physical Biology: 43 papers in training set, Top 0.9%, 1.9%
14. Physical Review E: 95 papers in training set, Top 0.6%, 1.9%
15. Molecular Biology and Evolution: 488 papers in training set, Top 3%, 1.7%
16. Journal of Molecular Biology: 217 papers in training set, Top 2%, 1.5%
17. Nature Communications: 4913 papers in training set, Top 55%, 1.3%
18. Briefings in Bioinformatics: 326 papers in training set, Top 5%, 1.3%
19. Genetics: 225 papers in training set, Top 3%, 1.2%
20. Genome Research: 409 papers in training set, Top 3%, 0.9%
21. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 41%, 0.9%
22. Journal of The Royal Society Interface: 189 papers in training set, Top 4%, 0.8%
23. Journal of Computational Biology: 37 papers in training set, Top 0.6%, 0.7%
24. Genome Biology: 555 papers in training set, Top 8%, 0.7%
25. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 1.0%, 0.6%
26. iScience: 1063 papers in training set, Top 38%, 0.6%