Correlation Between Information Entropy and Functions of Gene Sequences in the Evolutionary Context: A New Way to Construct Gene Regulatory Networks from Sequence

Pan, L.; Chen, M.; Tanik, M.

2026-04-07 bioinformatics

10.64898/2026.04.03.714856 bioRxiv


The information encoded in DNA sequences can be rigorously quantified using Shannon entropy and related measures. When placed in an evolutionary context, this quantification offers a principled yet underexplored route to constructing gene regulatory networks (GRNs) directly from sequence data. While most GRN inference methods rely exclusively on gene expression profiles, the regulatory code is ultimately written in the DNA sequence itself. Here we review the mathematical foundations of information theory as applied to gene sequences, survey existing computational methods for GRN inference, with emphasis on information-theoretic and sequence-based approaches, and examine how evolutionary conservation constrains sequence entropy to preserve biological function. We then propose a four-layer integrative framework that combines per-position Shannon entropy profiles, evolutionary conservation scoring via Jensen-Shannon divergence, expression-based mutual information and transfer entropy, and DNA foundation model embeddings to construct GRNs from sequence. Through worked examples on the Escherichia coli SOS regulatory sub-network, we demonstrate how conservation-weighted mutual information improves edge discrimination and how transfer entropy resolves regulatory directionality. The framework generates testable predictions: edges supported by low-entropy regulatory regions should show higher experimental validation rates, and cross-species entropy profile conservation should predict GRN topology conservation. This work bridges three scales of biological information (nucleotide-level entropy, evolutionary constraint patterns, and network-level regulatory logic), establishing information entropy as the natural mathematical language for sequence-to-network regulatory inference.
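As an illustration of the first two layers named in the abstract, the following is a minimal sketch of a per-position Shannon entropy profile and a Jensen-Shannon divergence conservation score for an ungapped nucleotide alignment. It is not the authors' implementation: the function names, the uniform background distribution, and the toy alignment are all assumptions made for this example, and the sequences are invented rather than taken from the E. coli SOS network.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def column_distribution(column):
    """Relative frequency of A, C, G, T in one alignment column."""
    counts = Counter(base for base in column if base in ALPHABET)
    total = sum(counts[base] for base in ALPHABET)
    if total == 0:
        raise ValueError("column contains no A/C/G/T characters")
    return [counts[base] / total for base in ALPHABET]


def shannon_entropy(p):
    """Shannon entropy in bits; terms with zero probability contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - 0.5 * shannon_entropy(p) - 0.5 * shannon_entropy(q)


def entropy_profile(aligned_seqs, background=(0.25, 0.25, 0.25, 0.25)):
    """Return one (entropy, JSD-vs-background) pair per alignment column.

    Assumption: the background is uniform; a genome-specific base
    composition could be passed in instead.
    """
    profile = []
    for column in zip(*aligned_seqs):
        p = column_distribution(column)
        profile.append((shannon_entropy(p), js_divergence(p, background)))
    return profile


if __name__ == "__main__":
    # Hypothetical toy alignment, invented for illustration only.
    toy_alignment = ["ACGT", "ACGA", "ACCT", "ATGC"]
    for position, (h, jsd) in enumerate(entropy_profile(toy_alignment)):
        print(f"position {position}: entropy={h:.3f} bits, JSD={jsd:.3f} bits")
```

In this sketch a fully conserved column has zero entropy and the largest divergence from the background, which is the sense in which low-entropy positions are treated as candidates for functional constraint. How such scores would be weighted into mutual information or combined with transfer entropy is described only at a high level in the abstract and is not reproduced here.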

