Discovering conserved regulatory modules in predicted gene regulatory networks across species

Zhang, J.; Heath, L. S.

2026-05-16 systems biology
10.64898/2026.05.15.725337 bioRxiv
The discovery of conserved regulatory motifs across different species is a fundamental challenge in systems biology, especially considering the noisy and incomplete nature of predicted gene regulatory networks (GRNs) and the intractability of the underlying graph alignment problem. Traditional network alignment methods frequently enforce one-to-one node mappings or strict topological isomorphism, which fail to accommodate the many-to-many orthology mappings caused by evolutionary gene duplication. Consequently, strict constraints often yield highly fragmented topological islands rather than cohesive functional modules. In this work, we propose a relaxed topological alignment algorithm designed to extract conserved regulatory structures from cross-species GRNs. We formulate the discovery process as a multi-objective optimization problem that balances sequence homology, functional coherence, and a normalized topological consensus. To navigate the exponentially scaling search space, we introduce a greedy seed-and-extend heuristic bounded by a dynamic ε-stopping condition, which evaluates marginal objective gains to prevent functional dilution. We validate our algorithm using time-series transcriptomic data from Arabidopsis thaliana, Zea mays, and Sorghum bicolor focused on drought and developmental stress responses. While a strict topological baseline extracted only fragmented subgraphs limited to 51 homologous tuples, our relaxed heuristic successfully converged on a highly connected 444-tuple module. The resulting topology effectively links strictly conserved upstream transcription factors to their highly duplicated, species-specific downstream pathways. Our algorithm provides a robust, scalable computational methodology for identifying core regulatory logic across complex biological systems, facilitating the translation of conserved network architectures among multiple species.

Author summary

Identifying shared regulatory mechanisms across diverse species is essential for understanding how complex biological systems evolve and adapt. However, traditional computer algorithms struggle to align these biological networks because evolution frequently duplicates genes, breaking simple one-to-one comparisons and producing highly fragmented results. To overcome this limitation, we developed a relaxed cross-species network alignment algorithm. Instead of demanding perfectly identical network shapes, our approach dynamically balances genetic sequence similarity, network structure, and biological function. We demonstrated the performance of our algorithm using plant drought-stress networks as a case study. While strict methods only found tiny, disconnected network fragments, our algorithm uncovered a functionally coherent, interconnected regulatory module across three distinct species. We discovered that while upstream command genes remain strictly conserved, they regulate highly customized, species-specific execution pathways downstream. Ultimately, our framework provides a scalable, species-agnostic method to decode complex systems, allowing researchers to translate conserved biological logic across diverse genomes.
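
The greedy seed-and-extend search with an ε-stopping condition described in the abstract can be pictured with a small sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: the function names, the equal objective weights, the topological term (the fraction of current module members linked to a candidate), the fixed rather than dynamic ε, and the toy scores are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = FrozenSet[str]


def edge(a: str, b: str) -> Edge:
    """Undirected link between two homologous tuples (hypothetical encoding)."""
    return frozenset((a, b))


def marginal_gain(cand: str, module: List[str], homology: Dict[str, float],
                  function: Dict[str, float], edges: Set[Edge],
                  weights: Tuple[float, float, float]) -> float:
    """Weighted sum of the three objective terms for one candidate."""
    w_h, w_f, w_t = weights
    # Assumed topology term: share of module members connected to the candidate.
    linked = sum(1 for m in module if edge(cand, m) in edges)
    topo = linked / len(module)
    return w_h * homology[cand] + w_f * function[cand] + w_t * topo


def seed_and_extend(seed: str, homology: Dict[str, float],
                    function: Dict[str, float], edges: Set[Edge],
                    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
                    epsilon: float = 0.5) -> List[str]:
    """Grow a module from a seed; stop when the best gain drops below epsilon."""
    module = [seed]
    remaining = set(homology) - {seed}
    while remaining:
        # Only candidates adjacent to the current module may be added.
        frontier = sorted(c for c in remaining
                          if any(edge(c, m) in edges for m in module))
        if not frontier:
            break
        gains = {c: marginal_gain(c, module, homology, function, edges, weights)
                 for c in frontier}
        best = max(frontier, key=gains.get)
        if gains[best] < epsilon:  # fixed epsilon here; the paper's is dynamic
            break
        module.append(best)
        remaining.remove(best)
    return module


# Toy data (invented): a chain A-B-C-D where D scores poorly on every term.
homology = {"A": 1.0, "B": 0.9, "C": 0.8, "D": 0.1}
function = {"A": 1.0, "B": 0.9, "C": 0.7, "D": 0.1}
edges = {edge("A", "B"), edge("B", "C"), edge("C", "D")}

module = seed_and_extend("A", homology, function, edges)
print(module)  # ['A', 'B', 'C']
```

In this toy run the low-scoring tuple D is left out because its marginal gain falls below ε, which is one way to read the abstract's goal of preventing functional dilution. Lowering ε admits D as well.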

Matching journals

The top 3 journals together account for at least 50% of the predicted probability mass (55.9% combined).

1 | Bioinformatics Advances | 184 papers in training set | Top 0.1% | 26.5%
2 | Bioinformatics | 1061 papers in training set | Top 1% | 19.1%
3 | PLOS Computational Biology | 1633 papers in training set | Top 3% | 10.3%
(50% of probability mass above)
4 | BMC Bioinformatics | 383 papers in training set | Top 1.0% | 9.4%
5 | Cell Systems | 167 papers in training set | Top 1% | 9.4%
6 | Genome Research | 409 papers in training set | Top 1% | 3.0%
7 | Development | 440 papers in training set | Top 1% | 1.8%
8 | iScience | 1063 papers in training set | Top 19% | 1.4%
9 | PLOS ONE | 4510 papers in training set | Top 59% | 1.3%
10 | Nucleic Acids Research | 1128 papers in training set | Top 14% | 1.1%
11 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 6% | 1.1%
12 | npj Systems Biology and Applications | 99 papers in training set | Top 2% | 0.9%
13 | eLife | 5422 papers in training set | Top 55% | 0.8%
14 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 44% | 0.8%
15 | Genome Biology | 555 papers in training set | Top 7% | 0.8%
16 | Plant Physiology | 217 papers in training set | Top 3% | 0.7%
17 | Nature Communications | 4913 papers in training set | Top 64% | 0.7%
18 | Plant Communications | 35 papers in training set | Top 2% | 0.7%
19 | Frontiers in Genetics | 197 papers in training set | Top 11% | 0.7%
20 | Briefings in Bioinformatics | 326 papers in training set | Top 7% | 0.7%
21 | Cell Reports | 1338 papers in training set | Top 37% | 0.5%
22 | Scientific Reports | 3102 papers in training set | Top 79% | 0.5%
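
The 50% cutoff in the ranking can be checked with a short cumulative sum. This is a minimal sketch using the first six probabilities listed on this page; the function name `journals_to_reach` and the threshold parameter are hypothetical, and how the page itself computes the cutoff is not stated.

```python
from itertools import accumulate
from typing import List, Tuple


def journals_to_reach(probs: List[float], threshold: float = 50.0) -> Tuple[int, float]:
    """Return the smallest k whose top-k probabilities sum to at least threshold."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k, total
    raise ValueError("threshold not reached by the given probabilities")


# Predicted probabilities (percent) for ranks 1 to 6, as listed above.
top_probs = [26.5, 19.1, 10.3, 9.4, 9.4, 3.0]

k, total = journals_to_reach(top_probs)
print(k, round(total, 1))  # 3 55.9
```

The first two journals reach only 45.6%, so the third is the one that carries the cumulative mass past 50%.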