A deep learning model recapitulates position specific effects of plant regulatory sequences and suggests genes under complex regulation

Rockenbach, K. C.; Zanini, S. F.; Morris, R. J.; Wells, R. J.; Golicz, A. A.

2025-09-04 bioinformatics
10.1101/2025.08.30.673246 bioRxiv

Deep neural networks can be trained to predict gene expression directly from genomic sequence, thereby implicitly learning regulatory sequence patterns from scratch and minimizing the bias imposed by prior assumptions. A challenging yet promising prospect is the extraction of novel insights into gene-regulatory mechanisms by probing and interpreting such gene expression models. Using a branched convolutional neural network architecture trained on promoter and terminator sequences, we predict gene expression for the allopolyploid Brassica napus and the closely related model organism Arabidopsis thaliana. We validate the model by comparing predicted and measured expression across ecotypes. We also show that deep learning models can successfully capture the positional binding preferences of some transcription factor families without having been trained on transcription factor binding data. Furthermore, we show that our model not only detected local sequence patterns but was also able to determine their function based on their positional context. We also found that increased prediction error correlated with additional, more distal or epigenetic regulatory input. Our results demonstrate that deep learning can be used to understand the regulatory architecture of gene expression in plants. A better understanding of gene regulation in polyploid genomes is of particular economic importance, given the prevalence of polyploids among major crops. In the future, we hope that such models may facilitate the targeted engineering of gene regulation in crops.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. in silico Plants: 24 papers in training set, Top 0.1%, 40.0%
2. Plant Physiology: 217 papers in training set, Top 0.6%, 6.9%
3. Frontiers in Genetics: 197 papers in training set, Top 1%, 4.2%
50% of probability mass above
4. PLOS Computational Biology: 1633 papers in training set, Top 9%, 3.6%
5. Scientific Reports: 3102 papers in training set, Top 35%, 3.6%
6. Plant Phenomics: 17 papers in training set, Top 0.1%, 3.6%
7. Plant Direct: 81 papers in training set, Top 0.7%, 2.8%
8. PLOS ONE: 4510 papers in training set, Top 53%, 1.7%
9. Genetics: 225 papers in training set, Top 2%, 1.7%
10. Bioinformatics: 1061 papers in training set, Top 7%, 1.7%
11. New Phytologist: 309 papers in training set, Top 3%, 1.7%
12. The Plant Genome: 53 papers in training set, Top 0.4%, 1.7%
13. Frontiers in Plant Science: 240 papers in training set, Top 4%, 1.4%
14. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 5%, 1.4%
15. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.4%
16. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 2%, 1.2%
17. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 0.3%, 1.1%
18. BMC Bioinformatics: 383 papers in training set, Top 6%, 1.1%
19. Development: 440 papers in training set, Top 3%, 1.0%
20. Plant Biotechnology Journal: 56 papers in training set, Top 1.0%, 0.9%
21. The Plant Journal: 197 papers in training set, Top 3%, 0.8%
22. NAR Genomics and Bioinformatics: 214 papers in training set, Top 4%, 0.8%
23. npj Systems Biology and Applications: 99 papers in training set, Top 3%, 0.7%
24. Applications in Plant Sciences: 21 papers in training set, Top 0.3%, 0.7%
25. Molecular Systems Biology: 142 papers in training set, Top 2%, 0.7%
26. The Plant Phenome Journal: 14 papers in training set, Top 0.3%, 0.7%
27. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set, Top 0.8%, 0.7%
28. Journal of Genetics and Genomics: 36 papers in training set, Top 3%, 0.7%
29. Molecular Biology and Evolution: 488 papers in training set, Top 5%, 0.7%
30. Horticulture Research: 43 papers in training set, Top 2%, 0.5%
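
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the values are the probabilities of the ten highest-ranked journals copied from the list above. How the site itself computes the cutoff is not stated, so this is an assumption about the arithmetic, not a description of its method.

```python
# Predicted probabilities (in percent) for ranks 1-10, copied from the list above.
probabilities = [40.0, 6.9, 4.2, 3.6, 3.6, 3.6, 2.8, 1.7, 1.7, 1.7]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running sum
    of probabilities to reach the threshold (hypothetical helper)."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count
    return None  # threshold not reached by the entries provided


top_k = journals_to_reach(probabilities, 50.0)
print(top_k)
```

The first two entries sum to 46.9%, which is below the threshold; adding the third (4.2%) brings the total to 51.1%, so three journals are needed.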