
Deep-Plant: a supervised foundation model for plant regulatory genomics

Daoud, A.; Roy, S.; Zeng, H.; Bao, X.; Zhang, Z.; Wang, J.; Parodi, P.; Reddy, A.; Liu, J.; Ben-Hur, A.

2026-04-09 genomics
10.64898/2026.04.06.716755 bioRxiv

Large-scale sequence-to-function deep learning models have demonstrated unparalleled ability to model biological sequences and have revolutionized the field of regulatory genomics. However, the majority of such efforts have centered on human and mammalian systems, leaving plant regulatory genomics comparatively underexplored. To address this gap, we introduce Deep-Plant, a supervised foundation model trained to predict chromatin state directly from genomic sequence. In contrast to large language models, which are trained in a self-supervised manner using sequence alone, our model is trained to predict chromatin state across tissues and conditions. Training the model on a large collection of genome-wide experiments, including DNA accessibility, transcription factor binding, and histone modifications, provides it with added biological context beyond the sequence itself. We demonstrate that the resulting model is an effective platform for developing accurate models of regulatory activity relevant to gene expression and active enhancers, exhibiting large improvements in speed, accuracy, and interpretability over the complementary approach of fine-tuning DNA language models. Deep-Plant models are available in Arabidopsis and rice, and work well as a building block for sequence modeling in related species such as corn. Together, these results establish supervised, chromatin-informed foundation models as a practical and effective paradigm for regulatory sequence modeling in plants.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Genome Biology (555 papers in training set): Top 0.1%, 14.2%
2. Nucleic Acids Research (1128 papers in training set): Top 3%, 6.7%
3. Nature Communications (4913 papers in training set): Top 29%, 6.3%
4. Genome Research (409 papers in training set): Top 0.6%, 4.8%
5. Nature Genetics (240 papers in training set): Top 2%, 4.8%
6. Nature Machine Intelligence (61 papers in training set): Top 0.7%, 4.3%
7. Bioinformatics (1061 papers in training set): Top 5%, 4.3%
8. Cell Systems (167 papers in training set): Top 3%, 3.9%
9. Cell Genomics (162 papers in training set): Top 1%, 3.9%

50% of probability mass above

10. Nature Methods (336 papers in training set): Top 3%, 3.5%
11. Nature (575 papers in training set): Top 7%, 3.5%
12. Nature Biotechnology (147 papers in training set): Top 3%, 3.2%
13. NAR Genomics and Bioinformatics (214 papers in training set): Top 1%, 2.4%
14. Genome Medicine (154 papers in training set): Top 3%, 2.1%
15. PLOS Computational Biology (1633 papers in training set): Top 14%, 2.1%
16. Nature Plants (84 papers in training set): Top 1.0%, 1.8%
17. Bioinformatics Advances (184 papers in training set): Top 3%, 1.8%
18. Science (429 papers in training set): Top 13%, 1.8%
19. Frontiers in Genetics (197 papers in training set): Top 5%, 1.7%
20. Computational and Structural Biotechnology Journal (216 papers in training set): Top 5%, 1.7%
21. Nature Computational Science (50 papers in training set): Top 0.6%, 1.7%
22. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 38%, 1.2%
23. The American Journal of Human Genetics (206 papers in training set): Top 3%, 1.2%
24. Cell Reports (1338 papers in training set): Top 30%, 0.9%
25. Briefings in Bioinformatics (326 papers in training set): Top 6%, 0.9%
26. PLOS ONE (4510 papers in training set): Top 66%, 0.8%
27. Scientific Reports (3102 papers in training set): Top 73%, 0.8%
28. IEEE Transactions on Computational Biology and Bioinformatics (17 papers in training set): Top 0.6%, 0.8%
29. Communications Biology (886 papers in training set): Top 27%, 0.7%
30. BMC Bioinformatics (383 papers in training set): Top 8%, 0.6%
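
The 50% cutoff stated above the journal list can be checked from the listed percentages. The following is a minimal sketch, assuming the final percentage in each entry is that journal's predicted probability and that the displayed values are rounded; the helper name `journals_to_reach` is hypothetical and not part of the page.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 30 listed journals, in rank order,
# copied from the list above (displayed values, so presumably rounded).
probs = [
    14.2, 6.7, 6.3, 4.8, 4.8, 4.3, 4.3, 3.9, 3.9, 3.5,
    3.5, 3.2, 2.4, 2.1, 2.1, 1.8, 1.8, 1.8, 1.7, 1.7,
    1.7, 1.2, 1.2, 0.9, 0.9, 0.8, 0.8, 0.8, 0.7, 0.6,
]


def journals_to_reach(percentages, threshold=50.0):
    """Return the smallest rank whose cumulative percentage reaches threshold.

    Hypothetical helper for illustration; returns None if the threshold
    is never reached by the listed entries.
    """
    for rank, total in enumerate(accumulate(percentages), start=1):
        if total >= threshold:
            return rank
    return None


top_n = journals_to_reach(probs)
print(f"Journals needed to reach 50%: {top_n}")
print(f"Mass covered by all listed journals: {sum(probs):.1f}%")
```

With these values the cumulative sum first reaches 50% at the ninth journal, consistent with the statement above the list; the 30 listed journals together cover less than the full probability mass, so the remainder falls on journals that are not shown.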