Deep-Plant: a supervised foundation model for plant regulatory genomics
Daoud, A.; Roy, S.; Zeng, H.; Bao, X.; Zhang, Z.; Wang, J.; Parodi, P.; Reddy, A.; Liu, J.; Ben-Hur, A.
Large-scale sequence-to-function deep learning models have demonstrated unparalleled ability to model biological sequences and have revolutionized the field of regulatory genomics. However, the majority of such efforts have centered on human and mammalian systems, leaving plant regulatory genomics comparatively underexplored. To address this gap, we introduce Deep-Plant, a supervised foundation model trained to predict chromatin state directly from genomic sequence. In contrast to large language models, which are trained in a self-supervised manner using sequence alone, our model is trained to predict chromatin state across tissues and conditions. Training the model on a large collection of genome-wide experiments, including DNA accessibility, transcription factor binding, and histone modifications, provides it with added biological context beyond the sequence itself. We demonstrate that the resulting model is an effective platform for developing accurate models of regulatory activity relevant to gene expression and active enhancers, exhibiting large improvements in speed, accuracy, and interpretability over the complementary approach of fine-tuning DNA language models. Deep-Plant models are available in Arabidopsis and rice, and work well as a building block for sequence modeling in related species such as corn. Together, these results establish supervised, chromatin-informed foundation models as a practical and effective paradigm for regulatory sequence modeling in plants.
Matching journals
The top 9 journals account for 50% of the predicted probability mass.