From motif to function: Inferring the functions of long zinc finger proteins through combinatorial selection
Zuo, Z.
C2H2 zinc finger proteins (ZFPs) constitute the largest group of DNA-binding proteins in the human genome, and many of them contain long tandem arrays of fingers, which makes motif discovery, prediction of in vivo cis-regulatory elements (CREs), and understanding of their functions particularly challenging. Previous work established that, because recognition of the sub-motifs is interdependent, the simple additive recognition model impedes motif discovery and compromises our understanding of how ZFPs work. This work uses ZFP3, a 13-finger ZFP with no known function, as a case example to address the reverse question: given the full-length motif learned through in vitro experiments such as Spec-seq and HT-SELEX, how can its in vivo CREs be reliably identified and the gene's functions then predicted? By sorting all possible sites within the ChIP-seq peaks into groups with similar predicted binding energy and comparing the aggregate ChIP-seq signals between groups, it becomes evident that neither the full-length motif nor any individual sub-motif alone correctly identifies all high-affinity specific sites without false positives. It is therefore necessary to revise the current algorithm and use both the core and upstream motifs as separate components to improve prediction accuracy. Furthermore, a significant number of ZFP3 regulatory elements are found to be proximal to genes associated with microtubule organization and ciliogenesis, which is consistent with the fact that ZFP3 is specifically upregulated in multiple ciliated cell types. Finally, local chromatin accessibility and active chromatin marks such as H3K27ac are found to associate positively with the differential binding of ZFP3 between the tested cell lines. Overall, this work establishes a novel "From motif to function" strategy for long ZFPs, and the data analysis workflows are implemented in the R package TFCookbook for reuse with other ZFPs.
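The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the TFCookbook implementation; the function names, the toy energy matrix, the bin width, and the signal values are all hypothetical and chosen only for illustration. The sketch scores each candidate site with a simple additive energy model, bins sites with similar predicted energy, and reports the mean ChIP-seq signal per bin.

```python
from statistics import mean


def site_energy(site, energy_matrix):
    """Additive model: sum the per-position energy of each base (lower = stronger binding)."""
    return sum(energy_matrix[i][base] for i, base in enumerate(site))


def mean_signal_by_energy(sites, signals, energy_matrix, bin_width=1.0):
    """Group sites into bins of similar predicted energy; return mean signal per bin."""
    bins = {}
    for site, signal in zip(sites, signals):
        key = int(site_energy(site, energy_matrix) // bin_width)
        bins.setdefault(key, []).append(signal)
    # Map each bin's lower energy bound to the mean signal of the sites in it.
    return {key * bin_width: mean(vals) for key, vals in sorted(bins.items())}


# Hypothetical 3-bp motif: the consensus "ACG" scores 0, each mismatch costs 1.0.
toy_matrix = [
    {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 1.0, "G": 0.0, "T": 1.0},
]
toy_sites = ["ACG", "ACT", "TCT", "ACG"]
toy_signals = [10.0, 4.0, 1.0, 12.0]  # made-up aggregate ChIP-seq signals

profile = mean_signal_by_energy(toy_sites, toy_signals, toy_matrix)
print(profile)
```

If a motif predicted in vivo binding well, the mean signal would be expected to fall steadily as predicted energy rises. The abstract reports that no single motif behaves this way for ZFP3, which is the stated reason for scoring the core and upstream motifs as separate components.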