
From motif to function: Inferring the functions of long zinc finger proteins through combinatorial selection

Zuo, Z.

2022-04-20 bioinformatics
10.1101/2022.04.19.488842 bioRxiv

C2H2 zinc finger proteins (ZFPs) comprise the largest group of DNA-binding proteins in the human genome, and many of them contain long tandem arrays of fingers. This makes motif discovery, prediction of in vivo cis-regulatory elements (CREs), and understanding of their functions particularly challenging. Previous work established that, because recognition by sub-motifs is interdependent, the simple additive recognition model impedes motif discovery and compromises our understanding of how ZFPs work. This work uses ZFP3, a 13-finger ZFP with no known function, as a case example to address the reverse question: given the full-length motif learned through in vitro experiments such as Spec-seq and HT-SELEX, how can its in vivo CREs be reliably identified and its functions predicted? Sorting all possible sites within the ChIP-seq peaks into groups of similar predicted binding energy and comparing the aggregate ChIP-seq signals between groups shows that neither the full-length motif nor any individual sub-motif alone identifies all high-affinity specific sites without false positives. It is therefore necessary to revise the current algorithm and use the core and upstream motifs as separate components to improve prediction accuracy. Furthermore, a significant number of ZFP3 regulatory elements are proximal to genes associated with microtubule organization and ciliogenesis, which coincides with the fact that ZFP3 is specifically upregulated in multiple ciliated cell types. Finally, local chromatin accessibility and active chromatin marks such as H3K27ac are positively associated with the differential binding of ZFP3 between the tested cell lines. Overall, this work establishes a novel "From motif to function" strategy for long ZFPs, and the data analysis workflows are implemented in the R package TFCookbook for reuse with other ZFPs.
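The central analysis in the abstract involves three steps: scoring candidate sites with an additive energy model, grouping sites of similar predicted energy, and comparing aggregate ChIP-seq signal between groups. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TFCookbook implementation. The energy-matrix layout (rows = positions, columns = A, C, G, T), the "lower energy = tighter binding" convention, the bin width, the per-component cutoffs, and all function names (`site_energy`, `scan_best_site`, `group_mean_signal`, `split_by_components`) are hypothetical, and the toy inputs are made up.

```python
from statistics import mean

# Column order assumed for the energy matrix (hypothetical layout).
BASES = "ACGT"


def site_energy(site, energy_matrix):
    """Additive model: a site's energy is the sum of independent per-position
    contributions. energy_matrix[i][j] is the energy of BASES[j] at position i;
    lower values are assumed to mean tighter predicted binding.
    The site must contain only A, C, G, T."""
    return sum(energy_matrix[i][BASES.index(base)] for i, base in enumerate(site))


def scan_best_site(sequence, energy_matrix):
    """Score every window of motif width and return (position, energy)
    of the lowest-energy window."""
    width = len(energy_matrix)
    scores = [(i, site_energy(sequence[i:i + width], energy_matrix))
              for i in range(len(sequence) - width + 1)]
    return min(scores, key=lambda pair: pair[1])


def group_mean_signal(sites, bin_width):
    """Bin (energy, signal) pairs by predicted energy and return the mean
    ChIP-seq signal of each bin, keyed by bin index in ascending order."""
    bins = {}
    for energy, signal in sites:
        bins.setdefault(int(energy // bin_width), []).append(signal)
    return {key: mean(values) for key, values in sorted(bins.items())}


def split_by_components(sites, core_cutoff, upstream_cutoff):
    """Hypothetical two-component rule: keep the core and upstream sub-motif
    energies separate and call a site specific only if both pass their own
    cutoff. sites holds (core_energy, upstream_energy, signal) triples;
    returns the signals of (both-pass sites, all other sites)."""
    both_pass, other = [], []
    for core, upstream, signal in sites:
        if core <= core_cutoff and upstream <= upstream_cutoff:
            both_pass.append(signal)
        else:
            other.append(signal)
    return both_pass, other


if __name__ == "__main__":
    # Toy 3-bp matrix (rows = positions, columns = A, C, G, T); made-up values.
    toy_matrix = [[0.0, 1.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 1.0]]
    print(scan_best_site("TTACGTT", toy_matrix))  # (2, 0.0)
    # The first two toy sites share the same total energy (3.5) but differ
    # in which sub-motif is strong; only the third passes both cutoffs.
    toy_sites = [(0.5, 3.0, 4.0), (3.0, 0.5, 2.0), (0.5, 0.5, 20.0)]
    print(split_by_components(toy_sites, 1.0, 1.0))  # ([20.0], [4.0, 2.0])
```

Comparing the per-bin means from `group_mean_signal` checks whether predicted energy tracks the observed signal. `split_by_components` then asks whether, among sites with similar total energy, requiring both sub-motifs separates strong sites from weak ones. A purely additive full-length score cannot make that distinction by itself.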

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set; Top 3%; 10.6%
2. PLOS ONE: 4510 papers in training set; Top 18%; 10.3%
3. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 0.1%; 10.3%
4. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 0.9%; 7.0%
5. Scientific Reports: 3102 papers in training set; Top 22%; 4.9%
6. PeerJ: 261 papers in training set; Top 3%; 3.1%
7. Briefings in Bioinformatics: 326 papers in training set; Top 2%; 2.8%
8. Frontiers in Genetics: 197 papers in training set; Top 3%; 2.8%
50% of the predicted probability mass lies above this point
9. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms: 14 papers in training set; Top 0.1%; 2.7%
10. Genes: 126 papers in training set; Top 0.6%; 2.1%
11. Computational Biology and Chemistry: 23 papers in training set; Top 0.1%; 1.9%
12. BMC Bioinformatics: 383 papers in training set; Top 4%; 1.9%
13. Bioinformatics: 1061 papers in training set; Top 7%; 1.7%
14. Journal of Molecular Biology: 217 papers in training set; Top 2%; 1.4%
15. BMC Genomic Data: 12 papers in training set; Top 0.1%; 1.2%
16. Journal of Proteome Research: 215 papers in training set; Top 1%; 1.2%
17. International Journal of Molecular Sciences: 453 papers in training set; Top 10%; 1.2%
18. Gene: 41 papers in training set; Top 1%; 1.2%
19. Journal of Bioinformatics and Systems Biology: 14 papers in training set; Top 0.3%; 1.2%
20. Heliyon: 146 papers in training set; Top 4%; 1.1%
21. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set; Top 0.4%; 1.0%
22. Computers in Biology and Medicine: 120 papers in training set; Top 3%; 1.0%
23. Genomics: 60 papers in training set; Top 2%; 0.9%
24. Frontiers in Molecular Biosciences: 100 papers in training set; Top 4%; 0.8%
25. Nucleic Acids Research: 1128 papers in training set; Top 17%; 0.8%
26. Molecular Omics: 21 papers in training set; Top 0.3%; 0.8%
27. Frontiers in Immunology: 586 papers in training set; Top 8%; 0.7%
28. Frontiers in Cell and Developmental Biology: 218 papers in training set; Top 10%; 0.7%
29. Biosystems: 18 papers in training set; Top 0.6%; 0.5%
30. Physical Biology: 43 papers in training set; Top 3%; 0.5%
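The displayed percentages are rounded. The statement that the top 8 journals account for 50% of the predicted probability mass is therefore best read as the point where the running total first passes 50%. A minimal Python check over the first ten listed values follows. The function name `journals_to_reach` is hypothetical. The first seven values sum to about 49.0% and the first eight to about 51.8%.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten highest-ranked journals, copied from the list above.
PROBS = [10.6, 10.3, 10.3, 7.0, 4.9, 3.1, 2.8, 2.8, 2.7, 2.1]


def journals_to_reach(probs, target):
    """Return the smallest number of top-ranked journals whose running total
    reaches target (in %), or None if the list never reaches it."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return count
    return None


if __name__ == "__main__":
    print(journals_to_reach(PROBS, 50.0))  # 8
    print(round(sum(PROBS[:8]), 1))  # 51.8
```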