Decoding TF-Specific Predictability in Cross-Species Binding Site Inference

Wang, Y.; Liu, G.; Wang, Y.; Zhang, Y.

2026-04-16 genomics
10.64898/2026.04.14.718438 bioRxiv
Accurately identifying transcription factor (TF) binding sites across species is essential for understanding conserved gene regulatory mechanisms. While experimental techniques such as ChIP-seq have enabled genome-wide TF-binding maps, their application is often constrained by the limited availability of high-quality antibodies. Computational approaches that leverage data from one species to predict TF-binding sites in other species have emerged as valuable alternatives. However, existing models often rely on uniform modeling assumptions, overlooking substantial variability in cross-species predictability across TFs. In this study, we systematically evaluated the cross-species predictability of 137 TFs using 425 human-mouse ChIP-seq dataset pairs matched by cell type, and identified key biological features underlying this variability. Building on these insights, we developed ChromTransfer, a TF-aware cross-species prediction framework that integrates DNA sequence, functional conservation, TF-specific co-binding signals, and shared chromatin context signals. These regulatory signals substantially improve prediction performance, particularly for TFs with weak or absent motif enrichment. Together, this study establishes a biologically informed and scalable framework for TF-specific cross-species TF-binding site prediction and provides a practical strategy for extending regulatory annotations across species.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Genome Biology | 555 | Top 0.3% | 12.0%
2 | Genome Research | 409 | Top 0.2% | 8.9%
3 | Nature Communications | 4913 | Top 24% | 8.2%
4 | Nucleic Acids Research | 1128 | Top 2% | 8.2%
5 | Cell Genomics | 162 | Top 0.7% | 6.1%
6 | Genomics, Proteomics & Bioinformatics | 171 | Top 2% | 4.0%
7 | PLOS Computational Biology | 1633 | Top 9% | 3.8%
(50% of probability mass above)
8 | NAR Genomics and Bioinformatics | 214 | Top 0.8% | 3.5%
9 | Bioinformatics | 1061 | Top 6% | 3.5%
10 | Cell Systems | 167 | Top 4% | 3.5%
11 | The American Journal of Human Genetics | 206 | Top 2% | 2.3%
12 | Nature Genetics | 240 | Top 4% | 2.0%
13 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.0%
14 | Nature Methods | 336 | Top 4% | 1.7%
15 | Nature Computational Science | 50 | Top 0.7% | 1.6%
16 | Genome Medicine | 154 | Top 5% | 1.6%
17 | Nature Machine Intelligence | 61 | Top 2% | 1.6%
18 | Nature Biotechnology | 147 | Top 5% | 1.6%
19 | eLife | 5422 | Top 44% | 1.6%
20 | Science | 429 | Top 15% | 1.6%
21 | Briefings in Bioinformatics | 326 | Top 4% | 1.6%
22 | Cell Reports | 1338 | Top 26% | 1.4%
23 | Proceedings of the National Academy of Sciences | 2130 | Top 40% | 0.9%
24 | Communications Biology | 886 | Top 20% | 0.9%
25 | PLOS ONE | 4510 | Top 67% | 0.8%
26 | Molecular Cell | 308 | Top 10% | 0.8%
27 | Scientific Reports | 3102 | Top 76% | 0.7%
28 | Bioinformatics Advances | 184 | Top 5% | 0.7%
29 | Frontiers in Genetics | 197 | Top 11% | 0.7%
30 | Journal of Genetics and Genomics | 36 | Top 3% | 0.6%
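
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running total over the listed probabilities. The sketch below is illustrative only: the function name `cutoff_rank` and the variable `top_probs` are hypothetical, and the values are the rounded percentages copied from the list above, not the page's underlying model output.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-7, copied from the list above.
# These are rounded display values, so totals are approximate.
top_probs = [12.0, 8.9, 8.2, 8.2, 6.1, 4.0, 3.8]


def cutoff_rank(probs, threshold=50.0):
    """Return the 1-based rank at which the running total first
    reaches `threshold`, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


print(cutoff_rank(top_probs))
```

With these values the running total is about 47.4% after rank 6 and about 51.2% after rank 7, which is consistent with the cutoff marker placed after the seventh journal.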