
TFBSpedia: a comprehensive human and mouse transcription factor binding sites database

Li, S.; Chou, E.; Wang, K.; Boyle, A. P.; Sartor, M. A.

2026-03-06 bioinformatics
10.64898/2026.03.04.709638 bioRxiv

Mapping the genomic locations and patterns of transcription factor binding sites (TFBS) is essential for understanding gene regulation and advancing treatments for diseases driven by DNA modifications, including epigenetic changes and sequence variants. Although several TFBS databases exist, no study has systematically benchmarked these databases across different sequencing technologies and computational algorithms. In this study, we addressed this gap by constructing a TFBS database that integrates all available ENCODE cell line ATAC-seq and Cistrome Data Browser ChIP-seq datasets, comprising 11.3 million human and 1.87 million mouse TFBS. We also integrated previously published TFBS resources (Factorbook, Unibind, RegulomeDB, and ENCODE_footprint) and found each contains a substantial fraction of unique TFBS predictions, highlighting significant discrepancies among existing resources. To assess the accuracy of the combined TFBS regions, we assembled ten independent genomic annotation datasets for evaluation and found that TFBS regions predicted by multiple databases are more likely to represent true and biologically meaningful binding sites. For each predicted TFBS region, we define two scores: the confidence score reflects prediction reliability, while the importance score represents biological functional relevance. Finally, we introduce TFBSpedia, a lightweight and efficient search engine that enables rapid retrieval of TFBS regions and comprehensive annotation information across the integrated databases.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 0.3%, 14.5%
2. Bioinformatics: 1061 papers in training set, Top 2%, 14.2%
3. Nucleic Acids Research: 1128 papers in training set, Top 1%, 12.2%
4. PLOS Computational Biology: 1633 papers in training set, Top 5%, 6.7%
5. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 1%, 4.1%
(50% of probability mass above)
6. Frontiers in Genetics: 197 papers in training set, Top 2%, 3.9%
7. Database: 51 papers in training set, Top 0.2%, 3.5%
8. Bioinformatics Advances: 184 papers in training set, Top 2%, 3.0%
9. NAR Genomics and Bioinformatics: 214 papers in training set, Top 1%, 2.3%
10. Genome Biology: 555 papers in training set, Top 3%, 2.0%
11. Briefings in Bioinformatics: 326 papers in training set, Top 3%, 2.0%
12. BMC Bioinformatics: 383 papers in training set, Top 4%, 2.0%
13. Journal of Molecular Biology: 217 papers in training set, Top 1%, 1.9%
14. PLOS ONE: 4510 papers in training set, Top 51%, 1.9%
15. Scientific Reports: 3102 papers in training set, Top 54%, 1.9%
16. Genome Research: 409 papers in training set, Top 2%, 1.8%
17. Journal of Genetics and Genomics: 36 papers in training set, Top 1.0%, 1.7%
18. Genome Medicine: 154 papers in training set, Top 6%, 1.2%
19. GigaScience: 172 papers in training set, Top 3%, 0.9%
20. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms: 14 papers in training set, Top 0.1%, 0.8%
21. Molecular Plant: 36 papers in training set, Top 1%, 0.7%
22. Cell Genomics: 162 papers in training set, Top 7%, 0.7%
23. BMC Genomics: 328 papers in training set, Top 6%, 0.7%
24. Nature Communications: 4913 papers in training set, Top 64%, 0.7%
25. Cell Systems: 167 papers in training set, Top 14%, 0.6%
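
The 50% cutoff stated above the ranking can be checked with a running sum over the listed probabilities. The sketch below is only an illustration: it assumes the cutoff is the smallest rank at which the cumulative probability reaches 50%, which the page does not state explicitly, and the function name `journals_to_reach` is hypothetical. The five values are copied from the top five entries of the ranking.

```python
# Predicted probabilities (in percent) for the five top-ranked journals,
# copied from the ranking above.
top_probabilities = [14.5, 14.2, 12.2, 6.7, 4.1]


def journals_to_reach(probabilities, threshold):
    """Return (count, running_total) for the first rank at which the
    cumulative probability reaches the threshold.

    Hypothetical helper: assumes the list is already sorted by rank.
    Returns (None, running_total) if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, total = journals_to_reach(top_probabilities, 50.0)
print(count, round(total, 1))
```

Under that assumption, the first four journals sum to 47.6%, so the fifth is the one that carries the running total past 50%.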