Deep Learning of High-throughput Transcription Factor-DNA Binding Affinity Data: Quantitative Comparison with Pairwise-Additive Models

Shen, K.; Wang, Z.; Xie, X. S.

2026-05-19 biophysics
10.64898/2026.05.18.725888 bioRxiv

Transcription factors (TFs) regulate gene expression by binding to specific DNA sequences. Widely used models of TF-DNA binding, such as position weight matrices (PWMs) and position-specific affinity matrices (PSAMs), assume binding free energy is the sum of independent base contributions. However, there is ample evidence that non-additive effects significantly influence TF binding. Here, we utilize data from a high-throughput in vitro assay (ivtFOODIE) to generate genome-scale TF-DNA dissociation constants (Kd) and systematically evaluate sequence-to-affinity models of increasing complexity. We demonstrate that pairwise additive models exhibit systematic deviations from the measured affinity landscapes. Models incorporating adjacent dinucleotide interactions and deep learning architectures achieve markedly improved agreement with experimental Kd values. The magnitude of this non-pairwise-additivity depends strongly on the TF family. In silico mutation screening reveals widespread, TF-specific long-range interposition dependencies, highlighting the role of energetic coupling across distant positions in target recognition. These results provide a quantitative framework for comparing non-pairwise-additive energetic effects across diverse TFs.
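As a rough illustration of the model classes the abstract contrasts, the sketch below scores a binding site with a strictly additive energy matrix and then with an added adjacent-dinucleotide correction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 3-bp toy matrix, the coupling value, and the use of kcal/mol with RT at roughly 298 K are all hypothetical choices made for illustration.

```python
import math

RT = 0.593  # kcal/mol at about 298 K (assumed temperature)

def additive_ddg(seq, psam):
    """Additive model: sum of independent per-position contributions."""
    return sum(psam[i][base] for i, base in enumerate(seq))

def dinucleotide_ddg(seq, psam, coupling):
    """Additive term plus a correction for each adjacent base pair.

    `coupling` maps (position, dinucleotide) to an energy correction;
    missing entries contribute zero.
    """
    pair_term = sum(
        coupling.get((i, seq[i:i + 2]), 0.0) for i in range(len(seq) - 1)
    )
    return additive_ddg(seq, psam) + pair_term

def relative_kd(ddg):
    """Kd relative to the reference site, Kd / Kd_ref = exp(ddG / RT)."""
    return math.exp(ddg / RT)

# Hypothetical toy parameters: ddG in kcal/mol relative to the
# reference site "ACG" (not measured values).
psam = [
    {"A": 0.0, "C": 1.0, "G": 1.2, "T": 0.8},
    {"A": 0.9, "C": 0.0, "G": 1.1, "T": 1.0},
    {"A": 1.0, "C": 0.7, "G": 0.0, "T": 1.3},
]
coupling = {(0, "CC"): -0.5}  # hypothetical favorable CC step at position 0

if __name__ == "__main__":
    for site in ("ACG", "CCG"):
        add = additive_ddg(site, psam)
        dinuc = dinucleotide_ddg(site, psam, coupling)
        print(site, add, dinuc, relative_kd(add), relative_kd(dinuc))
```

In this toy case the two models agree on the reference site but disagree on "CCG", where the coupling term lowers the predicted energy. A systematic gap of that kind between predicted and measured Kd is what a comparison across model classes would look for.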

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                              Papers in training set  Percentile  Probability
1     Nucleic Acids Research                               1128                    Top 0.2%    27.6%
2     PLOS Computational Biology                           1633                    Top 3%      10.1%
3     Nature Communications                                4913                    Top 18%     10.1%
4     Genome Biology                                       555                     Top 1%      4.8%
      (50% of probability mass above)
5     Scientific Reports                                   3102                    Top 31%     4.0%
6     Proceedings of the National Academy of Sciences      2130                    Top 20%     3.6%
7     Cell Systems                                         167                     Top 4%      3.6%
8     eLife                                                5422                    Top 29%     3.1%
9     Computational and Structural Biotechnology Journal   216                     Top 4%      1.9%
10    Biophysical Journal                                  545                     Top 3%      1.7%
11    Cell Reports                                         1338                    Top 25%     1.7%
12    Science Advances                                     1098                    Top 20%     1.5%
13    Communications Biology                               886                     Top 11%     1.5%
14    NAR Genomics and Bioinformatics                      214                     Top 2%      1.3%
15    PLOS ONE                                             4510                    Top 58%     1.3%
16    Bioinformatics                                       1061                    Top 8%      1.2%
17    Journal of Chemical Information and Modeling         207                     Top 2%      1.1%
18    Physical Biology                                     43                      Top 2%      0.9%
19    iScience                                             1063                    Top 26%     0.9%
20    Biophysical Reports                                  36                      Top 0.5%    0.7%
21    Bioinformatics Advances                              184                     Top 5%      0.7%
22    Advanced Science                                     249                     Top 21%     0.7%
23    Nano Letters                                         63                      Top 3%      0.6%
24    The Journal of Physical Chemistry B                  158                     Top 2%      0.6%
25    Nature Methods                                       336                     Top 7%      0.6%
26    New Phytologist                                      309                     Top 5%      0.6%