Deep Learning of High-throughput Transcription Factor-DNA Binding Affinity Data: Quantitative Comparison with Pairwise-Additive Models
Shen, K.; Wang, Z.; Xie, X. S.
Transcription factors (TFs) regulate gene expression by binding to specific DNA sequences. Widely used models of TF-DNA binding, such as position weight matrices (PWMs) and position-specific affinity matrices (PSAMs), assume that the binding free energy is the sum of independent base contributions. However, there is ample evidence that non-additive effects significantly influence TF binding. Here, we use data from a high-throughput in vitro assay (ivtFOODIE) to generate genome-scale TF-DNA dissociation constants (Kd) and systematically evaluate sequence-to-affinity models of increasing complexity. We demonstrate that pairwise-additive models exhibit systematic deviations from the measured affinity landscapes. Models incorporating adjacent dinucleotide interactions and deep learning architectures achieve markedly improved agreement with experimental Kd values. The magnitude of this non-pairwise-additivity depends strongly on the TF family. In silico mutation screening reveals widespread, TF-specific long-range inter-position dependencies, highlighting the role of energetic coupling across distant positions in target recognition. These results provide a quantitative framework for comparing non-pairwise-additive energetic effects across diverse TFs.
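To make the additivity assumption concrete, the following is a minimal sketch, not the authors' implementation. It scores a short site with an additive (PSAM-style) energy model, then adds an adjacent-dinucleotide correction term. All names (`ADDITIVE`, `DINUC`, `additive_energy`, `dinucleotide_energy`, `relative_kd`) and every numeric value are hypothetical and chosen only for illustration. The conversion from a free-energy difference to a Kd ratio uses the standard relation Kd/Kd_ref = exp(ΔΔG/RT).

```python
import math

# RT in kcal/mol at roughly 298 K (standard thermodynamic value).
RT = 0.593

# Hypothetical per-position energy penalties (kcal/mol) for a 3-bp site,
# relative to the best base at each position. Illustrative numbers only.
ADDITIVE = [
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 1.5},
    {"A": 1.0, "C": 0.0, "G": 1.4, "T": 0.9},
    {"A": 0.7, "C": 1.1, "G": 0.0, "T": 1.3},
]

# Hypothetical coupling energies for adjacent base pairs, keyed by
# (position i, base at i, base at i + 1). Missing keys mean no coupling.
DINUC = {(0, "A", "C"): -0.3, (1, "C", "T"): 0.4}


def additive_energy(seq):
    """Sum of independent per-position contributions (the additive assumption)."""
    return sum(ADDITIVE[i][base] for i, base in enumerate(seq))


def dinucleotide_energy(seq):
    """Additive energy plus a correction for each adjacent pair of bases."""
    coupling = sum(
        DINUC.get((i, seq[i], seq[i + 1]), 0.0) for i in range(len(seq) - 1)
    )
    return additive_energy(seq) + coupling


def relative_kd(ddg):
    """Kd relative to the reference site: Kd / Kd_ref = exp(ddG / RT)."""
    return math.exp(ddg / RT)


if __name__ == "__main__":
    for site in ("ACG", "ACT", "CCG"):
        e_add = additive_energy(site)
        e_di = dinucleotide_energy(site)
        print(site, round(e_add, 3), round(e_di, 3), round(relative_kd(e_di), 3))
```

In this toy setup the two models agree on any site that contains no coupled pair and differ wherever a `DINUC` entry applies. Systematic residuals of this kind between an additive fit and measured Kd values are what would indicate non-additive contributions.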