Quantification of the effects of single nucleotide variants in NKX2.1 transcription factor binding sites

Lenihan-Geels, F.; Proft, S. A.; Bommer, M.; Heinemann, U.; Seelow, D.; Opitz, R.; Krude, H.; Schuelke, M.; Malecka, M.

2026-02-27 neuroscience
10.64898/2026.02.27.708450 bioRxiv
Transcription factors recognise and bind specific DNA sequence patterns in promoters and enhancers, thereby regulating gene expression. Variations in the DNA sequence of transcription factor binding sites (TFBSs) can alter gene regulation and may disrupt development. The transcription factor NKX2.1 is a crucial regulator of thyroid, lung, and neural development. Mutations in its coding gene NKX2-1 may cause choreoathetosis and congenital hypothyroidism with or without pulmonary dysfunction (CAHTP, OMIM #610978). Most genetically solved patients carry mutations in the coding regions of NKX2-1 that affect DNA binding, while the majority of patients with CAHTP-like symptoms do not carry mutations in the NKX2-1 coding sequence. We hypothesise that variations in the DNA sequence at promoter or enhancer sites to which the transcription factor NKX2.1 binds could cause disease as well. We employed EMSA-seq to quantify the effects of genetic variation on NKX2.1 binding strength and used these data to train neural network models to predict the influence of DNA variation on NKX2.1 binding. We validated our models using microscale thermophoresis, X-ray crystallography, and publicly available ChIP-seq data sets. The neural networks were able to detect TFBSs in ChIP-seq data and can thus be used to evaluate whole-genome sequencing data from CAHTP patients in order to prioritise potential disease-causing variants in regulatory elements.

[Graphical abstract: Figure 1]

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1 | eLife | 5422 papers in training set | Top 2% | 15.2%
2 | Scientific Reports | 3102 papers in training set | Top 11% | 7.4%
3 | Disease Models & Mechanisms | 119 papers in training set | Top 0.2% | 5.0%
4 | PLOS Computational Biology | 1633 papers in training set | Top 8% | 4.1%
5 | PLOS Genetics | 756 papers in training set | Top 4% | 3.7%
6 | iScience | 1063 papers in training set | Top 4% | 3.7%
7 | BMC Medical Genomics | 36 papers in training set | Top 0.1% | 3.7%
8 | Frontiers in Molecular Neuroscience | 43 papers in training set | Top 0.1% | 3.0%
9 | Frontiers in Cellular Neuroscience | 79 papers in training set | Top 0.2% | 2.8%
10 | Communications Biology | 886 papers in training set | Top 3% | 2.8%
50% of probability mass above
11 | Frontiers in Behavioral Neuroscience | 46 papers in training set | Top 0.3% | 2.1%
12 | eBioMedicine | 130 papers in training set | Top 0.7% | 2.1%
13 | International Journal of Molecular Sciences | 453 papers in training set | Top 7% | 1.7%
14 | Frontiers in Endocrinology | 53 papers in training set | Top 1% | 1.7%
15 | Human Genetics and Genomics Advances | 70 papers in training set | Top 0.3% | 1.7%
16 | Nucleic Acids Research | 1128 papers in training set | Top 10% | 1.7%
17 | Frontiers in Neuroscience | 223 papers in training set | Top 4% | 1.7%
18 | Nature Communications | 4913 papers in training set | Top 54% | 1.4%
19 | Human Molecular Genetics | 130 papers in training set | Top 2% | 1.4%
20 | The American Journal of Human Genetics | 206 papers in training set | Top 3% | 1.3%
21 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 3% | 0.9%
22 | PLOS Biology | 408 papers in training set | Top 17% | 0.9%
23 | PLOS ONE | 4510 papers in training set | Top 65% | 0.8%
24 | Molecular Brain | 26 papers in training set | Top 0.2% | 0.8%
25 | Genome Research | 409 papers in training set | Top 4% | 0.8%
26 | Bioinformatics | 1061 papers in training set | Top 9% | 0.8%
27 | eNeuro | 389 papers in training set | Top 9% | 0.8%
28 | Frontiers in Psychiatry | 83 papers in training set | Top 3% | 0.8%
29 | Cell Genomics | 162 papers in training set | Top 6% | 0.7%
30 | Biology | 43 papers in training set | Top 3% | 0.7%