EpiTADformer: A Transformer-Based Model for High-Resolution TAD Boundary Detection Using Epigenomic Signal Embeddings

Nguyen, M.; Tang, S.; McClay, J. L.; Harrell, J. C.; Dozmorov, M. G.

2026-01-22 bioinformatics
10.64898/2026.01.20.700691 bioRxiv

The human genome is partitioned at multiple levels of 3D organization, with topologically associating domains (TADs) among the best-known and most biologically important structures. Disruption of TAD boundaries is associated with a wide range of diseases, including cancer and neurological and developmental disorders. Numerous methods have been developed to detect TAD boundaries from chromatin contact maps obtained with Hi-C technology. However, these methods are largely limited by the resolution of Hi-C data, typically 1 kb to 100 kb. In contrast, epigenomic data, which profile functional DNA loci, are available at much higher resolution (100-200 bp for a typical ChIP-seq experiment). To improve the resolution of boundary detection, we hypothesize that the patterns of epigenomic signals in regions near TAD boundaries can serve as embeddings for those regions, defining region similarity. These embeddings, together with their positional relationships, can be modeled with deep learning to achieve more precise boundary prediction. We present EpiTADformer, a transformer-based model that takes as input the transcriptional and histone modification signals of neighboring regions centered on TAD boundaries. We demonstrate that EpiTADformer outperforms feedforward neural network, convolutional neural network (CNN), and bidirectional long short-term memory (BiLSTM) architectures. These results suggest that the positional information of epigenomic signals surrounding TAD boundaries provides a strong predictive signal, enabling the improved performance of the transformer model. Our findings highlight the potential of epigenomic signals to serve as region embeddings for refining the epigenomic language of TADs and 3D genome organization.
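
The idea of treating per-region epigenomic signals as embeddings with positional relationships can be illustrated with a minimal sketch. This is not the authors' implementation: the track names, the toy signal values, the window size, the sinusoidal positional encoding, and the use of single-head self-attention with identity projections are all assumptions chosen only to show the general shape of such an input.

```python
import math

def window_embeddings(tracks, boundary_bin, n_flank):
    """Return one signal vector per bin in a window centered on boundary_bin.

    tracks: dict mapping a (hypothetical) track name to a list of per-bin signals.
    """
    names = sorted(tracks)
    n_bins = len(tracks[names[0]])
    if boundary_bin - n_flank < 0 or boundary_bin + n_flank >= n_bins:
        raise ValueError("window extends past the available bins")
    return [
        [tracks[name][b] for name in names]
        for b in range(boundary_bin - n_flank, boundary_bin + n_flank + 1)
    ]

def positional_encoding(n_positions, dim):
    """Sinusoidal positional encoding (assumed here; the paper may use another scheme)."""
    pe = []
    for pos in range(n_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

# Toy data: three hypothetical tracks over seven fine-resolution bins.
tracks = {
    "H3K27ac": [0.1, 0.2, 0.9, 1.5, 0.8, 0.2, 0.1],
    "H3K4me3": [0.0, 0.1, 0.4, 1.1, 0.5, 0.1, 0.0],
    "RNA":     [0.3, 0.3, 0.2, 0.1, 0.6, 0.7, 0.4],
}
x = window_embeddings(tracks, boundary_bin=3, n_flank=2)  # 5 bins x 3 tracks
pe = positional_encoding(len(x), len(x[0]))
x_pos = [[s + p for s, p in zip(row, pe_row)] for row, pe_row in zip(x, pe)]
contextual = self_attention(x_pos)  # each bin now mixes in its neighbors
```

In this sketch each bin's vector is its "embedding", and attention lets every bin weigh every other bin in the window; a trained model would add learned projections, multiple heads and layers, and a classification head.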

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Probability  Percentile  Training papers  Journal
1     28.0%        Top 0.8%    1061             Bioinformatics
2     12.5%        Top 2%      1633             PLOS Computational Biology
3     7.3%         Top 1%      383              BMC Bioinformatics
4     4.2%         Top 0.5%    214              NAR Genomics and Bioinformatics
(50% of probability mass above this line)
5     3.9%         Top 5%      1128             Nucleic Acids Research
6     3.7%         Top 38%     4913             Nature Communications
7     3.6%         Top 2%      216              Computational and Structural Biotechnology Journal
8     2.6%         Top 45%     3102             Scientific Reports
9     2.4%         Top 2%      184              Bioinformatics Advances
10    2.4%         Top 3%      197              Frontiers in Genetics
11    1.8%         Top 4%      555              Genome Biology
12    1.7%         Top 2%      409              Genome Research
13    1.7%         Top 4%      326              Briefings in Bioinformatics
14    1.7%         Top 53%     4510             PLOS ONE
15    1.7%         Top 9%      886              Communications Biology
16    1.5%         Top 0.3%    17               IEEE Transactions on Computational Biology and Bioinformatics
17    1.5%         Top 12%     249              Advanced Science
18    1.5%         Top 17%     1063             iScience
19    1.3%         Top 0.2%    42               Epigenetics & Chromatin
20    0.7%         Top 12%     167              Cell Systems
21    0.7%         Top 1.0%    15               BioData Mining
22    0.7%         Top 9%      154              Genome Medicine
23    0.7%         Top 3%      70               Patterns
24    0.5%         Top 8%      171              Genomics, Proteomics & Bioinformatics
25    0.5%         Top 3%      34               IEEE Journal of Biomedical and Health Informatics
26    0.5%         Top 5%      206              The American Journal of Human Genetics
27    0.5%         Top 4%      172              GigaScience
28    0.5%         Top 1%      51               Database
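
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below uses the percentages listed above; the function name `smallest_prefix_reaching` is a hypothetical helper, not part of any tool referenced on this page.

```python
# Predicted probabilities (percent) in rank order, copied from the listing above.
probabilities = [
    28.0, 12.5, 7.3, 4.2, 3.9, 3.7, 3.6, 2.6, 2.4, 2.4,
    1.8, 1.7, 1.7, 1.7, 1.7, 1.5, 1.5, 1.5, 1.3, 0.7,
    0.7, 0.7, 0.7, 0.5, 0.5, 0.5, 0.5, 0.5,
]

def smallest_prefix_reaching(values, threshold):
    """Return (k, cumulative_sum) for the shortest ranked prefix whose sum
    reaches the threshold, or None if the threshold is never reached."""
    total = 0.0
    for k, value in enumerate(values, start=1):
        total += value
        if total >= threshold:
            return k, total
    return None

result = smallest_prefix_reaching(probabilities, 50.0)
print(result)
```

The first three journals sum to 47.8%, so the fourth is the one that carries the cumulative total past 50%. The 28 listed journals together cover roughly 90% of the mass, with the remainder presumably spread over journals not shown.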