EpiTADformer: A Transformer-Based Model for High-Resolution TAD Boundary Detection Using Epigenomic Signal Embeddings
Nguyen, M.; Tang, S.; McClay, J. L.; Harrell, J. C.; Dozmorov, M. G.
The human genome is partitioned at multiple levels of 3D organization, with topologically associating domains (TADs) being among the best-known and most biologically important structures. TAD boundary disruption is associated with a wide range of diseases, including cancer and neurological and developmental disorders. Numerous methods have been developed to detect TAD boundaries from chromatin contact maps obtained with Hi-C technology. However, these methods are largely limited by the resolution of Hi-C data, typically 1 kb to 100 kb. In contrast, functional DNA loci, collectively referred to as epigenomic data, are profiled at a much higher resolution (100-200 bp for a typical ChIP-seq experiment). To improve the resolution of boundary detection, we hypothesize that the patterns of epigenomic signals in regions close to TAD boundaries can serve as embeddings for these genomic regions, defining region similarity. These embeddings, along with their positional relationships, can be modeled using deep learning to achieve more precise boundary prediction. We present EpiTADformer, a transformer-based model that takes as input transcriptional and histone modification signals from neighboring regions centered on TAD boundaries. We demonstrate that EpiTADformer outperforms feedforward neural network, convolutional neural network (CNN), and bidirectional long short-term memory (BiLSTM) network architectures. These results suggest that the positional information of epigenomic signals surrounding TAD boundaries provides a strong predictive signal, which the transformer model is able to exploit. Our findings highlight the potential of epigenomic signals to serve as region embeddings for refining the epigenomic language of TADs and 3D genome organization.
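The general idea described above, treating per-region epigenomic signals as embeddings and combining them with positional information through attention, can be illustrated with a minimal sketch. This is not the EpiTADformer architecture, whose details are not given in the abstract. The window size, number of signal tracks, head dimension, sinusoidal positional encoding, mean pooling, and all function names below are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

def positional_encoding(n_pos, d_model):
    """Sinusoidal encoding: one d_model-length vector per bin position."""
    pe = []
    for pos in range(n_pos):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the bins in x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def boundary_score(window, params):
    """Map an (n_bins x n_tracks) signal window to a score in (0, 1)."""
    pe = positional_encoding(len(window), len(window[0]))
    # Each bin's signal vector acts as its embedding; add position information.
    x = [[s + p for s, p in zip(sig, pos)] for sig, pos in zip(window, pe)]
    attended, weights = self_attention(x, params["wq"], params["wk"], params["wv"])
    # Mean-pool over bins, then apply a logistic output unit.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logit = sum(w * h for w, h in zip(params["w_out"], pooled)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit)), weights

# Hypothetical setup: 11 bins centered on a candidate boundary, 4 signal tracks.
rng = random.Random(0)
n_bins, n_tracks, d_head = 11, 4, 3
window = [[rng.random() for _ in range(n_tracks)] for _ in range(n_bins)]
params = {
    "wq": random_matrix(n_tracks, d_head, rng),
    "wk": random_matrix(n_tracks, d_head, rng),
    "wv": random_matrix(n_tracks, d_head, rng),
    "w_out": [rng.gauss(0.0, 0.1) for _ in range(d_head)],
    "b_out": 0.0,
}
score, weights = boundary_score(window, params)
print(f"untrained boundary score: {score:.3f}")
```

Each row of `weights` shows how strongly one bin attends to every other bin in the window. In a trained model, this is where positional relationships between signals around a boundary could be captured, in contrast to a feedforward network that sees the window as an unordered feature vector.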