
Learning the histone codes of gene regulation with large genomic windows and three-dimensional chromatin interactions using transformer

Lee, D.; Yang, J.; Kim, S.

2021-12-30 · genomics
bioRxiv · doi:10.1101/2021.12.30.472333

The quantitative characterization of transcriptional control by histone modifications (HMs) has been attempted by many computational studies, but most of them exploit only partial aspects of the intricate mechanisms involved in gene regulation, leaving room for improvement. We present Chromoformer, a new transformer-based deep learning architecture that achieves state-of-the-art performance in the quantitative deciphering of the histone codes of gene regulation. The core of the Chromoformer architecture lies in three variants of the attention operation, each specialized to model one level of the hierarchy of three-dimensional (3D) transcriptional regulation: (1) histone codes at core promoters, (2) pairwise interactions between a core promoter and a distal cis-regulatory element mediated by 3D chromatin interactions, and (3) the collective effect of those pairwise cis-regulations. In-depth interpretation of the trained model's behavior based on attention scores suggests that Chromoformer adaptively exploits distant dependencies between HMs associated with transcription initiation and elongation. We also demonstrate that the quantitative kinetics of transcription factories and polycomb group bodies, in which coordinated gene regulation occurs through spatial sequestration of genes with regulatory elements, can be captured by Chromoformer. Together, our study shows the power of attention-based deep learning as a versatile modeling approach for the complex epigenetic landscape of gene regulation and highlights its potential as an effective toolkit for scientific discovery in computational epigenetics.
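The pairwise promoter–pCRE interaction described in (2) is, at its core, a cross-attention: promoter bins act as queries over the histone-mark features of a distal element. The published Chromoformer implementation differs in its details; the following is a minimal NumPy sketch of that idea under assumed toy dimensions (4 promoter bins, 6 distal-element bins, 8-dimensional HM embeddings), with all variable names illustrative rather than taken from the actual codebase.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V, weights

# Hypothetical inputs: promoter bins attend over a distal cis-regulatory element.
rng = np.random.default_rng(0)
promoter = rng.normal(size=(4, 8))   # queries: core-promoter HM embeddings
pcre = rng.normal(size=(6, 8))       # keys/values: distal element HM embeddings

out, attn = scaled_dot_product_attention(promoter, pcre, pcre)
print(out.shape, attn.shape)  # (4, 8) (4, 6)
```

Each row of `attn` is a distribution over the distal element's bins, which is what makes attention scores interpretable as interaction strengths, as exploited in the model-interpretation analysis above.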

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
-----|---------|------------------------|------------|------------
1  | Computational and Structural Biotechnology Journal            | 216  | Top 0.1% | 23.6%
2  | Bioinformatics                                                | 1061 | Top 2%   | 18.3%
3  | PLOS Computational Biology                                    | 1633 | Top 5%   | 7.1%
4  | Frontiers in Genetics                                         | 197  | Top 1.0% | 5.1%
   | — 50% of probability mass above —                             |      |          |
5  | Nature Communications                                         | 4913 | Top 38%  | 3.8%
6  | iScience                                                      | 1063 | Top 6%   | 3.2%
7  | Scientific Reports                                            | 3102 | Top 40%  | 3.2%
8  | Nucleic Acids Research                                        | 1128 | Top 7%   | 2.9%
9  | Briefings in Bioinformatics                                   | 326  | Top 3%   | 2.2%
10 | Genome Research                                               | 409  | Top 2%   | 2.0%
11 | Nature Machine Intelligence                                   | 61   | Top 2%   | 1.9%
12 | Bioinformatics Advances                                       | 184  | Top 3%   | 1.8%
13 | IEEE Transactions on Computational Biology and Bioinformatics | 17   | Top 0.3% | 1.4%
14 | eLife                                                         | 5422 | Top 46%  | 1.4%
15 | Epigenetics                                                   | 43   | Top 0.7% | 0.9%
16 | Proceedings of the National Academy of Sciences               | 2130 | Top 40%  | 0.9%
17 | Communications Biology                                        | 886  | Top 17%  | 0.9%
18 | Cell Genomics                                                 | 162  | Top 6%   | 0.8%
19 | BMC Bioinformatics                                            | 383  | Top 6%   | 0.8%
20 | Genome Biology                                                | 555  | Top 7%   | 0.8%
21 | Advanced Science                                              | 249  | Top 18%  | 0.8%
22 | Patterns                                                      | 70   | Top 2%   | 0.8%
23 | Heliyon                                                       | 146  | Top 6%   | 0.8%
24 | npj Systems Biology and Applications                          | 99   | Top 2%   | 0.8%
25 | Nature Computational Science                                  | 50   | Top 2%   | 0.8%
26 | PLOS ONE                                                      | 4510 | Top 67%  | 0.8%
27 | NAR Genomics and Bioinformatics                               | 214  | Top 4%   | 0.8%
28 | Frontiers in Cell and Developmental Biology                   | 218  | Top 9%   | 0.8%
29 | Cell Reports                                                  | 1338 | Top 36%  | 0.5%
30 | Genomics                                                      | 60   | Top 3%   | 0.5%