Deep analysis of FANTOM CAGE data reveals hierarchical patterns of TSS co-deployment hubs and their disruption in cancers

Meduri, R.; Satish, A. L.; Singh, U.

2026-05-18 genomics
10.64898/2026.05.15.725323 bioRxiv

Selective deployment of multiple transcription start sites (TSSs) is a major regulatory feature of human transcriptomes. FANTOM CAGE data exhibit a near-universal TSS deployment parsimony that is disrupted in cancers. We have recently shown that TSS deployment is sensitive to gene function, futile upstream transcription, and cellular biosynthetic states. Patterns in FANTOM CAGE data can reveal mechanisms underlying TSS co-deployments. We propose and test the possibility that some TSSs act like epromoters, serving as co-varying hubs of transcriptional activity for multiple other promoters. Using deep analysis of CAGE data implemented through neural networks, we show that non-cancers implement transcription co-deployments through cores of epromoter-like TSSs that are generally proximal to their start codons. These TSSs show enhancer-like transcription factor binding site (TFBS) profiles. A comparison with cancer CAGE data shows that the concentrated epromoter core is disrupted in cancers, with multiple distal TSSs replacing the proximal TSS cores. We provide evidence that the core TSSs are rich in YY1 and CTCF binding sites and are associated with genes coding for transcription factors. Our findings show that covariance of TSS deployment is sensitive to transcriptional resource cost and that a parsimonious design of TSS co-deployments depends on proximal TSSs in non-cancers, a mechanism grossly disrupted in cancers.

Highlights

- Heterogeneous FANTOM CAGE data contain universal patterns of TSS co-deployments.
- TSS co-deployments exhibit a parsimonious "core-covariant" scheme that is disrupted in cancers.
- Core TSSs are enriched in transcription factor binding sites and gene functions that justify biological features of the samples.
- The deep learning (DL) pipeline we present identifies the core-covariant TSS sets in an unbiased manner.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology (1633 papers in training set; Top 0.3%): 28.0%
2. Frontiers in Genetics (197 papers in training set; Top 0.2%): 10.6%
3. Computational and Structural Biotechnology Journal (216 papers in training set; Top 0.2%): 9.3%
4. Bioinformatics (1061 papers in training set; Top 4%): 4.9%

50% of probability mass above

5. Scientific Reports (3102 papers in training set; Top 27%): 4.4%
6. BMC Bioinformatics (383 papers in training set; Top 3%): 3.1%
7. Nucleic Acids Research (1128 papers in training set; Top 7%): 2.8%
8. Genome Research (409 papers in training set; Top 1%): 2.8%
9. NAR Genomics and Bioinformatics (214 papers in training set; Top 1%): 2.1%
10. Heliyon (146 papers in training set; Top 1%): 1.8%
11. PLOS ONE (4510 papers in training set; Top 53%): 1.7%
12. iScience (1063 papers in training set; Top 14%): 1.7%
13. PLOS Genetics (756 papers in training set; Top 8%): 1.7%
14. Genomics (60 papers in training set; Top 1.0%): 1.7%
15. Genome Biology (555 papers in training set; Top 4%): 1.7%
16. Epigenetics & Chromatin (42 papers in training set; Top 0.2%): 1.5%
17. F1000Research (79 papers in training set; Top 3%): 1.2%
18. Physical Review E (95 papers in training set; Top 1%): 0.8%
19. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 43%): 0.8%
20. npj Systems Biology and Applications (99 papers in training set; Top 2%): 0.8%
21. G3 Genes|Genomes|Genetics (351 papers in training set; Top 2%): 0.8%
22. GigaScience (172 papers in training set; Top 3%): 0.8%
23. Nature Communications (4913 papers in training set; Top 63%): 0.8%
24. BMC Genomics (328 papers in training set; Top 5%): 0.8%
25. Genes (126 papers in training set; Top 1%): 0.7%
26. Molecular Genetics and Genomics (11 papers in training set; Top 0.4%): 0.7%
27. International Journal of Molecular Sciences (453 papers in training set; Top 17%): 0.7%