
A map of Non-translated RNA (nt-RNA) junctions in cancer genomes: a database resource of unproductive splicing

Huang, D.; Kwan, T.-K.; Ma, S.-L.; Tang, N. L.-s.

2025-06-19 cancer biology
10.1101/2025.06.15.659434 bioRxiv

Background: Non-translated transcripts (nt-RNAs), which carry frame-shifts or premature termination codons resulting from alternative splicing events (ASE), have recently been found to be unexpectedly abundant in the transcriptomes of cancer tissue. However, their full genomic spectrum has not yet been elucidated. This study comprehensively characterised the expression of the signature junctions (termed "toxic junctions" here) of both known and novel nt-RNAs across multiple cancer types and investigated their potential as biomarkers.

Methods: RNA-seq data for ~6,000 samples, including tumor and normal samples for 13 cancer types, were retrieved from The Cancer Genome Atlas (TCGA) database, together with data from the Cancer Cell Line Encyclopedia (CCLE) project. Because quantifying the entire transcript isoform of an nt-RNA is difficult, we pioneered an algorithm that focuses exclusively on the expression of junctional reads, which also circumvents the limitation of the non-directional RNA-seq of TCGA data. We showed that the majority of nt-RNAs are associated with at least one toxic junction. We built a comprehensive catalogue of known nt-RNA toxic junctions from genome databases, and identified novel toxic junctions by applying a new junction-focused algorithm to the higher-quality discovery subsets of the TCGA data. Splicing in Ratio (SiR) was used to quantify ASE leading to nt-RNA, enabling: (1) differential expression analysis between cancer and normal tissue and across cancer types; (2) identification of different profiles of nt-RNA abundance and of the various factors that may cause differences in nt-RNA abundance and SiR results; (3) identification of specific nt-RNAs and toxic junctions that were expressed in various cancer (and/or normal tissue) types; and (4) assessment of nt-RNAs and their toxic junction expression as biomarkers or prognostic indicators.

Results: We profiled the expressed toxic junctions of known nt-RNA transcripts and discovered ~22,000 novel toxic junctions among the ~250,000 novel junctions found in the transcriptome data. The expression of nt-RNA was as high as 10% of all transcripts of the corresponding gene in cancer transcriptomes. Interestingly, some signature toxic junctions of nt-RNA were expressed at even higher levels, e.g. 50% or more, which is reminiscent of a heterozygous mutation. We identified distinct patterns between cancer and normal samples, including examples of nt-RNAs expressing toxic junctions exclusively in normal or tumor samples. Clinically relevant examples included ANXA6 in breast cancer, where the nt-RNA isoform showed significantly higher expression in tumors (p=1.8e-15). In kidney renal clear cell carcinoma (KIRC), a significant isoform switch of ESYT2 based on the RNA-seq data was confirmed. Kaplan-Meier survival curves showed that samples with a higher expression ratio of ESYT2-L were associated with better survival (p=2.0e-06). Unsupervised clustering showed that the SiR results of 150 toxic signatures defined 4 subgroups of patients with different prognoses. In principal component analysis (PCA), PC1 and PC2 could be used as independent prognostic biomarkers. The nt-RNAs accounting for these PCs included the splicing factors SRSF3 and CLK1, where CLK1 phosphorylates SRSF3 to promote exon 4 inclusion in both genes.

Conclusions: In summary, the expression profiles of all known and novel toxic junctions were explored using pan-cancer RNA-seq data. A dual 10% rule emerged from this study: ~10% of novel junctions were toxic junctions associated with nt-RNA, and up to 10% of RNA transcripts inside a cell were also nt-RNA. The SiR metric enables accurate quantification of unproductive splicing and identification of cancer biomarkers. Our findings reveal that unproductive splicing represents functionally important post-transcriptional regulation in cancer. These expression profiles allow researchers to study the expression of nt-RNA signature junctions or novel signature junctions in or near their genes of interest, which could provide a new direction for their research. The SRSF3-CLK1 regulatory mechanism provides insights into splicing dysregulation. Our comprehensive toxic junction catalogue serves as a valuable resource, suggesting that targeting unproductive splicing pathways may offer novel therapeutic strategies for cancer treatment.

Data availability: The catalogue is available on GitHub and the UCSC Genome Browser.
https://github.com/danhuang0909/nt_database for the GitHub overview
https://genome.ucsc.edu/s/dandan_0909/hg38_all_new_nr for genome browsing of all novel (unannotated) toxic junctions
https://genome.ucsc.edu/s/dandan_0909/hg38_5_26 for toxic junctions in known (annotated) nt-RNA
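
The abstract does not give the formula for SiR, so the following is only a minimal sketch of a junction-read ratio in the same spirit, assuming it behaves like a percent-spliced-in style fraction: reads spanning the toxic junction divided by all junction reads at that splicing event. The `JunctionCounts` container, the `junction_ratio` function and the `min_reads` coverage cutoff are hypothetical names and choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JunctionCounts:
    """Read counts at one alternative splicing event (hypothetical container)."""
    toxic_reads: int       # reads spanning the nt-RNA signature ("toxic") junction
    productive_reads: int  # reads spanning the competing productive junction(s)


def junction_ratio(counts: JunctionCounts, min_reads: int = 10) -> Optional[float]:
    """Fraction of junction reads that support the toxic junction.

    ASSUMPTION: this is a generic inclusion-style ratio, not the published
    SiR definition. Events with too few reads return None instead of a
    noisy estimate; the cutoff of 10 is arbitrary.
    """
    total = counts.toxic_reads + counts.productive_reads
    if total < min_reads:
        return None
    return counts.toxic_reads / total


# Made-up counts: 50 toxic and 50 productive junction reads give a ratio of 0.5,
# the kind of level the abstract compares to a heterozygous mutation.
example = junction_ratio(JunctionCounts(toxic_reads=50, productive_reads=50))
print(example)
```

Working from junction-spanning reads alone avoids reconstructing full isoforms, which is the motivation the Methods give for a junction-focused approach.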

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

1. PeerJ (261 papers in training set; Top 0.7%; 6.3%)
2. Genome Medicine (154 papers in training set; Top 1%; 6.3%)
3. npj Genomic Medicine (33 papers in training set; Top 0.1%; 6.3%)
4. Scientific Reports (3102 papers in training set; Top 20%; 6.2%)
5. PLOS ONE (4510 papers in training set; Top 32%; 4.8%)
6. Nucleic Acids Research (1128 papers in training set; Top 5%; 3.9%)
7. NAR Cancer (36 papers in training set; Top 0.1%; 3.9%)
8. BMC Bioinformatics (383 papers in training set; Top 2%; 3.9%)
9. BMC Cancer (52 papers in training set; Top 0.7%; 3.5%)
10. Journal of Translational Medicine (46 papers in training set; Top 0.2%; 3.5%)
11. Computational and Structural Biotechnology Journal (216 papers in training set; Top 3%; 2.6%)
50% of probability mass above
12. RNA Biology (70 papers in training set; Top 0.2%; 2.3%)
13. PLOS Computational Biology (1633 papers in training set; Top 14%; 2.1%)
14. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 3%; 2.0%)
15. Frontiers in Genetics (197 papers in training set; Top 5%; 1.6%)
16. Computational Biology and Chemistry (23 papers in training set; Top 0.2%; 1.5%)
17. Frontiers in Oncology (95 papers in training set; Top 2%; 1.5%)
18. iScience (1063 papers in training set; Top 18%; 1.5%)
19. Cancers (200 papers in training set; Top 3%; 1.5%)
20. Bioinformatics Advances (184 papers in training set; Top 3%; 1.3%)
21. Bioinformatics (1061 papers in training set; Top 8%; 1.3%)
22. International Journal of Cancer (42 papers in training set; Top 0.8%; 1.3%)
23. Mobile DNA (27 papers in training set; Top 0.1%; 1.3%)
24. International Journal of Molecular Sciences (453 papers in training set; Top 11%; 1.2%)
25. Nature Communications (4913 papers in training set; Top 57%; 1.2%)
26. Communications Biology (886 papers in training set; Top 16%; 1.1%)
27. Molecular Cancer (14 papers in training set; Top 0.7%; 0.9%)
28. F1000Research (79 papers in training set; Top 3%; 0.9%)
29. Computers in Biology and Medicine (120 papers in training set; Top 4%; 0.9%)
30. BMC Genomics (328 papers in training set; Top 5%; 0.8%)
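
The statement that the top 11 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below assumes the final percentage in each entry is that journal's predicted probability and uses the rounded values as displayed; `journals_needed` is a hypothetical helper name.

```python
from itertools import accumulate

# Predicted probabilities (%) of the 11 top-ranked journals, copied from the
# list above in rank order (values are rounded as displayed on the page).
top_probabilities = [6.3, 6.3, 6.3, 6.2, 4.8, 3.9, 3.9, 3.9, 3.5, 3.5, 2.6]


def journals_needed(probabilities, threshold=50.0):
    """Return the smallest rank whose cumulative probability reaches the threshold.

    Returns None if the supplied probabilities never reach it.
    """
    for rank, cumulative in enumerate(accumulate(probabilities), start=1):
        if cumulative >= threshold:
            return rank
    return None


print(journals_needed(top_probabilities))
```

With these values the running total is about 48.6% after ten journals and about 51.2% after eleven, which matches the position of the "50% of probability mass above" marker.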