
Long-read sequencing reveals transposable element-derived chimeric transcripts at zygotic genome activation in mammalian embryos

Kawakami, S.; Kitao, K.; Ikeda, S.; Honda, S.

2026-05-28 developmental biology
10.64898/2026.05.25.727629 bioRxiv

Background: Transposable elements (TEs) are mobile genomic sequences that constitute one-third to one-half of the mammalian genome. Recently, TEs have been recognized for their important roles as cis-regulatory elements. TEs are broadly activated during zygotic genome activation (ZGA) in mammalian embryos, where they function as alternative promoters of host genes and drive the transcription of chimeric transcripts. However, the construction of comprehensive chimeric transcript databases based on short-read sequencing remains limited due to the repetitive and abundant nature of TEs in the genome. Here, we used long-read RNA sequencing to construct a comprehensive dataset of chimeric transcripts expressed in mouse and bovine embryos at ZGA.

Results: We identified 11,996 and 4,755 chimeric transcript variants derived from 2,695 and 1,200 host genes in mouse and bovine, respectively, exceeding the numbers reported in previous short-read-based studies. Among them, 114 orthologous pairs produced chimeric transcripts in both species. Gene Ontology analysis revealed significant enrichment of terms related to transcriptional regulation and protein modification in mouse, whereas no terms were significantly enriched in bovine. Assessment of the protein-coding potential of the TE-driven transcripts using predicted open reading frames (ORFs) revealed that, compared with all transcripts, the proportion of "Protein-coding" transcripts was lower and that of "LncRNA" (long non-coding RNA) transcripts was higher in both species. Among the ORFs classified as "Protein-coding", comparison with canonical ORFs revealed a tendency for the N terminus to be truncated while the C terminus remained intact in both species. TE-derived promoters used in mouse were enriched for mouse-specific TEs, whereas those in bovine were enriched for older TEs conserved among eutherians. In addition, long-read sequencing detected a greater number and proportion of TEs used as promoters in mouse and bovine than short-read sequencing. Although motif analysis identified KLF5 and OTX2 binding sites upstream of TE-derived promoters in both species, the specific TEs containing these motifs differed between the two species.

Conclusions: This study presents the first long-read sequencing analysis of chimeric transcripts in mammalian embryos in two species. Our approach revealed the functional similarities of chimeric transcripts between species, as well as species-specific differences in their TE compositions.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Mobile DNA; 27 papers in training set; Top 0.1%; 25.7%
2. Philosophical Transactions of the Royal Society B; 51 papers in training set; Top 0.1%; 17.4%
3. Scientific Reports; 3102 papers in training set; Top 19%; 6.3%
4. PLOS ONE; 4510 papers in training set; Top 31%; 4.8%

50% of probability mass above

5. Gene; 41 papers in training set; Top 0.3%; 3.6%
6. BMC Genomics; 328 papers in training set; Top 1%; 2.6%
7. Genes; 126 papers in training set; Top 0.5%; 2.6%
8. Frontiers in Cell and Developmental Biology; 218 papers in training set; Top 3%; 2.3%
9. Genome Biology; 555 papers in training set; Top 4%; 2.1%
10. Frontiers in Genetics; 197 papers in training set; Top 4%; 2.1%
11. BMC Biology; 248 papers in training set; Top 1%; 1.7%
12. Developmental Dynamics; 50 papers in training set; Top 0.4%; 1.7%
13. Biology Open; 130 papers in training set; Top 1%; 1.7%
14. BMC Bioinformatics; 383 papers in training set; Top 5%; 1.7%
15. Plant Physiology; 217 papers in training set; Top 2%; 1.3%
16. Nucleic Acids Research; 1128 papers in training set; Top 13%; 1.3%
17. Development, Growth & Differentiation; 12 papers in training set; Top 0.1%; 1.3%
18. Open Biology; 95 papers in training set; Top 1%; 1.2%
19. Genome Biology and Evolution; 280 papers in training set; Top 1%; 0.9%
20. BMC Research Notes; 29 papers in training set; Top 0.4%; 0.9%
21. Genomics; 60 papers in training set; Top 2%; 0.9%
22. PLOS Genetics; 756 papers in training set; Top 14%; 0.8%
23. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 3%; 0.7%
24. The CRISPR Journal; 33 papers in training set; Top 0.3%; 0.7%
25. Journal of Genetics and Genomics; 36 papers in training set; Top 2%; 0.7%
26. International Journal of Molecular Sciences; 453 papers in training set; Top 16%; 0.7%
27. PeerJ; 261 papers in training set; Top 17%; 0.6%