Effect of a LINE1 DNA sequence on expression of long human genes

Brown, J. C.

2024-07-09 genomics
10.1101/2023.11.21.568109 bioRxiv
The study described here was carried out to pursue the idea that a truncated, transposition-incompetent fragment of a LINE1 retrotransposon may affect the expression of a human gene when it is located inside the gene sequence. NCBI BLAST was used to probe the human genome to identify protein-coding genes containing an abundant ~1500 bp LINE1 fragment (called t1519) in the gene body. The length and expression level of such genes in human chromosomes 16-18 were then compared with the same properties of genes that lack t1519. The results showed a striking effect of t1519 on long genes, those with lengths greater than ~140 kb. Nearly all were found to have one or more t1519 sequences in the gene body. In contrast, genes in the common length range (less than 140 kb) could either have t1519 or not. A correlation was also observed with the level of gene expression. While expression of long, t1519-containing genes was limited to ~50 TPM, expression of genes in the common length range could be much higher, in the range of 500-600 TPM, regardless of whether or not they have t1519 elements. Contrasting results were obtained when the analysis was performed with lncRNAs rather than with protein-coding genes. Among lncRNA genes a chromosome-specific effect was observed. Restricted expression correlating with the presence of t1519 was observed in both long and common length genes of chromosomes 16 and 17, but not in chromosome 18. The results are interpreted to support a strong suppressive effect of t1519 on expression of long protein-coding genes and on both long and common length lncRNA genes of chromosomes 16 and 17. It is suggested that the suppressive effect on expression, particularly among long genes, meets a need for the cell to limit the overall level of transcription it can support.
Author summary

Although LINE1 DNA sequence elements are well known for their ability to replicate and move autonomously within the human genome, these features are observed in only a small proportion (0.02%) of the total human LINE1 population. Nearly all of the total ~500,000 LINE1 elements are fragments of full-length LINE1 and are inactive for autonomous replication or movement. Truncated, inactive LINE1 sequences are found throughout the human genome including within the body of protein-coding genes, and this intragenic population is the subject of the study described here. The goal was to extend what is known about the properties of intragenic LINE1 sequences. The study was carried out with t1519, a truncated LINE1 sequence composed of the 3' terminal ~1500 bp of the ~6000 bp full-length LINE1 element, and with the sequences of three human chromosomes (16, 17 and 18) that are rich in t1519 sequences. NCBI BLAST was used to identify t1519-containing genes in each chromosome, and the length and expression level of those genes were compared with those of control genes lacking t1519. A striking result was observed in the case of long protein-coding genes, those longer than 140 kb. Nearly all had one or more t1519 sequences in the gene body, all in introns. An effect on the level of gene expression was also observed. Low expression (<50 TPM) was found in all long, t1519-positive genes, while much higher levels (500-600 TPM) were found with genes in the common length range (<140 kb) regardless of the presence of t1519. Similar results were obtained when lncRNA genes were studied instead of protein-coding ones. The results are interpreted to support a strong suppressive effect of t1519 on expression of long protein-coding genes and also on certain lncRNA genes. It is suggested that the suppressive effect is due to a need for the cell to limit the overall level of transcription it can support.
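
The comparison described above (gene length class versus t1519 presence versus expression) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's pipeline: the `Gene` record, the function names and the toy values are hypothetical, and only the 140 kb threshold comes from the text. The BLAST step that would determine `has_t1519` is not shown.

```python
from dataclasses import dataclass

LONG_GENE_KB = 140.0  # length threshold quoted in the abstract


@dataclass
class Gene:
    name: str        # hypothetical identifier
    length_kb: float
    tpm: float       # expression level in transcripts per million
    has_t1519: bool  # assumed to come from a separate BLAST search


def group_key(gene):
    """Classify a gene as ("long" | "common", has_t1519)."""
    length_class = "long" if gene.length_kb > LONG_GENE_KB else "common"
    return (length_class, gene.has_t1519)


def max_tpm_by_group(genes):
    """Return the highest TPM seen in each (length class, t1519) group."""
    result = {}
    for gene in genes:
        key = group_key(gene)
        result[key] = max(result.get(key, 0.0), gene.tpm)
    return result


# Placeholder records invented purely for illustration; not data from the study.
toy_genes = [
    Gene("geneA", 200.0, 12.0, True),
    Gene("geneB", 35.0, 480.0, False),
    Gene("geneC", 60.0, 150.0, True),
    Gene("geneD", 310.0, 41.0, True),
]
summary = max_tpm_by_group(toy_genes)
```

With real annotations, the ceiling for the `("long", True)` group could then be set against the ceilings for the two common-length groups, which is the contrast the abstract reports.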

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Frontiers in Genetics; 197 papers in training set; Top 0.1%; 14.5%
2. Genes; 126 papers in training set; Top 0.1%; 12.2%
3. Gene; 41 papers in training set; Top 0.1%; 10.0%
4. International Journal of Molecular Sciences; 453 papers in training set; Top 0.5%; 6.7%
5. PLOS ONE; 4510 papers in training set; Top 28%; 6.3%
6. F1000Research; 79 papers in training set; Top 0.2%; 6.3%
(50% of probability mass above)
7. Scientific Reports; 3102 papers in training set; Top 25%; 4.8%
8. Gene Reports; 13 papers in training set; Top 0.1%; 3.9%
9. Molecular Biology Reports; 19 papers in training set; Top 0.1%; 2.3%
10. Molecular Genetics and Genomics; 11 papers in training set; Top 0.1%; 1.9%
11. PeerJ; 261 papers in training set; Top 6%; 1.8%
12. BMC Genomics; 328 papers in training set; Top 3%; 1.6%
13. Oncotarget; 15 papers in training set; Top 0.2%; 1.3%
14. Heliyon; 146 papers in training set; Top 3%; 1.3%
15. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 6%; 1.2%
16. Biosystems; 18 papers in training set; Top 0.3%; 1.2%
17. Biology; 43 papers in training set; Top 2%; 0.9%
18. Frontiers in Cell and Developmental Biology; 218 papers in training set; Top 7%; 0.9%
19. Epigenetics; 43 papers in training set; Top 0.8%; 0.9%
20. Journal of Molecular Evolution; 21 papers in training set; Top 0.3%; 0.8%
21. BMC Genomic Data; 12 papers in training set; Top 0.2%; 0.7%
22. BMC Cancer; 52 papers in training set; Top 3%; 0.7%
23. Genomics; 60 papers in training set; Top 3%; 0.7%
24. BMC Bioinformatics; 383 papers in training set; Top 8%; 0.6%
25. Journal of Personalized Medicine; 28 papers in training set; Top 2%; 0.6%
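
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below assumes that the cutoff is the first rank at which the cumulative total reaches at least 50%. The function name is hypothetical, and the values are the percentages listed for ranks 1 to 7.

```python
def journals_to_reach(probabilities, target=50.0):
    """Return (count, cumulative total) at the first rank where the
    running sum of percentages reaches the target, or None if it never does."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return count, total
    return None


# Percentages listed for ranks 1-7 in the journal list.
top_probs = [14.5, 12.2, 10.0, 6.7, 6.3, 6.3, 4.8]
result = journals_to_reach(top_probs)
```

Under that assumption the first five journals sum to about 49.7%, just short of the target, and the sixth brings the total to about 56.0%, which agrees with the position of the 50% marker in the list.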