Genes — Latest Matching Preprints

1

Origin of a novel CYP20A1 transcript isoform through multiple Alu exaptations creates a potential miRNA sponge

Bhattacharya, A.; Jha, V.; Singhal, K.; Fatima, M.; Singh, D.; Chaturvedi, G.; Dholakia, D.; Kutum, R.; Pandey, R.; Bakken, T. E.; Seth, P.; Pillai, B.; Mukerji, M.

2020-02-22 evolutionary biology 10.1101/618645 medRxiv

Top 0.1%

37.1%

Background: Primate-specific Alus contribute to transcriptional novelties in conserved gene regulatory networks. Alu RNAs are present at elevated levels under stress conditions, which consequently leads to transcript isoform-specific functional roles that modulate the physiological outcome. One possible mechanism is Alu-nucleated mRNA-miRNA interplay. Results: Using a combination of bioinformatics and experiments, we report a transcript isoform of an orphan gene, CYP20A1 (CYP20A1_Alu-LT), formed through exaptation of 23 Alus in its 9 kb 3′UTR. CYP20A1_Alu-LT, confirmed by 3′RACE, is an outlier in length and is expressed in multiple cell lines. We demonstrate its presence in single-nucleus RNA-seq of ~16,000 human cortical neurons (including rosehip neurons). Its expression is restricted to the higher primates. Most strikingly, miRanda predicts ~4,700 miRNA recognition elements (MREs; with threshold < -25 kcal/mol) for ~1,000 miRNAs, the majority of which originated within the 3′UTR Alus after exaptation. We hypothesized that differential expression of this transcript could modulate mRNA-miRNA networks and tested this in primary human neurons, where CYP20A1_Alu-LT is downregulated during the heat shock response and upregulated upon HIV1-Tat treatment. CYP20A1_Alu-LT could possibly function as a miRNA sponge, as it exhibits features of a sponge RNA such as cytosolic localization and ≥10 MREs for 140 miRNAs. Small RNA-seq revealed expression of nine miRNAs that can potentially be sponged by CYP20A1_Alu-LT in neurons. Additionally, CYP20A1_Alu-LT expression was positively correlated (low in heat shock and high in Tat) with 380 differentially expressed genes that contain cognate MREs for these nine miRNAs. This set is enriched in genes involved in neuronal development and hemostasis pathways. Conclusion: We demonstrate a potential role for CYP20A1_Alu-LT as a miRNA sponge through the preferential presence of MREs within Alus in a transcript isoform-specific manner. This highlights a novel component of Alu-miRNA-mediated transcriptional modulation leading to physiological homeostasis.

2

Primate deep conserved noncoding sequences and non-coding RNA: their possible relatedness to brain and Central Nervous System

Hettiarachchi, N.

2021-08-17 evolutionary biology 10.1101/2021.08.17.456625 medRxiv

Top 0.1%

32.1%

Background: Conserved noncoding sequences (CNSs) are extensively studied for their regulatory properties and functional importance to organisms. Many features of these sequences, such as location, proximity to the likely target gene, lineage specificity, functionality of likely target genes, and nucleotide composition, have been investigated and have provided meaningful insight into the underlying evolutionary importance of these elements. How to assign function to noncoding regions of eukaryote genomes is another area of active study. On the one hand, there are evolutionary analyses, including signatures of selection or conservation that can indicate the presence of constraint, suggesting that sequences evolving non-neutrally are candidates for functionality. On the other hand, there is evidence based on experimental profiling of transcription, methylation, histone modifications and chromatin state. While these types of data are very important and are associated with function in most cases, this is not always the case. Evolutionary conservation, though highly conservative because it mostly considers elements identifiable in more than one species, is still used as the initial guideline in investigating function via experiments. An understanding of the experimental profiles of conserved noncoding regions, which may show patterns often associated with these potentially functional elements, could make it easier to infer the functionality of conserved noncoding regions. Results: In an effort to integrate experimental profile data, we investigated evidence of expression of CNSs. For CNSs from ten primates, we assessed transcription, histone modifications, the level of evolutionary constraint or accelerated evolution, possible target genes, tissue expression profiles of likely target genes (as some CNSs may be enhancers, and some may be ncRNAs that interact directly with mRNA) and clustering patterns of CNSs. In total we found 153,475 CNSs conserved across all ten primates. Of these, 59,870 overlapped noncoding regions of ncRNA genes. H3K4Me1 marks (often associated with active enhancers) were highly correlated with CNSs, whereas H4K20Me1 (linked to, e.g., DNA damage repair) was highly correlated with conserved ncRNA regions (ncRNA-gene-CEs). Both CNSs and conserved ncRNAs showed evidence of being under purifying selection. The CNSs in our dataset overall exhibited lower allele frequencies, consistent with higher levels of evolutionary constraint. We also found that CNSs and ncRNA-gene-CEs form mutually exclusive groups. The analyses also suggest that both types of conserved elements have undergone waves of accelerated evolution, which we speculate may indicate changes in regulatory requirements following divergence events. Finally, we find that likely target genes for hominoidae-, primate- and mammalian-specific CNSs and ncRNA-gene-CEs are predominantly associated with brain-related function in humans. Conclusion: The deeply conserved primate CNSs and ncRNA-gene-CEs signify functional importance, suggesting ongoing recruitment of these elements into brain-related functions, consistent with King and Wilson's hypothesis that regulatory changes may account for rapid changes in phenotype among primates.

3

G-quadruplex-forming sequences as potential drivers of genetic diversity in primate protein coding genes

Jara-Espejo, M.; Line, S. R.

2020-08-28 evolutionary biology 10.1101/2020.08.28.272971 medRxiv

Top 0.1%

21.7%

While non-coding G-quadruplexes (G4s) act as conserved regulatory elements when located in gene promoters and splice sites, the evolutionary conservation of G4s in protein-coding regions has been little explored. To address the evolutionary dynamics acting on coding G4s, we mapped and characterized potential G4-forming sequences across orthologous genes of twenty-four primates. We found that potentially more stable G4 motifs exist in coding regions following a species-specific trend. Moreover, these motifs represented the least conserved sites across primates at both the DNA and amino acid levels and are characterized by an indel-rich mutational pattern. This trend was not observed for less stable G4 motifs. A deeper analysis revealed that [G≥3N1]4 motifs, representing potentially the most stable G4s, were associated with the lowest conservation and highest indel frequencies. This mutational pattern was more evident when G4-associated amino acid regions were analyzed. We discuss the possibility of an overall conservation of G4s of low to moderate stability, while more stable G4s may be preserved or arise in a species-specific manner, which may explain their low conservation. Since structure-prone motifs, including G4s, have the potential to induce genomic instability, this evolutionary trend may help avoid broad deleterious effects of stable G4s on protein function while promoting genetic diversity across closely related species.

4

Dynamic evolution of the major transcription factor DNA binding domain, and protein-protein interaction families during the evolution of the avian lineage

Graham, A. M.; Presnell, J.

2019-07-25 evolutionary biology 10.1101/193896 medRxiv

Top 0.1%

18.4%

Transcription factors are characterized by their domain architecture, including DNA binding and protein-protein interaction domain combinations, which regulate their binding specificity as well as their ability to effect a change in the expression of their downstream targets. Transcription factors are central to organismal development and are therefore potentially instrumental in producing phenotypic diversity. Transcription factor abundance was estimated via 49 major DNA binding domain families, as well as 34 protein-protein interaction domain families, in 48 bird genomes, which were then compared with 6 available reptile genomes, in an effort to assess the degree to which these domains are potentially connected to increased phenotypic diversity in the avian lineage. We hypothesized that there would be increased abundance in multiple transcription factor domain families, as well as in domains associated with protein-protein interactions, that would correlate with the increased phenotypic diversity found in birds; instead, these data show a general loss/contraction of major domain families, with the largest losses in domain families associated with multiple developmental (feather, body-plan, immune) and metabolic processes. Ultimately, the results of these analyses represent a general characterization of domain family composition in birds; the specific domain composition of TF families should therefore be probed further, especially for those with the largest reductions seen in this study.

5

Uneven growth of SARS-CoV-2 clones evidenced by more than 500,000 whole-genome sequences

Zeng, H.-L.; Liu, Y.; Thorell, K.; Norden, R.; Aurell, E.

2021-04-06 evolutionary biology 10.1101/2021.04.06.437914 medRxiv

Top 0.1%

16.9%

We have computed the frequencies of the alleles of the "UK variant" (B.1.1.7) and "South Africa variant" (B.1.351) of SARS-CoV-2 from the large GISAID repository. We find that the frequencies of the mutations in the UK variant overall rose towards the end of 2020, as widely reported in the literature and in the general press. However, we also find that these frequencies vary in different patterns rather than in concert. For the South Africa variant we find a more complex scenario, with the frequencies of some mutations rising and some remaining close to zero. Our results indicate that what is generally reported as one variant is in fact a collection of variants with different genetic characteristics.

6

A Novel Hyper-Variable Variable Number Tandem Repeat in the Dopamine Transporter Gene (SLC6A3)

Apsley, A. T.; Domico, E. R.; Verbiest, M. A.; Brogan, C. A.; Buck, E. R.; Burich, A. J.; Cardone, K. M.; Stone, W. J.; Anisimova, M.; Vandenbergh, D. J.

2023-01-24 genomics 10.1101/2022.08.03.502653 medRxiv

Top 0.1%

15.3%

The dopamine transporter gene, SLC6A3, has received substantial attention in genetic association studies of various phenotypes. Although some variable number tandem repeats (VNTRs) present in SLC6A3 have been tested in genetic association studies, results have not been consistent. VNTRs in SLC6A3 that have not been examined genetically were characterized. Tandem Repeat Annotation Library (TRAL) was used to characterize the VNTRs of 64 unrelated long-read haplotype-phased SLC6A3 sequences. Sequence similarity of each repeat unit of the five VNTRs is reported, along with the correlations of SNP-SNP, SNP-VNTR and VNTR-VNTR alleles across the gene. One of these VNTRs is a novel hyper-VNTR (hyVNTR) in intron 8 of SLC6A3, which contains a range of 3.4-133.4 repeat copies and has a consensus sequence length of 38 bp, with 82% G+C content. The 38-base repeat was predicted in silico to form G-quadruplexes, and this was confirmed by circular dichroism spectroscopy. Additionally, this hyVNTR contains multiple putative binding sites for PRDM9, which, in combination with low levels of linkage disequilibrium around the hyVNTR, suggests it might be a recombination hotspot. Summary Blurb: This VNTR has a heterozygosity value of 0.93, forms G-tetrads, and is in low linkage disequilibrium with surrounding sequence, making it a new site for genetic analysis.
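
As a reading aid for the heterozygosity figure quoted above: the abstract does not say how the value was calculated, so the following is only a minimal sketch assuming the standard expected heterozygosity (gene diversity), H = 1 - sum(p_i^2), computed from allele frequencies without any small-sample correction. The function name and the toy inputs are hypothetical and are not taken from the preprint.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Expected heterozygosity H = 1 - sum(p_i ** 2).

    `alleles` is one observation per haplotype, for example the repeat
    copy number seen on each phased sequence. This is an assumed,
    textbook definition, not necessarily the one used by the authors.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one allele observation is required")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    # Toy data only: four haplotypes carrying two alleles at equal frequency.
    print(expected_heterozygosity([10, 10, 12, 12]))
```

Under this definition, a locus where most haplotypes carry a different repeat count approaches 1, which is the kind of situation a value such as 0.93 would describe.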

7

What can Y-DNA analysis reveal about the surname Hay and the Hay noble lineage of Scotland?

Stead, P.; Haddrill, P. R.; Macdonald, A. F.

2025-07-15 genetics 10.1101/2025.07.09.664039 medRxiv

Top 0.1%

14.6%

The family name Hay (plus associated spelling variants) is a prominent surname of Anglo-Norman origin that has been well documented as a Scottish noble lineage since the 12th century CE. The family's historical significance, linked to the rise of the Anglo-Norman era (1093-1286 CE) in Scotland, and the historical complexities of surname adoption after the Norman conquest of England justify the need for a comprehensive understanding of the genetic history of the Hay noble lineage. This study focuses on examining the patterns of paternal inheritance in lineages with the Hay surname. We conducted a comprehensive analysis of Y-chromosome data that is publicly available on the Family Tree DNA (FTDNA) platform and in specific FTDNA surname projects, and looked in more detail at three well-documented male-line descendants of William II de la HAYA, 1st of Erroll, (d. 1201) that have been verified to a high degree of confidence. Our results reveal that all descendants of William II de la HAYA, 1st of Erroll, (d. 1201) derive from the multigenerational Y-SNPs R1a-YP6500 (plus equivalent SNPs BY33394 / FT2017) and R1a-FTT161. Furthermore, subclades of R1a-FTT161 have been identified that confirm direct male-line descent from two of William II de la HAYA's sons. Subclade R1a-BY199342 (plus equivalents) confirms direct male-line descent from David de la HAYA, 2nd of Erroll, (d. 1241), and subclade R1a-FTA7312 confirms direct male-line descent from Robert de la HAYA of Erroll. The results also confirm that the Hay noble lineage shares the Y-SNP R1a-YP4138 (estimated to have occurred 832 CE) with several non-Hay testers that have surnames of Norman origin, thereby providing further evidence to support the Norman origin hypothesis for these surnames. In addition to identifying multigenerational Y-SNPs associated with documented Hay noblemen, this study observed significant Y-DNA haplogroup diversity among males with the surname Hay (plus associated spelling variants: Hays, Haye, Hayes, Hey and Haya). Our results show that only 22% of the men sampled (n=109) with the surname Hay (plus associated spelling variants) are descended from the 12th century progenitor of the noble Hay lineage of Scotland. This confirms that a significant proportion of males with the surname Hay do not descend from the noble progenitor of the surname.

8

Relationship between Transposable Elements and behavioral traits: insights from six genetic isolates from North-Eastern Italy

Modenini, G.; Mercuri, G.; Abondio, P.; Nardone, G. G.; Santin, A.; Tesolin, P.; Spedicati, B.; Pecori, A.; Pianigiani, G.; Concas, M. P.; Girotto, G.; Gasparini, P.; Boattini, A.; Mezzavilla, M.

2025-05-09 genetic and genomic medicine 10.1101/2025.05.07.25327148 medRxiv

Top 0.1%

14.6%

Half of the human genome is derived from Transposable Elements (TEs), among which Alu, LINE-1 and SVA are particularly represented. Germline transposition of TEs generates polymorphisms between individuals and may be used to study association with phenotypes and inter-individual differences. Italy presents an increased number of isolated villages compared to other European groups, and these isolates provide a desirable study subject to understand the genetic variability of the Italian peninsula. Therefore, we focused on the association between polymorphic TEs, behavioral traits (tobacco use and alcohol consumption), and Body Mass Index (BMI) variations, which could lead to an increased risk of developing addiction-related or metabolic diseases. We identified 12,709 polymorphic TEs in 589 individuals from six isolates: classical population genetics analyses showed that while closely related to other European populations, the isolates tend to cluster amongst themselves and are dominated by drift-induced ancestral components. Several TEs were also significantly associated with behavioral traits (tobacco use or alcohol consumption) or with BMI variations and some of them have a functional role. These results suggest that polymorphic TEs may significantly impact inter-individual and inter-population phenotypic differentiation, while also functioning as variability markers and potentially having a role in susceptibility to medical conditions.

9

Abnormal vertebral patterns in genetically heterogeneous deceased fetuses and neonates: evidence of selection against variations

Schut, P. C.; Brosens, E.; Galis, F.; Ten Broek, C. M.; Baijens, I. M.; Dremmen, M. H.; Tibboel, D.; Schol, M. P.; De Klein, A.; Eggink, A. J.; Cohen-Overbeek, T. E.

2019-10-01 evolutionary biology 10.1101/784926 medRxiv

Top 0.1%

14.5%

Objective: To assess the vertebral pattern in a cohort of deceased fetuses and neonates, and to study the possible impact of DNA Copy Number Variations (CNVs) in coding regions and/or disturbing enhancers on the development of the vertebral pattern. Method: Radiographs of 445 fetuses and infants, deceased between 2009 and 2015, were assessed. Terminations of pregnancies, stillbirths and neonatal deaths were included. Patients were excluded if the vertebral pattern could not be determined. Copy number profiles of 265 patients were determined using a single nucleotide polymorphism array. Results: 274/374 patients (73.3%) had an abnormal vertebral pattern. Cervical ribs were present in 188/374 (50.3%) and were significantly more common in stillbirths (69/128 (53.9%)) and terminations of pregnancies (101/188 (53.7%)), compared to live births (18/58, 31.0%, p = 0.006). None of the rare CNVs were recurrent or overlapped candidate genes for vertebral patterning. Conclusion: The presence of an abnormal vertebral pattern, particularly in the cervical region, could be a sign of disruption at critical, highly interactive and conserved stages of embryogenesis. The vertebral pattern might provide valuable information regarding fetal and neonatal outcome. CNV analyses did not identify a mutual genetic cause for the occurrence of vertebral patterning abnormalities, indicating genetic heterogeneity.

10

Evolution of a new testis-specific functional promotor within the highly conserved Map2k7 gene of the mouse

Heinen, T.; Xie, C.; Keshavarz, M.; Stappert, D.; Kuenzel, S.; Tautz, D.

2021-11-12 evolutionary biology 10.1101/2021.11.11.468196 medRxiv

Top 0.1%

13.8%

Map2k7 (synonym Mkk7) is a conserved regulatory kinase gene and a central component of the JNK signaling cascade with key functions during cellular differentiation. It shows complex transcription patterns and different transcript isoforms are known in the mouse (Mus musculus). We have previously identified a newly evolved testis specific transcript for the Map2k7 gene in the subspecies M. m. domesticus. Here, we identify the new promotor that drives this transcript and find that its transcript codes for an open reading frame (ORF) of 50 amino acids. The new promotor was gained in the stem lineage of closely related mouse species, but was secondarily lost in the subspecies M. m. musculus and M. m. castaneus. A single mutation can be correlated with its transcriptional activity in M. m. domesticus and cell culture assays demonstrate the capability of this mutation to drive expression. A mouse knock-out line in which the promotor region of the new transcript is deleted reveals a functional contribution of the newly evolved promotor to sperm motility and to the spermatid transcriptome. Our data show that a new functional transcript (and possibly protein) can evolve within an otherwise highly conserved gene, supporting the notion of regulatory changes contributing to the emergence of evolutionary novelties.

11

Identification and evolutionary analysis of a Triticeae tribe specific novel non-autonomous DNA transposon in DREB related Dehydration Responsive Factor1 gene.

Thiyagarajan, K.; Latini, A.; Cantale, C.; Porceddu, E.; Galeffi, P.

2022-03-06 evolutionary biology 10.1101/2022.03.05.482545 medRxiv

Top 0.1%

12.6%

A non-autonomous DNA transposon was identified in the DRF1 gene, which belongs to the DREB gene family. The presence of this element was initially assessed in the Triticum durum DRF1 gene, and it was subsequently also identified in the Aegilops speltoides and Triticum urartu DRF1 genes. The DRF1 gene consists of four exons and three introns; the transposon, which carries a core element, is inserted between the first and the third introns. Our studies identified inverted repeats, target site duplications and many internal reverse and direct short and long tandem repeats, all of which are signals of a transposable element. Based on the transposon-specific sequence and the position of the terminal inverted repeats, a possible transposition mechanism was inferred. As the identified transposable element does not possess a sequence coding for a transposase enzyme, it represents a non-autonomous element. The transposon encompasses a core element with two small transcribed regions (Exon 2 and Exon 3), which are combined by alternative splicing during gene expression, and an intron (intron 2). A possible role of this non-autonomous DNA transposon in the regulation of alternative splicing was investigated by a genomics approach. Divergence time analysis supported the relatively recent evolution of this transposon in Triticeae compared to other tribes. Furthermore, either no footprint sequences or only highly disrupted footprint sequences, such as terminal inverted repeats (TIRs) and target site duplications (TSDs), were observed in other, earlier-evolved Poaceae species, which revealed the novelty and well-preserved nature of these signals in Triticeae. Other monocots (apart from Poaceae) and dicots, including Arabidopsis thaliana, neither showed this transposon insertion nor revealed the existence of alternatively spliced gene transcripts. In Poaceae members the core element is well preserved, with a disturbed transposon and disturbed transposon signals, while in the tribe Triticeae, especially wheat and its progenitors, the DRF1 transposon and its signals are intact.

12

Quantifying Structural Diversity of CNG Trinucleotide Repeats Using Diagrammatic Algorithms

Phan, E. N. H.; Mak, C. H.

2020-05-31 biophysics 10.1101/2020.05.30.124636 medRxiv

Top 0.1%

12.5%

Trinucleotide repeat expansion disorders (TREDs) exhibit complex mechanisms of pathogenesis, some of which have been attributed to RNA transcripts of overexpanded CNG repeats, possibly resulting in a gain of function. In this paper, we aim to probe the structures of these expanded transcripts by analyzing the structural diversity of their conformational ensembles. We used graphs to catalog the structures of NG-(CNG)16-CN and NG-(CNG)50-CN oligomers, grouped them into sub-ensembles based on their characters, and calculated the structural diversity and thermodynamic stability of these ensembles using a previously described graph factorization scheme. Our findings show that the generally assumed structure for CNG repeats (a series of canonical helices connected by two-way junctions and capped with a hairpin loop) may not be the most thermodynamically favorable, and that the ensembles are characterized by largely open and less structured conformations. Furthermore, a length dependence is observed in the behavior of the ensembles' diversity as higher-order diagrams are included, suggesting that further studies of CNG repeats are needed at the length scale of TRED onset to properly understand their structural diversity and how this might relate to their functions. STATEMENT OF SIGNIFICANCE: Trinucleotide repeats are DNA satellites that are prone to mutations in the human genome. A family of diverse disorders is associated with an overexpansion of CNG repeats occurring in noncoding regions, and the RNA transcripts of the expanded regions have been implicated as the origin of toxicity. Our understanding of the structures of these expanded RNA transcripts is based on sequences that have limited lengths compared to the scale of the expanded transcripts found in patients. In this paper, we introduce a theoretical method aimed at analyzing the structure and conformational diversity of CNG repeats, which has the potential to overcome the current length limitations in studies of trinucleotide repeat sequences.

13

Considering founding and variable genomes is critical in studying polyploid evolution

Ye, X.; Hu, H.; Zhou, H.; Jiang, Y.; Gao, S.; Yuan, Z.; Stiller, J.; Li, C.; Chen, G.; Liu, Y.; Wei, Y.; Zheng, Y.; Liu, C.

2019-08-16 evolutionary biology 10.1101/738229 medRxiv

Top 0.1%

12.3%

A wide range of differences between the subgenomes, termed as subgenome asymmetry or SA, has been reported in various polyploids and different species seem to have different responses to polyploidization. We compared subgenome differences in gene ratio and relative diversity between artificial and natural genotypes of several allopolyploid species. Surprisingly, consistent differences in neither gene ratio nor relative diversity between the subgenomes were detected between these two types of polyploid genotypes although they differ in times exposed to evolutional selection. As expected, the estimated ratio of retained genes between a subgenome and its diploid donor was invariably higher for the artificial allopolyploid genotypes due likely to the presence of variable genome components (VGC). Clearly, the presence of VGC means that exaggerated differences between a donor and a subgenome in a polyploid are inevitable when random genotypes were used to represent species of either a polyploid or its donors. SA was also detected in genotypes before the completion of the polyploidization events as well as in those which were not formed via polyploidization. Considering that significant changes during and following polyploidization have been detected in previous studies, our results suggest that the influence of VGC needs to be considered in evaluating SA and that diploid donors may define changes in polyploid evolution.

14

Multi-omics profiling, in vitro and in vivo enhancer assays dissect the cis-regulatory mechanisms underlying North Carolina macular dystrophy, a retinal enhanceropathy

Van de Sompele, S.; Small, K. W.; Cicekdal, M. B.; Soriano, V. L.; D'haene, E.; Shaya, F. S.; Agemy, S.; Van der Snickt, T.; Rey, A. D.; Rosseel, T.; Van Heetvelde, M.; Vergult, S.; Balikova, I.; Bergen, A. A.; Boon, C. J. F.; De Zaeytijd, J.; Inglehearn, C. F.; Kousal, B.; Leroy, B. P.; Rivolta, C.; Vaclavik, V.; van den Ende, J.; van Schooneveld, M. J.; Gomez-Skarmeta, J. L.; Tena, J. J.; Martinez-Morales, J. R.; Liskova, P.; Vleminckx, K.; De Baere, E.

2022-07-26 genomics 10.1101/2022.03.08.481329 medRxiv

Top 0.1%

12.2%

North Carolina macular dystrophy (NCMD) is a rare autosomal dominant disease affecting macular development. The disease is caused by non-coding single nucleotide variants (SNVs) in two hotspot regions near PRDM13 and by duplications in two distinct chromosomal loci, overlapping DNase I hypersensitive sites near either PRDM13 or IRX1. To unravel the mechanisms by which these variants cause disease, we first established a genome-wide multi-omics retinal database, RegRet. Integration of UMI-4C profiles we generated on adult human retina then allowed fine-mapping of the interactions of the PRDM13 and IRX1 gene promoters, and the identification of eighteen candidate cis-regulatory elements (cCREs), the activity of which was investigated by luciferase and Xenopus enhancer assays. Next, luciferase assays showed that the non-coding SNVs located in the two hotspot regions of PRDM13 affect cCRE activity, including two novel NCMD-associated non-coding SNVs that we identified. Interestingly, the cCRE containing one of these SNVs was shown to interact with the PRDM13 promoter, demonstrated in vivo activity in Xenopus, and is active at the developmental stage when progenitor cells of the central retina exit mitosis, putting forward this region as a PRDM13 enhancer. Finally, mining of single-cell transcriptional data of embryonic and adult retina revealed the highest expression of PRDM13 and IRX1 when amacrine cells start to synapse with retinal ganglion cells, supporting the hypothesis that altered PRDM13 or IRX1 expression impairs interactions between these cells during retinogenesis. Overall, this study gained insight into the cis-regulatory mechanisms of NCMD and supports that this condition is a retinal enhanceropathy. 

15

Effect of a LINE1 DNA sequence on expression of long human genes

Brown, J. C.

2024-07-09 genomics 10.1101/2023.11.21.568109 medRxiv

Top 0.1%

12.2%

The study described here was carried out to pursue the idea that a truncated, transposition-incompetent fragment of a LINE1 retrotransposon may affect the expression of a human gene when it is located inside the gene sequence. NCBI BLAST was used to probe the human genome to identify protein-coding genes containing an abundant ~1500 bp LINE1 fragment (called t1519) in the gene body. The length and expression level of such genes were then compared with the same properties in genes that lack t1519 in human chromosomes 16-18. The results showed a striking effect of t1519 on long genes, those with lengths greater than ~140 kb. Nearly all were found to have one or more t1519 sequences in the coding region. In contrast, genes in the common length range (less than 140 kb) could either have t1519 or not. A correlation was also observed with the level of gene expression. While expression of long, t1519-containing genes was limited to ~50 TPM, expression of genes in the common length range could be much higher, in the range of 500-600 TPM, regardless of whether or not they have t1519 elements. Contrasting results were obtained when the analysis was performed with lncRNAs rather than with protein-coding genes. Among lncRNA genes a chromosome-specific effect was observed. Restricted expression correlating with the presence of t1519 was observed in both long and common length genes of chromosomes 16 and 17, but not in chromosome 18. The results are interpreted to support a strong suppressive effect of t1519 on expression of long protein-coding genes and on both long and common length lncRNA genes of chromosomes 16 and 17. It is suggested that the suppressive effect on expression, particularly among long genes, meets a need for the cell to limit the overall level of transcription it can support.
Author summary: Although LINE1 DNA sequence elements are well known for their ability to replicate and move autonomously within the human genome, these features are observed in only a small proportion (0.02%) of the total human LINE1 population. Nearly all of the total ~500,000 LINE1 elements are fragments of full-length LINE1 and are inactive for autonomous replication or movement. Truncated, inactive LINE1 sequences are found throughout the human genome, including within the body of protein-coding genes, and this intragenic population is the subject of the study described here. The goal was to extend what is known about the properties of intragenic LINE1 sequences. The study was carried out with t1519, a truncated LINE1 sequence composed of the 3′-terminal ~1500 bp of the ~6000 bp full-length LINE1 element, and with the sequences of three human chromosomes, 16, 17 and 18, that are rich in t1519 sequences. NCBI BLAST was used to identify t1519-containing genes in each chromosome, and the length and expression level of those genes were compared with control genes lacking t1519. A striking result was observed in the case of long protein-coding genes, those longer than 140 kb. Nearly all had one or more t1519 sequences in the gene body, all in introns. An effect on the level of gene expression was also observed. Low expression (<50 TPM) was found in all long, t1519-positive genes, while much higher levels (500-600 TPM) were found with genes in the common length range (<140 kb) regardless of the presence of t1519. Similar results were obtained when lncRNA genes were studied instead of protein-coding ones. The results are interpreted to support a strong suppressive effect of t1519 on expression of long protein-coding genes and also on certain lncRNA genes. It is suggested that the suppressive effect is due to a need for the cell to limit the overall level of transcription it can support.

16

Thermodynamic stability of G-quadruplex and overlapping m6A: Position-dependent collaborators in allele frequency and fitness?

Gulec, C.

2021-04-29 genetics 10.1101/2021.04.28.441767 medRxiv

Top 0.1%

12.2%

Background: Post-transcriptional modifications like m6A, and secondary structures like G-quadruplex (G4), play an important role in RNA processing. Despite a growing number of studies focusing on m6A and G4 separately, there are fewer studies about their synergy. Aim: Since m6A is known to be enzymatically created in the DRACH motif, and genetic variants may create a novel DRACH motif or abolish a pre-existing one, we can suppose that variants may affect gene product level by modulating m6A-G4 colocalization, which consequently may affect fitness and change allele frequency. To test this hypothesis, rare and common variants in selected human genes were investigated in terms of their effect on m6A-G4 colocalization. Methods: Genomic sequences and variant features were fetched from GRCh37/hg19 and Biomart-Ensembl databases, respectively. Counting of putative m6A and G4 motifs in sequences and statistical analysis were performed with appropriate libraries of Python 3.7. Results: Common variants creating a novel m6A motif were found more frequently inside than outside G4, and displayed unequal distribution throughout the pre-mRNA. Unequal distribution of m6A-creating variants seemed to be related to their effect on the thermodynamic stability of the overlapping G4. Discussion: Selective m6A-G4 colocalization suggests that an m6A motif is favorable when overlapping with G4. Besides, thermodynamic stability may lead to unequal distribution of m6A-G4 colocalization, because m6A-creating alleles seem to have lower frequency if they stabilize the overlapping G4 on the 3′ side, but not on the 5′ side. We can conclude that the fitness, and consequently the frequency, of an m6A-creating variant is prone to become higher or lower depending on its position and its effect on the stability of the overlapping G4.
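The motif-counting step mentioned in the Methods could look roughly like the sketch below. It is only an illustration, not the author's code: the abstract does not state which patterns were used, so the DRACH encoding in DNA letters (D = A/G/T, R = A/G, H = A/C/T), the canonical G4 pattern (four G-runs of at least three Gs separated by loops of 1-7 nt), and the function names `drach_sites`, `g4_spans` and `colocalized` are all assumptions.

```python
import re

# Assumed DRACH encoding in DNA letters: D = A/G/T, R = A/G, H = A/C/T.
# The lookahead lets overlapping motifs be reported separately.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")

# Assumed canonical putative-G4 pattern: four runs of >=3 G separated by
# loops of 1-7 nucleotides. The source does not specify its G4 definition.
G4 = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def drach_sites(seq):
    """Return 0-based start positions of every DRACH motif in seq."""
    return [m.start() for m in DRACH.finditer(seq.upper())]


def g4_spans(seq):
    """Return (start, end) spans of non-overlapping putative G4 motifs."""
    return [m.span() for m in G4.finditer(seq.upper())]


def colocalized(seq):
    """Split DRACH sites by whether the central A (motif index 2) lies in a G4 span."""
    spans = g4_spans(seq)
    inside, outside = [], []
    for start in drach_sites(seq):
        a_pos = start + 2  # position of the A that would carry m6A
        if any(s <= a_pos < e for s, e in spans):
            inside.append(start)
        else:
            outside.append(start)
    return inside, outside
```

Calling `colocalized(sequence)` on a reference and on a variant-bearing sequence would show whether a variant creates or abolishes a DRACH site inside a putative G4; any stability estimate for the G4 would need a separate tool.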

17

Expression-dependent but strand-independent synonymous single-nucleotide polymorphism in the Escherichia coli chromosome

Deka, N.; Beura, P. K.; Sen, P.; Aziz, R.; Kashyap, A.; Keot, D.; Jain, M.; Namsa, N. D.; Deka, R. C.; Feil, E.; Satapathy, S. S.; Ray, S. K.

2026-05-26 evolutionary biology 10.64898/2026.05.22.727198 medRxiv

Top 0.1%

10.5%

Background: Mutation is thought to arise mainly during replication, though transcription is also known to be mutagenic. Despite recent reports of genome-wide transcription-induced mutagenesis, a clear demonstration that specific mutations are replication-dependent and/or transcription-dependent in genomes is yet to be established. Here, we studied synonymous single-nucleotide polymorphisms (SNPs) in 2091 individual coding sequences (CDS) in the leading strand (LeS) and the lagging strand (LaS) of the Escherichia coli chromosome by comparing across 157 strains. The frequencies of complementary transitions (ti) and complementary transversions (tv) were compared in each CDS to assess parity violation in the strands. Results: The C→T and G→A transitions exhibited the maximum frequency as well as the most prominent strand inequality, as these transitions were influenced both by the strand and by expression. Interestingly, inequality between T→C and A→G was expression-dependent but strand-independent. A→T and G→T transversions were universally more frequent than their complementary T→A and C→A transversions, respectively. Conclusions: Our study demonstrates strand-independent but expression-dependent synonymous SNP inequality in CDS, supporting the role of transcription-induced mutagenesis in contributing to strand inequality in the E. coli chromosome.
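The comparison of complementary substitutions can be illustrated with a small sketch. It is not the authors' pipeline: it assumes each SNP is already available as a directed `(ref, alt)` pair on the coding strand, which the abstract does not describe, and the names `classify`, `complementary` and `parity_counts` are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(ref, alt):
    """Return 'ti' for purine<->purine or pyrimidine<->pyrimidine changes, else 'tv'."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    return "ti" if (ref in PURINES) == (alt in PURINES) else "tv"


def complementary(ref, alt):
    """Return the substitution that would be seen on the opposite strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def parity_counts(snps):
    """Map each observed substitution to (its count, count of its complement).

    Under strand parity the two numbers are expected to be similar;
    a large difference indicates inequality for that pair.
    """
    counts = Counter(snps)
    return {sub: (n, counts[complementary(*sub)]) for sub, n in counts.items()}
```

For example, `parity_counts` applied to the SNPs of one CDS gives, for C→T, the pair (number of C→T, number of G→A). Repeating this per CDS and grouping by strand or expression level would be one way to organise the comparison described above.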

18

Unique genetic features of the naked mole-rat's THADA gene

Bullerdiek, J.; Banjar, K.; Holzmann, C.

2021-09-21 genomics 10.1101/2021.09.19.460947 medRxiv

Top 0.1%

10.5%

Thyroid Adenoma Associated (THADA) is a protein-coding gene that maps to chromosomal band 2p21 and was first described as a target of recurrent translocations in thyroid tumors. Many genome-wide association studies have revealed an association between THADA and two frequent human diseases, i.e. type 2 diabetes and polycystic ovary syndrome. Nevertheless, the function of its protein is not completely understood. However, recent evidence suggests that in a Drosophila model THADA can act as a sarco/endoplasmic reticulum Ca2+-ATPase (SERCA)-interacting protein which uncouples SERCA from this function. Once uncoupled, SERCA produces an increased amount of heat without transporting calcium, thus triggering nonshivering thermogenesis. These data prompted us to compare human THADA with that of 65 other eutherian mammals. This includes a comparison of THADA of a variety of eutherian mammals with that of the naked mole-rat (Heterocephalus glaber), which is known to display unique features of thermoregulation compared to other mammals. Our analysis revealed five positions where only the naked mole-rat presented differences. These positions included four single amino acid substitutions and one unique deletion of six or seven amino acids between residues 858 and 859. In future studies these changes will be analyzed in further detail for their functional relevance.

19

DNA transposons of maT family in the Cnidaria

Puzakov, M. V.; Puzakova, L. V.; Cheresiz, S. V.; Shi, S.

2022-10-14 evolutionary biology 10.1101/2022.10.13.512200 medRxiv

Top 0.1%

10.3%

Transposable elements exert a significant influence on the structure and size of eukaryotic genomes. Representatives of the Tc1/mariner superfamily of DNA transposons form a prevalent and highly variable group, which includes the relatively well studied TLE/DD34-38E, MLE/DD34D, maT/DD37D, Visitor/DD41D, Guest/DD39D, mosquito/DD37E and L18/DD37E families. A detailed study of the distribution and diversity of Tc1/mariner transposons will help us to better investigate the co-evolution of TEs and eukaryotic genomes. We performed an in-depth analysis of the maT/DD37D family in the cnidarians. maT transposons were shown to exist in a limited number of cnidarian species belonging to the Cubozoa, Hydrozoa and Scyphozoa classes. maT transposons of the cnidarians are thought to be the descendants of several individual invasion events, which occurred at different times in the past. The mosquito/DD37E transposons of the cnidarians have also been described. These TEs were shown to be present in the Hydridae family (class Hydrozoa) only. An analysis of TE distribution, diversity, evolutionary history and phylogeny established that the TEs undergo their unique evolution not only in different species, but also within a particular species. These results improve our knowledge of Tc1/mariner diversity and evolution, as well as their influence on eukaryotic genomes.

20

Signatures of genetic variation in human microRNAs point to processes of positive selection related to population-specific disease risks

Villegas Miron, P.; Gallego, A.; Bertranpetit, J.; Laayouni, H.; Espinosa-Parrilla, Y.

2021-05-25 evolutionary biology 10.1101/2021.05.24.445417 medRxiv

Top 0.1%

10.3%

The occurrence of natural variation in human microRNAs has been the focus of numerous studies during the last twenty years. Most of them have been dedicated to studying the role of specific mutations in diseases, like cancer, while a smaller fraction seeks to analyse the diversity profiles of microRNAs in the genomes of human populations. In the present study we analyse the latest human microRNA annotations in the light of the most up-to-date catalog of genetic variation provided by the 1000 Genomes Project. We show by means of in silico analysis of noncoding variation of microRNAs that the level of evolutionary constraint of these sequences is governed by the interplay of different factors, like their evolutionary age or the genomic location where they emerged. The role of mutations in the shaping of microRNA-driven regulatory interactions is emphasized with the acknowledgement that, while the whole microRNA sequence is highly conserved, the seed region shows a pattern of higher genetic diversity that appears to be caused by the dramatic frequency shifts of a fraction of human microRNAs. We highlight the participation of these microRNAs in population-specific processes by identifying that not only the seed, but also the loop, are particularly differentiated regions among human populations. The quantitative computational comparison of signatures of population differentiation showed that candidate microRNAs with the largest differences are enriched in variants implicated in gene expression levels (eQTLs), selective sweeps and pathological processes. We explore the implication of these evolutionary-driven microRNAs and their SNPs in human diseases, such as different types of cancer, and discuss their role in population-specific disease risk.