Genes — Latest Matching Preprints

1

Expression-dependent but strand-independent synonymous single-nucleotide polymorphism in the Escherichia coli chromosome

Deka, N.; Beura, P. K.; Sen, P.; Aziz, R.; Kashyap, A.; Keot, D.; Jain, M.; Namsa, N. D.; Deka, R. C.; Feil, E.; Satapathy, S. S.; Ray, S. K.

2026-05-26 evolutionary biology 10.64898/2026.05.22.727198 medRxiv

Top 0.1%

10.5%

Background: Mutation is thought to arise mainly during replication, though transcription is also known to be mutagenic. Considering the recent reports regarding genome-wide transcription-induced mutagenesis, a distinct demonstration of specific mutation being replication-dependent and/or transcription-dependent in genomes is yet to be established. Here, we studied synonymous single-nucleotide polymorphisms (SNPs) in 2091 individual coding sequences (CDS) in the leading strand (LeS) and the lagging strand (LaS) of the Escherichia coli chromosome by comparing across 157 strains. The frequencies of complementary transitions (ti) and complementary transversions (tv) were compared in each CDS to assess parity violation in the strands. Results: The C→T and G→A exhibited the maximum frequency as well as the most prominent strand inequality as these tis were influenced both by the strands as well as by the expression. Interestingly, inequality between T→C and A→G was expression-dependent but strand-independent. A→T and G→T tvs were universally more frequent than their complementary T→A and C→A tvs, respectively. Conclusions: Our study demonstrates strand-independent but expression-dependent synonymous SNP inequality in CDS, supporting the role of transcription-induced mutagenesis contributing to strand inequality in the E. coli chromosome.

2

Clarified an rDNA Gene Unit Pattern with (CTTT)n and (CT)n Microsatellites Aggregation Ahead of and Behind the Gene in Human Genome

Shen, J.; Tang, S.; Xia, Y.; Qin, J.; Xu, H.; Tan, Z.

2026-03-24 genetics 10.64898/2026.03.22.713381 medRxiv

Top 0.1%

6.9%

Background: Conventional models of human ribosomal DNA (rDNA) array organization have historically depended on transcription-centric boundaries, partitioning the unit into a ~13 kb rDNA transcription region and a monolithic ~31 kb intergenic spacer (IGS). While our previous identification of Duplication Segment Units (DSUs) mapped these arrays based on an intuitive analysis of the microsatellite density landscape of the complete reference human genome, our present deep mining of this landscape has revealed a more accurate rDNA Gene Unit Pattern. Methods & Results: In this study, we conducted a deep mining analysis of our previously established microsatellite density landscape of the T2T-CHM13 assembly, focusing specifically on nucleolar organizing regions (NORs). We suggest a more accurate rDNA Gene Unit Pattern containing a (CTTT)n microsatellite aggregation ahead of the rDNA gene and a (CT)n microsatellite aggregation behind the gene, rather than a pattern featuring an IGS region inserted between two rDNA genes. Conclusions: A correct rDNA gene pattern of the human genome probably includes a (CTTT)n microsatellite aggregation ahead of the gene and a (CT)n microsatellite aggregation behind it, which possibly constitute cis- and trans-regulating regions; the (CTTT)n and (CT)n microsatellite aggregations may provide two different local stable DNA structures for regulatory protein binding.

3

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.1%

6.4%

Currently, the genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of Variants of Uncertain Significance (VUSs) and clinical misinterpretations of genomic data, especially in Middle Eastern populations. Whole exome sequencing was conducted on 90 healthy individuals from Jordan and the data were analysed using Principal Component Analysis (PCA) and multi-computational filtering. PCA revealed a double ancestry (EUR-AFR) admixture rather than a triple admixture (EUR-AFR-AMR). More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched compared to the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p-value < 0.05). Consequently, the results suggest the reclassification of the Variants of Uncertain Significance (VUSs) that reside in the ECE2 gene to likely benign, and of the variants of Conflicting Classification of Pathogenicity in the genes IL1RN and THPO to benign, based on the significant allele frequency (AF=0.0389, p-value < 0.05). Furthermore, a pathogenic ClinVar variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs in order to minimize or even prevent clinical misdiagnosis and highlight the unique genetic signature in Jordan. The study serves as a foundational resource for precision medicine in the region.
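
The Benjamini-Hochberg false discovery rate correction named in this abstract can be illustrated with a minimal Python sketch. This is not the author's pipeline: the function name `benjamini_hochberg` and the example p-values are hypothetical, and the Fisher's exact tests that would produce the raw p-values are assumed to have been run already.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration; not taken from the preprint.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (made up): variants with adjusted p < 0.05 would be kept.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```

With these toy inputs every adjusted value stays below 0.05, so all four hypothetical variants would pass the threshold.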

4

Genome-wide identification of rhabdoviral sequences in alfalfa (Medicago sativa L.)

Grinstead, S.; Nemchinov, L. G.

2026-05-22 genomics 10.64898/2026.05.20.726541 medRxiv

Top 0.1%

6.4%

We recently reported the identification of endogenous viral elements (EVEs) originating from the Caulimoviridae family within the alfalfa (Medicago sativa L.) genome. Our subsequent identification of ubiquitous rhabdoviral elements in infected and healthy alfalfa tissues by high throughput sequencing prompted us to suggest that the alfalfa genome might be populated with integrated rhabdoviruses as well. Bioinformatics analysis using 26 publicly available alfalfa genomes proved the suggestion accurate. We found multiple non-retroviral segments of the Rhabdoviridae family belonging to the genera Betanucleorhabdovirus and Betacytorhabdovirus that appeared to be stable constituents of the host genome. In that capacity they could potentially acquire functional roles in alfalfa's development and response to environmental stresses. We believe this study reveals the first documented case of rhabdoviruses integrated into the alfalfa genome.

5

Evolutionary history of alpha satellite DNA in Cercopithecini: comparative cytogenomics highlights the diversification pattern of primate centromere repeats

Cacheux, L.; Dutrillaux, B.; Gerbault-Seureau, M.; Nicolas, V.; Ponger, L.; Bed'Hom, B.; Escude, C.

2026-04-21 evolutionary biology 10.64898/2026.04.19.719437 medRxiv

Top 0.1%

6.3%

Background: Alpha satellites, a superfamily of AT-rich tandem repeats, are the primary DNA component of centromeres in Platyrrhini and Catarrhini. Analyses of the human genome suggest that centromeres behave like biological ridges, with new alpha satellite families expanding at the centromere core, splitting and displacing older ones towards the pericentromeres. The Cercopithecini tribe, which displays an unusual chromosomal evolution involving multiple chromosomal fissions and centromere formations, represents a promising model to enhance our understanding of alpha satellite DNA evolutionary history. We previously applied targeted sequencing to centromere DNA from two distant species drawn from the Cercopithecini terrestrial and arboreal lineages, and characterized six alpha satellite families exhibiting varying mean sequence identities. Methods: Combining classical and molecular cytogenetics, we mapped the chromosomal distribution of these alpha satellite families across 13 Cercopithecini, one Papionini, and one Colobinae species. A nuclear marker-based phylogeny provided an evolutionary framework for interpretation. Results: Our phylogeny identifies the terrestrial and arboreal lineages, and a newly designated swamp clade. We observed significant interspecies variations in alpha satellite patterns, including differences in presence/absence and distinct chromosomal distribution patterns (centromeric, pericentromeric, or subtelomeric). Families previously described as heterogeneous (83-87% mean sequence identity) exhibit a centromeric position in the swamp lineage, which is characterized by conserved karyotypes. In contrast, these families show a pericentromeric distribution in the terrestrial and arboreal lineages, replaced at the centromere core by more homogeneous families (95-98% mean sequence identity). In the arboreal clade, which is characterized by highly fissioned karyotypes, putative evolutionary new centromeres show a unique co-occurrence of highly homogeneous and heterogeneous families. Conclusion & Implications: We propose a comprehensive evolutionary scenario for alpha satellite DNA in Cercopithecini, where younger families arise at the centromere core, shift toward the pericentromeres as they age, and eventually face extinction. Our study suggests that alpha satellite DNA and chromosomes evolve in an interdependent manner, with satellite diversification and displacement occurring in parallel with chromosome fissions and centromere repositioning. This comparative cytogenomic approach provides both support for the human-based evolutionary model for alpha satellite DNA and novel temporal insights into its diversification dynamics. Beyond evolutionary genomics, our findings highlight the potential of alpha satellite DNA to complement systematic studies in deciphering complex primate evolutionary histories.

6

Recurrent LINE 1 exonization drives transcriptome remodelling in NSCLC

Parida, A. S.; Kumar, A.; Tiwari, B.

2026-04-24 genomics 10.64898/2026.04.22.720055 medRxiv

Top 0.1%

6.2%

The only autonomously active transposable elements in the human genome are Long interspersed nuclear element-1 (LINE-1) elements. These elements are known to play an important role in changing the transcriptome. LINE-1 sequences affect gene regulation during post-transcriptional processing, along with their established role in retrotransposition. Exonization is one mechanism where the LINE-1 integrated genome undergoes alternative splicing to produce new isoforms of transcripts. Our work mainly highlights the effect of LINE-1 associated exonization, focusing on the formation of isoforms of transcripts. Using non-small cell lung cancer (NSCLC) as a model, we conducted a detailed transcriptome study that combines splice junction profiling with gene expression data. Our results show that LINE-1 sequences are often included as exons in host transcripts, leading to the formation of new exons and their various isoforms. The events are validated by solid splice junction evidence that supports their reliability and reproducibility. In particular, repeated analyses revealed certain LINE-1 exonization events that were consistent. The finding indicates that LINE-1 elements act as recurrent sources of splice-ready sequences. Though exonizations do not necessarily affect the total expression levels of genes, our study reveals that they certainly contribute to transcript diversity. The diversity of isoforms generated potentially contributes to the effects on gene function. This study is limited to NSCLC, but it is likely that exonization events play a crucial role in altering RNA diversity in cancers. Therefore, the study provides new insights into how transposable elements modify gene structure and function during cancer development.

7

Snhg26 long non-coding RNA regulates pluripotent cell states through a SINEB2-derived sequence

Fort, V.; Khelifi, G.; Hussein, S. M. I.

2026-05-24 cell biology 10.64898/2026.05.21.726207 medRxiv

Top 0.1%

4.9%

Abstract/Summary: Long non-coding RNAs (lncRNAs) are now well established players in gene expression regulation, but their detailed molecular mechanisms of action and underlying regulating sequences remain poorly understood. An emerging concept supports the idea that repeated sequences, more notably sequences derived from transposable elements (TEs), contribute to functional domains of lncRNAs. Here, we undertake the characterization of the function, interactors and functional domains of Snhg26, a lncRNA involved in reprogramming towards induced pluripotent stem cells (iPSCs) and maintenance of the pluripotent state of embryonic stem cells (ESCs). First, we show that modulation of Snhg26 expression levels affects expression and splicing of genes involved in pluripotency and chromatin remodeling during the early steps of reprogramming and in ESCs. We also find that down-regulation of Snhg26 increases the expression of SINEB2 transposable elements. Moreover, we identify hundreds of transcripts directly interacting with Snhg26 with a significant enrichment of RNAs containing SINEB2 elements. Strikingly, loss of a SINEB2 sequence embedded within Snhg26 abolishes its function in regulating pluripotent states. Our results thus support the idea that TEs constitute a source of functional units for lncRNAs and encourage further efforts to explore this concept.

8

High prevalence of loss of Y chromosome in the spermatozoa of young cancer survivors

Axelsson, J.; Bruhn-Olszewska, B.; Sarkysian, D.; Markljung, E.; Horbacz, M.; Pla, I.; Sanchez, A.; Nenonen, H.; Elenkov, A.; Dumanski, J. P.; Giwercman, A.

2026-03-23 genetic and genomic medicine 10.64898/2026.03.20.26348822 medRxiv

Top 0.1%

4.8%

Cancer-related genomic instability (GI) may cause genetic alterations in spermatozoa, implying health issues not only in cancer survivors, but also in their children [1, 2]. We therefore studied Loss of Y chromosome (LOY), considered a hallmark of GI [3-15], in spermatozoa and blood from survivors of childhood and testicular cancer (CC, TC), and controls (CTRL). We found that LOY was statistically significantly more frequent in spermatozoa from cancer survivors than in controls (Odds Ratio [OR]=2.2 for CC vs. CTRL and OR=2.4 for TC vs. CTRL). Furthermore, LOY was about an order of magnitude more prevalent in spermatozoa than in blood among 18-53-year-old males within all cohorts. Our findings suggest that LOY in spermatozoa might be a clinically useful marker of GI, reduced fertility and disease predisposition in males. Introducing LOY in spermatozoa as a biomarker opens a new research avenue into disease prevention and the causes and consequences of LOY.

9

Genomic Footprints of Bottlenecks, Isolation, and Inbreeding: A Case Study of Two Vulture Cohorts in India

Shukla, M.; Bohra, D. L.; Rao, B.; Narayan, L.; Kiran, S.; Thakur, V.

2026-05-05 genomics 10.64898/2026.04.30.721611 medRxiv

Top 0.1%

4.8%

Genomic erosion as a manifestation of small effective population size (Ne) and consanguinity subverts long-term perpetuation of threatened species by compromising their adaptive potential; however, the integration of genomics remains limited in applied conservation efforts to guide priorities. This study combines non-invasive sampling, double-digest Restriction site-associated DNA sequencing (ddRAD), and population-genomic analyses to assess genetic health in two vulture assemblages: a mixed wild enclosure cohort and a captive breeding cohort. Both geographical locations exhibit signs of populations in distress: low genetic diversity and abundant intermediate-length runs of homozygosity (RoH), consistent with long-term reduced Ne plus recent demographic isolation. Our demographic model runs favoured an ancient migration (AM) topology characterised by an ephemeral window of gene flow, followed by a prolonged population separation period. The mutation quantification results from approximately 59,000 outgroup-polarised SNPs reveal higher additive burden and more homozygous-derived sites in BKN. However, this was later traced to low-impact and non-coding variants rather than a surge in loss-of-function (LoF) alleles. The data support a genomic profile that carries an elevated risk from polygenic/aggregate deleterious burden in BKN despite a scarcity of high-impact mutations. By highlighting the disconnect between genetic resilience and demographic recovery, our results accentuate the need to incorporate genomics-informed inbreeding and monitoring programs, while also focusing on reducing anthropogenic mortality with genetic augmentation.

10

A telomere-to-telomere (T2T) pig genome assembly reveals Y chromosome diversity and structural variations of Wuzhishan pigs

Ren, Y.; Wang, F.; Li, X.; Liu, G.; Sun, R.; Zheng, X.; Zhang, Y.; Lin, R.; Lu, X.; Chen, L.; Xin, W.; Fei, Y.; Chao, Z.

2026-04-27 genomics 10.64898/2026.04.23.720499 medRxiv

Top 0.1%

4.4%

Background: Wuzhishan (WZS) pigs are native to Hainan Province of China, and serve as both important agricultural resources and biomedical models. Although the published WZS pig genome (T2T-pig1.0) achieved telomere-to-telomere (T2T) completeness, substantial genetic diversity still exists within the same pig breed, so another WZS pig genome, named WZS-T2T, was assembled in this study. Results: Multiple sequencing data types were used to assemble the genome, finally yielding a ~2.68 Gb telomere-to-telomere genome with an N50 length of ~142.87 Mb and 23,100 annotated protein-coding genes. Compared to T2T-pig1.0, the QV and BUSCO values were higher in WZS-T2T, and its Y chromosome (ChrY) was longer. The ChrY of the two WZS pigs shared 11 genes, including the sex differentiation-related genes SHOX, PRKX, and DDX3X, and SRY; however, the energy metabolism gene SLC25A4 and the macrophage-related receptor gene CSF2RA on ChrY were specific to WZS-T2T. An inversion SV of ~33.86 Mb on chromosome 10 was identified between the two WZS pigs, and three lines of evidence were presented to support the accuracy of the sequence orientation in WZS-T2T. The genetic diversity was consistent with LD decay speed in population different analysis. WZS pigs exhibited higher genetic diversity than the other four pig populations examined in this study (Tunchang pigs, Yuxi black pigs, Large White pigs, and Duroc pigs), and presented slower LD decay compared to the other four breeds. Conclusions: Therefore, WZS-T2T provided a higher-quality assembly, and potential advantages for both agricultural production and biomedical targets for WZS pigs.
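
The N50 length quoted in this abstract is a standard contiguity statistic: the length L such that sequences of length at least L together make up at least half of the assembly. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not values or code from the preprint.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    Illustrative helper: sort lengths from longest to shortest and
    return the length at which the running total first reaches half
    of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy contig lengths (made up), total 32: the two longest reach half.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

For a chromosome-level assembly the input list would hold one length per chromosome or scaffold, which is why N50 values in T2T assemblies approach chromosome size.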

11

Accurate estimation of canine inbreeding using ultra low-coverage whole genome sequencing

Pellegrini, M.; Kim, R.; Rubbi, L.; Kislik, G.; Smith, D.

2026-04-07 bioinformatics 10.64898/2026.04.04.716453 medRxiv

Top 0.1%

4.3%

The measurement of inbreeding has gained significance across diverse fields, including population and conservation genetics, agricultural genetics, breeding programs for animals and plants, and wildlife management. This is due to the fact that inbreeding leads to increased homozygosity and results in lower genetic diversity, rendering populations more vulnerable to environmental changes, diseases, and other stressors. High or mid-coverage whole genome sequencing (WGS) has been widely used for inbreeding estimation, but it is resource-intensive. We aimed to investigate the use of ultra low-coverage whole genome sequencing (ulcWGS) as a cost-effective alternative for inbreeding analysis. Domestic dogs were used for our study as their extensive breeding histories lead to populations with a wide range of inbreeding levels. We constructed a multi-breed reference panel from high-coverage WGS samples. Inbreeding in independent ulcWGS samples was then estimated using runs of homozygosity (RoH) and inbreeding coefficients (F). We modeled the relationship between these measures and sequencing depth using nonlinear regression, to generate inbreeding estimates relative to sequencing depth. Resulting relative RoH and F measurements were significantly correlated, with purebred dogs exhibiting more runs of homozygosity and higher inbreeding coefficients compared to mixed-breed dogs. Our findings demonstrate that ulcWGS can provide reliable and economical estimations of inbreeding, expanding accessibility to genetic monitoring.
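
One common way to turn runs of homozygosity into an inbreeding estimate is F_ROH, the fraction of the genome covered by RoH segments. The abstract does not say which estimator or thresholds the authors used, nor how their depth-dependent regression is specified, so the Python sketch below only illustrates that general idea; `f_roh`, `min_length` and the toy coordinates are hypothetical.

```python
def f_roh(roh_segments, genome_length, min_length=0):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments: iterable of (start, end) positions in base pairs,
        assumed to be non-overlapping.
    genome_length: total length of the genome surveyed, in base pairs.
    min_length: hypothetical filter; segments shorter than this are ignored.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length
    )
    return covered / genome_length


# Toy example (made up): 150 bp of RoH in a 1,000 bp genome.
example = f_roh([(0, 100), (200, 250)], genome_length=1000)
```

Values from ultra low-coverage data would still need the kind of depth adjustment the abstract describes, which this sketch does not attempt.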

12

Gene model for the ortholog of Lst8 in Drosophila yakuba

Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.

2026-05-14 genomics 10.64898/2026.05.12.723325 medRxiv

Top 0.1%

4.3%

Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

13

Knob K180 Constitutive Heterochromatin Of Maize Exhibit Tissue-Specific Chromatin Sensitive Profiles Distinct From Other Types Of Heterochromatins

Sattler, M. C.; Singh, A.; Bass, H. W.; Mondin, M.

2026-04-04 genetics 10.64898/2026.04.01.715864 medRxiv

Top 0.1%

4.2%

Background: Maize knobs are regions of constitutive heterochromatin that are readily identified in both meiotic and somatic chromosomes. These structures have been characterized as stable throughout the cell cycle, exhibiting late replication during the S-phase, and are composed of two specific families of highly repetitive DNA sequences: K180 and TR-1. Although widely used as cytogenetic markers due to their variability in number and chromosomal position across inbred lines, hybrids, and landraces, little is known about their chromatin structure and dynamics. In this study, we analyzed chromatin accessibility of knobs using DNS-seq data across four maize tissues representing distinct developmental stages. Results: Our results reveal that K180 knobs exhibit tissue-specific variation in chromatin accessibility, transitioning between open and closed states during development. In contrast, the TR-1 knob of chromosome 4 remained consistently inaccessible across all tissues analyzed. A knob composed of both K180 and TR-1 further supported this observation, with only the K180 region showing dynamic accessibility. To validate these findings, we also analyzed other repetitive regions such as centromeres, which showed a uniformly closed chromatin structure similar to TR-1. These results suggest a unique developmental modulation of chromatin accessibility associated with K180 repeats. While the chromatin accessibility of knobs does not reach the levels observed at Transcription Start Sites (TSS), the comparison among different classes of repetitive DNA within maize constitutive heterochromatin provides compelling evidence for sequence-specific and tissue-specific chromatin dynamics. Conclusions: Our findings uncover a previously unrecognized property of maize knobs and establish a reference for future studies on chromatin organization and epigenetic regulation of repetitive DNA in plant genomes.

14

Genomic indicators of gene function: A systematic assessment of the human genome

Cooper, H. B.; Rojas Lopez, K. E.; Schiavinato, D.; Black, M. A.; Gardner, P. P.

2026-04-09 genomics 10.64898/2026.04.08.717348 medRxiv

Top 0.1%

4.2%

Proteins and non-coding RNAs are functional products of the genome that are central for crucial cellular processes. With recent technological advances, researchers can sequence genomes in the thousands and probe numerous genomic activities of many species and conditions. Such studies have identified thousands of potential proteins, RNAs and associated activities. However there are conflicting interpretations of the results and therefore which regions of the genome are "functional". Here we investigate the relative strengths of associations between coding and non-coding gene functionality and genomic features, by comparing reliably annotated functional genes to non-genic regions of the genome. We find that the strongest and most consistent association between functional genes and genomic features are transcriptional activity and evolutionary conservation. We also evaluated sequence-based statistics, genomic repeats, epigenetic and population variation data. Other features strongly associated with function include histone marks, chromatin accessibility, genomic copy-number, and sequence alignment statistics such as coding potential and covariation. We also identify potential issues with SNP annotations in short non-coding RNAs, as some highly conserved ncRNAs have significantly higher than expected SNP densities. Our results demonstrate the importance of evolutionary conservation and transcription activity for indicating protein-coding and non-coding gene function. Both should be taken into consideration when differentiating between functional sequences and biological or experimental noise.

15

Comparative analysis of transposable elements in jellyfish and hydroid species (Cnidaria: Medusozoa)

Mays, A.; Cabrera, F.; Macias-Munoz, A.

2026-04-21 evolutionary biology 10.64898/2026.04.17.719288 medRxiv

Top 0.2%

4.0%

Background: Transposable elements (TEs) are repetitive genetic elements that can jump to new loci causing genome expansions, structural rearrangements, and can, ultimately, propel the evolution of genomes. Despite their significance, the role of TEs in the evolution of genomes and phylogenetic groups remains largely understudied in early diverging lineages. Further, the extent to which TE content varies across species is still an open question. Medusozoa, a group within Cnidaria encompassing jellyfish and hydroids, exhibits an exceptional diversity of life history strategies, body plans, and physiological capabilities. These characteristics, along with its early-diverging phylogenetic position, establish Medusozoa as an ideal system for investigating the composition and evolutionary history of TEs within the group. Results: We generated a custom repeat library built from annotations of 25 Medusozoan genomes and used it to characterize TEs, aiming to identify lineage-specific TE content and activity that may correlate with the diversity observed within the group. We found that repetitive element percentage and genome size varied considerably, with Hydrozoa exhibiting the most variation among classes in both respects. DNA transposons were the most prevalent TE classification in all but two genomes, averaging 28% of all genomes. Intra-genus comparisons revealed a surprising degree of differences in TE content. In the genus Aurelia, the expansion of a single DNA transposon superfamily accounted for much of the difference in repetitive element percentage between two species, whereas in the genus Turritopsis, a similar divergence resulted from the proliferation of multiple superfamilies. Interestingly, most genomes showed evidence of recent TE expansions, suggesting ongoing activity in many medusozoan species. Conclusion: We present the first comparative analysis of TEs across all medusozoan classes. Our results reveal class-specific TE dynamics and highlight cases of TE proliferations as lineages diverge. This research provides data on TE activity and diversity that can be used as a resource for future study and fills important gaps in our understanding of TEs in early diverging animal lineages.

16

Integrative Identification and Characterization of PCOS-Associated lncRNAs From the Interface of Genetic Association, Transcriptomics, and Gene Structure Evolution

He, Z.; Li, Y.; Shkurat, T. P.; Butenko, E. V.; Derevyanchuk, E. G.; Lomteva, S. V.; Chen, L.; Lipovich, L.

2026-04-02 genomics 10.64898/2026.03.31.715548 medRxiv

Top 0.2%

4.0%

Background: Polycystic ovary syndrome (PCOS) is a prevalent endocrine disorder and a leading cause of female infertility, with complex genetic, metabolic, and hormonal etiologies. Long non-coding RNAs (lncRNAs) have emerged as important regulators of diverse biological processes, yet their roles in PCOS remain underexplored. Here, we identified and characterized PCOS differentially expressed gene-associated lncRNAs (PDEGAL) with an integrative approach combining expression data, genetic association, and evolutionary analysis. Methods: Thirty-three PCOS-associated protein-coding genes were obtained from our prior study, and all their nearby and overlapping lncRNAs were annotated. These candidates were analyzed using UCSC Genome Browser-mapped annotations and datasets, including NCBI RefSeq, GENCODE, GTEx, GWAS SNPs, and conservation, as well as the FANTOM5 cap analysis of gene expression (CAGE) promoter data, to assess their expression, regulatory potential, genetic variant overlaps, and evolutionary conservation. Results: Twenty-three PDEGALs (18 antisense to, and 5 sharing bidirectional promoters with, known PCOS-associated protein-coding genes) were identified. 17 PDEGALs contained GWAS SNPs with statistically significant disease associations, 9 of which were associated with PCOS-related traits. 5 PDEGALs demonstrated expression in the KGN granulosa cell model of PCOS. Key gene structure element (KGSE) analysis revealed that most PDEGALs are primate-specific. Integrating four criteria (GTEx expression, GWAS SNPs, FANTOM promoterome, and KGSE conservation) highlighted HELLPAR as the only lncRNA fulfilling all four, while five others (PGR-AS1, MTOR-AS1, ENSG00000265179, ENSG00000256218, and LOC105377276) fulfilled three of the four criteria. Conclusions: We have systematically identified candidate PCOS regulatory lncRNAs with convergent genetic, expression, and evolutionary evidence. These results provide a framework for functional validation and highlight lncRNAs as potential biomarkers and therapeutic targets in PCOS that function by regulating their nearby and overlapping protein-coding genes.

17

Southern Iberia as a hotspot of wild grapevine genetic diversity

Rodriguez Felizzola, J. J.; Soriano Bermudez, J. J.; Blanco Pastor, J. L.

2026-04-16 evolutionary biology 10.64898/2026.04.14.718376 medRxiv

Top 0.2%

3.7%

Aim: The commercial interest of grapevines (Vitis vinifera L.) has prompted numerous studies on their origin and genetic resources in the context of global change. However, genomic-scale information on diversity patterns and genetic structure in southwestern Europe remains scarce. This study infers the genetic structure, gene flow events between genetic groups, and genetic refugia of Vitis vinifera ssp. sylvestris in the Iberian Peninsula. Location: The Iberian Peninsula. Taxon: The wild grapevine, Vitis vinifera L. ssp. sylvestris. Methods: We reanalyzed a set of 137 complete genomes of V. vinifera ssp. sylvestris. After variant calling, validation and annotation, we obtained a high-quality SNP dataset. Using these markers, we performed phylogenetic and population structure analyses to determine the number and spatial distribution of genetic groups and their contact zones. Next, we inferred the timing and directionality of gene flow events between groups. Finally, heterozygosity and allele rarity were estimated to identify populations with high conservation value. Results: We detected three major ancestral populations and four putative genetic refugia in the south of the Iberian Peninsula. Demographic analyses indicate sustained gene flow between ~21,000 and ~7,000 years ago from a North African ancestral group into Iberian wild populations in the south. Heterozygosity and allele rarity analyses identified populations of high conservation value in a variety of areas within the Iberian Peninsula. Main Conclusions: We identify the biogeographical factors behind the long-known singularity of wild Iberian grapevines. The southern Iberian Peninsula is a hotspot of genetic diversity for wild grapevines, hosting three ancestral populations and multiple contact zones that acted as micro-refugia. The current genetic variability of Iberian wild grapevines is best explained by natural, climate-driven gene flow between African lineages with Middle Eastern origin and Iberian groups. These contacts were favored by climatic conditions during the late Pleistocene (~21,000 years) and early Holocene (~8,300 years). Our results dismiss a significant anthropogenic influence during Neolithic domestication for explaining the genetic composition of Iberian wild grapevine genotypes.

18

Genetic and heat-stress related environmental influences on pig whole-blood gene expression levels

Durante, A.; Feve, K.; Naylies, C.; Labrune, Y.; Gress, L.; Lippi, Y.; Legoueix, S.; Milan, D.; Gourdine, J.-L.; Gilbert, H.; Renaudeau, D.; Riquet, J.; Devailly, G.

2026-03-18 genomics 10.64898/2026.03.17.712411 medRxiv

Top 0.2%

3.7%


Background: Gene expression levels are affected by both genetic and environmental effects. However, quantification of their relative influence on gene expression remains limited, especially in farm animals. Here, the relative influence of genetic and heat-related environmental variations on gene expression levels was investigated in pigs, using a backcross herd with diverse levels of heat adaptation. Backcross animals were raised in either a tropical or a temperate environment. Animals raised in the temperate environment were subjected to an experimental heat stress at the end of their growth. Results: We identified 1,967 differentially expressed genes (DEGs) between pigs raised in the tropical (n = 181) and temperate (n = 180) facilities, and 472 DEGs over a 3-week experimental heat stress. A transcriptome-wide association study (TWAS) identified 139 associations between gene expression levels and thermoregulation/production traits. We detected 6,014 expression quantitative trait loci (eQTLs) associated with the expression level of 3,297 genes. Genetic variance was estimated to explain 36.3% of gene expression variance on average, and was the main source of variance for 27.7% of transcripts. Most of the eQTLs found are located in regions proximal to their assigned genes (cis-eQTLs), and few in distal regions (trans-eQTLs). A trans-eQTL hotspot highlighted a hematopoietic mechanism driven by GPATCH8. An integration of GWAS and TWAS pointed to TMCO1 and ZNF184 as candidate genes for backfat thickness. Conclusions: This study provides a better understanding of the impact of climate, heat stress and genetic influences on the pig whole-blood transcriptome.

19

Evolutionary persistence of a highly prevalent multicopy mitochondrial-derived nuclear insertion (Mega-NUMT) in Neotropical Drosophila flies

Montoliu-Nerin, M.; Strunov, A.; Heyworth, E.; Schneider, D. I.; Thoma, J.; Hua-Van, A.; Courret, C.; Klasson, L. J.; Miller, W. J.

2026-04-01 evolutionary biology 10.64898/2026.03.31.715258 medRxiv

Top 0.2%

3.7%


Background: Although strict maternal transmission of mitochondria is a general feature of animals and humans for ensuring homogeneity in mitochondrial DNA (mtDNA) across generations, exceptions were reported in the recent past. For example, some extremely rare but spectacular cases of heteroplasmy and paternal transmission in humans have questioned the universal evolutionary principle. Hence, as an alternative, the Mega-NUMT concept was coined to explain this discovery and was thereafter partly proven to exist. This concept expands on the quite common transfer of mtDNA fragments to the nucleus (NUMTs) by considering the existence of multicopy mitochondrial nuclear insertions. Mega-NUMT reports are currently restricted to a few cases in animals, including humans. However, even in humans, their detailed genomic organization, natural prevalence, and potential biological functions remain unclear. Methodology/Principal Findings: Here, we discovered that up to 60 full-sized mitochondrial genomes are integrated into the nuclear genome of the neotropical fruit fly Drosophila paulistorum using long-read sequencing and confirmed their presence by in situ hybridization. The copies are organized in one cluster on chromosome 3, which we, due to its similarity with the Mega-NUMT concept, designated the "Dpau Mega-NUMT". Contrary to the rarity in humans, this Mega-NUMT is found at high prevalence (40%) in both long-term laboratory lines and natural D. paulistorum populations of different semispecies. Additionally, the mitochondrial copies in the Mega-NUMT cluster are phylogenetically separated from the current mitotypes of D. paulistorum. Together, these observations suggest long-term maintenance of the Mega-NUMT in nature. Hence, we propose that the Dpau Mega-NUMT may have been transferred to the nuclear genome before D. paulistorum semispecies radiation and maintained at relatively high prevalence in nature by balancing selection due to yet undetermined functions. Conclusions/Significance: To our knowledge, this is the first verified existence and detailed dissection of a Mega-NUMT outside cats and humans. We show that Mega-NUMTs can be persistent in nature, even at high prevalence, potentially due to balancing selection. Our findings strengthen the importance of high-quality long-read sequencing technologies for deciphering complex repeat-rich genomic regions to deepen our understanding of the dynamics of genome evolution within genomic "dark matter".

20

Multistage Machine Learning Reveals Circadian Gene Programs and Supports a Retina-Choroid Axis in Myopia Development

Watcharapalakorn, A.; Poyomtip, T.; Tawonkasiwattanakun, P.; Dewi, P. K. K.; Thomrongsuwannakij, T.; Mahawan, T.

2026-04-06 bioinformatics 10.64898/2026.04.02.716020 medRxiv

Top 0.2%

3.6%


Purpose: To determine whether circadian timing defines critical molecular windows in myopia development and to assess the transferability of circadian gene programs across ocular tissues, disease stages, and species. Methods: Publicly available retinal and choroidal RNA-seq datasets from chick models of form-deprivation myopia were analyzed using unsupervised transcriptomic profiling and multistage machine-learning classification. Circadian windows were defined based on Zeitgeber time, and samples were grouped accordingly for downstream analyses. Classification model robustness was evaluated through cross-tissue and cross-stage validation and further assessed using external validation in an independent dataset. Functional translation to humans was examined using ortholog-based Gene Ontology enrichment analysis to identify conserved biological processes and higher-order regulatory pathways. Results: A circadian critical window at ZT8-ZT12 exhibited the strongest transcriptional divergence during both myopia onset and progression. Gene signatures derived from this window generalized across retina and choroid and remained predictive across disease stages, supporting coordinated molecular regulation between ocular tissues. External validation confirmed the reproducibility of these signatures despite differences in experimental design and gene coverage. Functional mapping revealed that conserved molecular components in chicks are reorganized into more complex neuroendocrine and regulatory networks in humans, indicating cross-species conservation with increased functional complexity. Conclusions: Circadian timing strongly shapes myopia-related gene expression and underlies coordinated retina-choroid signaling. These findings highlight circadian biology as a key factor of refractive development and suggest that time-dependent mechanisms may influence myopia susceptibility, progression, and response to treatment.