Genes — Latest Matching Preprints

1

Expression-dependent but strand-independent synonymous single-nucleotide polymorphism in the Escherichia coli chromosome

Deka, N.; Beura, P. K.; Sen, P.; Aziz, R.; Kashyap, A.; Keot, D.; Jain, M.; Namsa, N. D.; Deka, R. C.; Feil, E.; Satapathy, S. S.; Ray, S. K.

2026-05-26 evolutionary biology 10.64898/2026.05.22.727198 medRxiv

Top 0.1%

10.5%

Background: Mutation is thought to arise mainly during replication, though transcription is also known to be mutagenic. Despite recent reports of genome-wide transcription-induced mutagenesis, a clear demonstration that specific mutations are replication-dependent and/or transcription-dependent in genomes is yet to be established. Here, we studied synonymous single-nucleotide polymorphisms (SNPs) in 2091 individual coding sequences (CDS) in the leading strand (LeS) and the lagging strand (LaS) of the Escherichia coli chromosome by comparing across 157 strains. The frequencies of complementary transitions (ti) and complementary transversions (tv) were compared in each CDS to assess parity violation in the strands. Results: The C→T and G→A transitions exhibited the maximum frequency as well as the most prominent strand inequality, as these tis were influenced both by the strand and by expression. Interestingly, inequality between T→C and A→G was expression-dependent but strand-independent. A→T and G→T tvs were universally more frequent than their complementary T→A and C→A tvs, respectively. Conclusions: Our study demonstrates strand-independent but expression-dependent synonymous SNP inequality in CDS, supporting a role for transcription-induced mutagenesis in contributing to strand inequality in the E. coli chromosome.

2

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.1%

6.4%

Currently, the genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of Variants of Uncertain Significance (VUSs) and clinical misinterpretations of genomic data, especially in Middle Eastern populations. Whole exome sequencing was conducted on 90 healthy individuals from Jordan and the data were analysed using Principal Component Analysis (PCA) and multi-computational filtering. PCA revealed a double ancestry (EUR-AFR) admixture rather than a triple admixture (EUR-AFR-AMR). More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched compared to the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p-value < 0.05). Consequently, the results suggest reclassifying the VUSs that reside in the ECE2 gene as likely benign, and the variants of Conflicting Classification of Pathogenicity in the genes IL1RN and THPO as benign, based on the significant allele frequency (AF=0.0389, p-value < 0.05). Furthermore, a pathogenic ClinVar variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs in order to minimize or even prevent clinical misdiagnosis and highlight the unique genetic signature in Jordan. The study serves as a foundational resource for precision medicine in the region.
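
The abstract above mentions a Benjamini-Hochberg false discovery rate correction. As a minimal sketch of that procedure (not the authors' pipeline), the function below computes BH-adjusted p-values in plain Python. The function name `benjamini_hochberg` and the input p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

A variant would then be called significant at a 5% false discovery rate when its adjusted value falls below 0.05.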

3

Genome-wide identification of rhabdoviral sequences in alfalfa (Medicago sativa L.)

Grinstead, S.; Nemchinov, L. G.

2026-05-22 genomics 10.64898/2026.05.20.726541 medRxiv

Top 0.1%

6.4%

We recently reported the identification of endogenous viral elements (EVEs) originating from the Caulimoviridae family within the alfalfa (Medicago sativa L.) genome. Our subsequent identification of ubiquitous rhabdoviral elements in infected and healthy alfalfa tissues by high-throughput sequencing prompted us to suggest that the alfalfa genome might be populated with integrated rhabdoviruses as well. Bioinformatics analysis using 26 publicly available alfalfa genomes proved the suggestion accurate. We found multiple non-retroviral segments of the Rhabdoviridae family belonging to the genera Betanucleorhabdovirus and Betacytorhabdovirus that appeared to be stable constituents of the host genome. In that capacity they could potentially acquire functional roles in alfalfa's development and response to environmental stresses. We believe this study reveals the first documented case of rhabdoviruses integrated into the alfalfa genome.

4

Snhg26 long non-coding RNA regulates pluripotent cell states through a SINEB2-derived sequence

Fort, V.; Khelifi, G.; Hussein, S. M. I.

2026-05-24 cell biology 10.64898/2026.05.21.726207 medRxiv

Top 0.1%

4.9%

Long non-coding RNAs (lncRNAs) are now well-established players in gene expression regulation, but their detailed molecular mechanisms of action and underlying regulatory sequences remain poorly understood. An emerging concept supports the idea that repeated sequences, most notably sequences derived from transposable elements (TEs), contribute to functional domains of lncRNAs. Here, we undertake the characterization of the function, interactors and functional domains of Snhg26, a lncRNA involved in reprogramming towards induced pluripotent stem cells (iPSCs) and maintenance of the pluripotent state of embryonic stem cells (ESCs). First, we show that modulation of Snhg26 expression levels affects expression and splicing of genes involved in pluripotency and chromatin remodeling during the early steps of reprogramming and in ESCs. We also find that down-regulation of Snhg26 increases the expression of SINEB2 transposable elements. Moreover, we identify hundreds of transcripts directly interacting with Snhg26, with a significant enrichment of RNAs containing SINEB2 elements. Strikingly, loss of a SINEB2 sequence embedded within Snhg26 abolishes its function in regulating pluripotent states. Our results thus support the idea that TEs constitute a source of functional units for lncRNAs and encourage further efforts to explore this concept.

5

Gene model for the ortholog of Lst8 in Drosophila yakuba

Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.

2026-05-14 genomics 10.64898/2026.05.12.723325 medRxiv

Top 0.1%

4.3%

Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

6

Using combined RNA/DNA short read sequencing to investigate allele-specific expression from the inactive X chromosome in human cells

Thomas, R.; Blower, M.

2026-05-24 bioinformatics 10.64898/2026.05.21.726886 medRxiv

Top 0.3%

3.5%

Many genomic regions exhibit allele-specific expression. This effect is most pronounced in imprinted genes, where one copy of a gene is epigenetically silenced, and on the inactive X chromosome of female cells, where almost the entire chromosome is silenced. Allele-specific gene expression can have significant effects on human health and is implicated in a wide array of diseases. Research into allele-specific expression is most often carried out in mouse models, where cross-breeding of mouse strains can yield progeny with well-characterised haplotypes in which the parent of origin is known for a huge number of SNPs. The same approach cannot be taken with human data, and haplotypes must be assembled using expensive and labour-intensive long-read sequencing and Hi-C-based approaches. Although resolved haplotypes are available for a number of cell lines, allowing accurate measurement of allele-specific gene expression, this type of analysis is inaccessible to non-specialist labs. We demonstrate how to use previously published haplotypes to investigate X-linked gene silencing and epigenetic changes. Additionally, we present a method that exploits the profound difference in expression levels between the two human X chromosomes to assign SNPs in expressed RNA to the active or inactive X chromosome using only short-read DNA and RNA sequencing. We demonstrate this technique using sequencing libraries generated in house and sequencing data from publicly available databases, including for a cell line with a complex karyotype. In each instance we identified genes that were silenced in each cell line, opening them up to further research avenues. This X chromosome haplotyping technique can be applied to any clonally derived human cell line with 2 or more X chromosomes, allowing researchers to investigate X-linked gene silencing in cell lines already present in their lab rather than in the limited number of cell lines for which a haplotype is available.

7

Biallelic CYB5A disruptions in 46,XY Disorder of Sex Development: Identification and Characterization of a Novel Deep Intronic Variant

Moradifard, S.; LE, T. N. U.; Ha, N. T.; Dung, V. C.; Thao, B. P.; Harley, V. R.

2026-05-12 genetic and genomic medicine 10.64898/2026.05.05.26352416 medRxiv

Top 0.3%

3.3%

Background: The diagnostic yield for 46,XY disorders of sex development (DSD) remains limited. Whole-genome sequencing (WGS) improves detection of both coding and non-coding variants that may be missed by routine testing. Cytochrome b5, encoded by CYB5A, is an essential co-factor for CYP17A1-mediated 17,20-lyase activity. We report WGS of a Vietnamese family with 46,XY DSD in which two siblings presented with female external genitalia. Methods: Clinical assessment and hormone profiling were conducted. WGS was conducted on peripheral blood DNA from the two affected siblings, followed by variant annotation and ACMG-based classification. A minigene RNA splicing assay in HEK293 cells was used to evaluate the functional impact of the CYB5A intronic variant. Results: The patients' hormone profile showed low testosterone and estradiol. WGS identified compound-heterozygous CYB5A variants: a paternally inherited missense variant (p.Val34Glu, likely pathogenic) and a maternally inherited deep intronic deletion (c.129+862_129+863del) for which SpliceAI predicted aberrant splicing. Minigene assays confirmed that the intronic deletion creates cryptic splice sites, resulting in pseudoexon inclusion and a premature stop codon, consistent with nonsense-mediated decay. The intronic variant meets ACMG criteria for pathogenicity. Conclusion: This family expands the spectrum of CYB5A-related DSD and demonstrates that compound-heterozygous variants, including deep intronic defects, can lead to a disruption in 17,20-lyase activity. These findings highlight the importance of WGS and functional assays for identifying clinically relevant non-coding variants in DSD.

8

Old Yellow Enzyme from Brevibacillus nitrificans functions as 12-oxo-phytodienoic acid reductase in planta

Klein, M.; Hornung, E.; Perle, L.; Feussner, K.; Herrfuth, C.; Keyl, A.; Broeker, L.; Stoehr, L.; Rensing, S. A.; Hamberg, M.; de Vries, J.; Feussner, I.

2026-05-30 biochemistry 10.64898/2026.05.27.728186 medRxiv

Top 0.4%

2.6%

Old Yellow Enzymes (OYEs) are a widely distributed family of ene-reductases that were first described in a Saccharomyces cerevisiae ferment. In plants, cis-12-oxo-phytodienoic acid (cis-OPDA) reductase (OPR) is the best-studied OYE. In Arabidopsis thaliana, the peroxisomal AtOPR3 was characterized as the major OPDA reductase, which generates 3-oxo-2-(2-pentenyl)-cyclopentane-1-octanoic acid in jasmonic acid (JA) biosynthesis. In Atopr3 lines, only small amounts of JA are detectable after wounding. Here, we describe an OPR-like enzyme (named BnOPR) from the Gram-positive Brevibacillus nitrificans. The sequence was identified in an early version of the Physcomitrium patens genome and is assumed to be a contamination by a bacterium growing in association with P. patens. In complementation experiments with an Atopr3 line, we demonstrate that expression of BnOPR, fused with a peroxisomal targeting signal, rescues the male infertile phenotype and increases JA and JA-Ile levels. The catalytic parameters of BnOPR were determined for a set of substrates, including cis-OPDA and prednisone. Interestingly, B. nitrificans, B. brevis, and Paenibacillus physcomitrellae were shown to have a positive effect on P. patens growth. Highlight: The bacterial enzyme BnOPR rescues the male infertile phenotype of Atopr3 plants.

9

Long-read sequencing reveals transposable element-derived chimeric transcripts at zygotic genome activation in mammalian embryos

Kawakami, S.; Kitao, K.; Ikeda, S.; Honda, S.

2026-05-28 developmental biology 10.64898/2026.05.25.727629 medRxiv

Top 0.5%

2.6%

Background: Transposable elements (TEs) are mobile genomic sequences that constitute one-third to one-half of the mammalian genome. Recently, TEs have been recognized for their important roles as cis-regulatory elements. TEs are broadly activated during zygotic genome activation (ZGA) in mammalian embryos, where they function as alternative promoters of host genes and drive the transcription of chimeric transcripts. However, the construction of comprehensive chimeric transcript databases based on short-read sequencing remains limited due to the repetitive and abundant nature of TEs in the genome. Here, we used long-read RNA sequencing to construct a comprehensive dataset of chimeric transcripts expressed in ZGA mouse and bovine embryos. Results: We identified 11,996 and 4,755 chimeric transcript variants derived from 2,695 and 1,200 host genes in mouse and bovine, respectively, exceeding the numbers reported in previous short-read-based studies. Among them, 114 orthologous pairs produced chimeric transcripts in both species. Gene Ontology analysis revealed significant enrichment of terms related to transcriptional regulation and protein modification in mouse, whereas no terms were significantly enriched in bovine. Assessment of the protein-coding potential of the TE-driven transcripts using predicted open reading frames (ORFs) revealed that the proportion of "Protein-coding" transcripts was lower, whereas that of "LncRNA" (long non-coding RNA) was higher, compared with all transcripts in both species. Among the ORFs classified as "Protein-coding", comparison with canonical ORFs revealed a tendency for the N terminus to be truncated while the C terminus remained intact in both species. TE-derived promoters used in mouse were enriched for mouse-specific TEs, whereas those in bovine were enriched for older TEs conserved among eutherians. In addition, long-read sequencing detected a greater number and proportion of TEs used as promoters in mouse and bovine than short-read sequencing. Although motif analysis identified KLF5 and OTX2 binding sites upstream of TE-derived promoters in both species, the specific TEs containing these motifs differed between the two species. Conclusions: This study presents the first long-read sequencing analysis of chimeric transcripts in mammalian embryos in two species. Our approach revealed the functional similarities of chimeric transcripts between species, as well as species-specific differences in their TE compositions.

10

Alternative polyadenylation and the sex-specific gene expression program in hemp

Shivakumar, A.; Hunt, A. G.; Chakrabarti, M.

2026-05-17 plant biology 10.64898/2026.05.13.725035 medRxiv

Top 0.5%

2.6%

Hemp (Cannabis sativa) produces a wide array of medicinally significant compounds, including cannabidiol (CBD). These compounds are predominantly synthesized in female hemp inflorescences. The proposed research utilizes next-generation sequencing-based transcriptome analysis using a 3′-end-directed approach to identify differentially expressed genes between male and female hemp plants at the early vegetative stage. 886 differentially expressed genes (DEGs) were identified, a majority of which were upregulated in males compared to females. We hypothesized that alternative RNA processing contributes to sex-specific gene expression. To this end, 932 genes were identified that exhibited significant changes in poly(A) site usage when comparing males and females. These genes were much more likely to be differentially expressed, supporting this hypothesis. Males tend to have longer 3′ UTRs with canonical motifs found in the Near-Upstream Elements (NUE), compared to the shorter 3′ UTRs in females, which have A-rich motifs near the cleavage site. This suggests that polyadenylation remodels hemp mRNAs, with distal poly(A) sites being preferred in males. To further investigate when this sex-specific gene expression program is established, RNA was isolated from plants at various developmental stages, such as developing seeds, four-day-old seedlings, and different developmental stages up to four weeks after sowing. Diagnostic male-specific genes were analyzed using RT/PCR. The results indicate that sex-specific gene expression is not evident in seeds but rather is set during or after germination. Significance: Hemp males tend to have longer 3′ UTRs with canonical motifs found in the Near-Upstream Elements (NUE), compared to the shorter 3′ UTRs in females, which have A-rich motifs near the cleavage site. The sex-specific gene expression program is not yet established in mature seed but is set in the time between germination and 4 days of growth.

11

Whole-exome-based preconception carrier screening in Uzbekistan with targeted SMA, FMR1, and DMD assays: the first reported clinical program

Kullyev, A.; Avdeichik, S.; Akimenkova, A.; Kartuesov, A.; Kardymon, O.; Goikhman, Y.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.02.26354713 medRxiv

Top 0.6%

2.1%

Purpose: Published clinical outcome data on preconception carrier screening (PCS) in Central Asia are limited. We report the first clinical implementation study from Uzbekistan of a whole-exome sequencing (WES)-based multi-platform PCS program combining exome sequencing with targeted SMA, FMR1, and DMD assays. Methods: We retrospectively analyzed anonymized data from 65 individuals (19 couples, 27 singletons) screened at IMC Genomics, Tashkent, between January 2024 and May 2026. WES covering the protein-coding regions of approximately 20,000 genes was followed by exome-wide bioinformatics filtering and clinical geneticist interpretation. Partly overlapping cohorts underwent SMA carrier screening (n=179), FMR1 CGG-repeat analysis in females (n=155), and DMD deletion/duplication testing in preconception females (n=29). Variants were classified by ACMG/AMP criteria against gnomAD v4.1. Results: Sixty-one of 65 WES-screened individuals (93.8%; 95% CI 85.2-97.6%) carried at least one reportable variant (152 instances across 126 genes). Four of 19 couples (21.1%; 95% CI 8.5-43.3%) were concordant for pathogenic or likely pathogenic variants in the same autosomal recessive gene; two were referred for preimplantation genetic testing for monogenic disease. SMA screening identified four carriers, including two 2+0 silent carriers; FMR1 analysis identified one intermediate allele; DMD MLPA identified no exonic rearrangements. Conclusion: This first reported WES-based multi-platform PCS program in Uzbekistan was feasible and clinically informative, identifying actionable couple-level reproductive risks and supporting structured implementation of reproductive genetic screening in Central Asia.
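
The abstract above does not say how its 95% confidence intervals were computed. As a hedged illustration, the sketch below implements the Wilson score interval for a binomial proportion, which appears consistent with the reported bounds for 61 of 65 individuals and 4 of 19 couples. The choice of the Wilson method and the function name `wilson_interval` are assumptions, not details taken from the preprint.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.959964 gives 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: 61 of 65 individuals, 4 of 19 couples.
individuals_low, individuals_high = wilson_interval(61, 65)
couples_low, couples_high = wilson_interval(4, 19)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 to 1 and behaves sensibly for small samples such as 19 couples.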

12

In silico restriction site analysis of whole genome sequences shows patterns caused by selection and sequence duplications

Vedder, L.; Schoof, H.

2026-05-16 genomics 10.64898/2026.05.15.725336 medRxiv

Top 0.6%

2.1%

Biological sequences are known not to be random. Thus, the comparison of in silico restriction fragment distributions of random and biological sequences may be an indicator of this non-randomness. Our analyses show that, for most of the tested combinations of restriction enzyme and genome sequence, the fragments per megabase of the biological sequence deviate by more than 10% from the corresponding random sequence. This deviation goes in both directions, i.e. clearly increased values are as common as clearly decreased values. Although there is no species- or restriction-enzyme-specific effect, a clear impact of the GC content of both the restriction site and the genome sequence can be seen. In contrast to the random sequences, the genome sequences show distinct peaks in their fragment length distributions, hinting at repetitive elements such as transposons.
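
To make the idea of an in silico digest concrete, here is a minimal sketch under stated assumptions; it is not the authors' tool. `fragment_lengths` cuts a linear sequence at every occurrence of a recognition site, and `expected_sites_per_mb` gives the site density expected if bases were drawn independently at a given GC content. That analytic expectation is a simplification standing in for the simulated random sequences described in the abstract. The function names and the toy sequence are hypothetical.

```python
def fragment_lengths(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at each site occurrence.

    cut_offset is the cut position within the site (1 for EcoRI, G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip zero-length pieces that arise when a cut falls on a sequence end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def expected_sites_per_mb(site, gc):
    """Expected site count per megabase, assuming independent bases (a simplification)."""
    p = 1.0
    for base in site.upper():
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p * 1_000_000


# Toy sequence with two EcoRI sites, for illustration only.
toy_fragments = fragment_lengths("AAGAATTCAAGAATTCAA", "GAATTC", 1)
ecori_at_half_gc = expected_sites_per_mb("GAATTC", 0.5)
```

Under this simplified model, an AT-rich site such as GAATTC is expected more often in a genome with 40% GC (about 324 sites per megabase) than in one with 50% GC (about 244), which illustrates why GC content of both site and genome matters for the comparison.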

13

Inferring the demographic history of Chinese and Indian rhesus macaque (Macaca mulatta) populations from PacBio HiFi long-read sequencing data

Heenkenda, E. J.; Versoza, C. J.; Terbot, J. W.; Soni, V.; Spatola, G. J.; Pfeifer, S. P.; Jensen, J. D.

2026-05-26 evolutionary biology 10.64898/2026.05.25.727731 medRxiv

Top 0.6%

2.1%

The rhesus macaque (Macaca mulatta) is one of the most widely used animal models in biomedical research, both as it resembles humans in key biological aspects and as it is characterized by a broad geographic range. Most of the individuals housed in U.S. research colonies have been sampled from either China or India, though notably the source population of these animals has significantly shifted over time. Given the substantial genetic and immunological differences between these populations, a deeper understanding of the underlying population structure is critically important for biomedical interpretation. Despite this, the demographic histories of these two populations remain poorly resolved. Here, we present an analysis of whole-genome, PacBio HiFi long-read sequencing data from ten unrelated individuals of each population, applying four related model- and non-model based demographic inference approaches, in order to reconstruct their ancestral history. We evaluated the fit of the subsequently estimated models against the empirical data, and incorporated underlying uncertainty in the mutation rates used for scaling. We inferred a well-fitting population history characterized by substantial structure between Chinese and Indian populations, with a split time ~140,000 generations ago from an ancestral population of ~65,000 individuals. We additionally inferred the subsequent history of size change within, and gene flow between, these populations, reaching the current estimated sizes of ~220,000 individuals in the Chinese population and ~14,000 individuals in the Indian population. The robust baseline demographic model established in this study will serve as a valuable resource for future research on this species, including for improved fine-scale recombination mapping, selection inference, and association studies.

14

A framework for identifying transcript orthologs: the evolution of sex bias in alternative transcript structure in Drosophila

Bankole, K.; McIntyre, L.; Garan, M.; Morse, A. M.; Keil, N.; Hernandez, A.; Barmina, O.; Khan, M.; Kopp, A.; Rogers, R.; Graze, R. M.

2026-05-26 genomics 10.64898/2026.05.25.727716 medRxiv

Top 0.6%

2.0%

Background: Recent advances in long-read technologies provide an unprecedented opportunity to study transcript evolution. However, comparative evolutionary studies, even in Drosophila, are limited by inconsistent and incomplete annotation, and the lack of annotated transcript homology. Results: In this study of five species spanning 28 million years (D. melanogaster, D. simulans, D. yakuba, D. santomea and D. serrata), we infer transcript homology using reciprocal liftover, and orthology using network analyses, with data validation from long-read RNA-seq of male and female head tissue. We build the first genus-level annotation, with 15,996 genes and 56,370 transcripts. Expressed transcripts are conserved: 73% of transcript orthologs are detected in all species. Even the improved annotation underestimates the number of genes with alternative transcripts, with 75% of genes expressing multiple structurally diverse transcripts. In a replicated quantitative evaluation of ~10,000 genes, both male- and female-biased transcripts are expressed in 410 (D. melanogaster), 608 (D. simulans), and 493 (D. serrata) genes and in 118 orthologous genes in the D. melanogaster-D. simulans species pair, indicating greater potential for resolution of sexual conflict by alternative transcription than previously appreciated. We identified 605 transcript orthologs conserved for sex bias in the D. melanogaster-D. simulans species pair and of these, 22 male- and 19 female-biased transcripts were conserved in sex bias with the outgroup D. serrata, including transcripts of genes involved in brain development, Sxl target Glutamine synthetase 2 and ciboulot. Conclusions: Conserved alternative transcripts suggest that transcriptional diversity is a pervasive driver of the evolution of functional diversity.

15

RT-nested and interfering-Primer PCR reveal prevalent isoform-specific A-to-I RNA editing in neuronal genes

Wang, Z.; Ni, Y.; Cai, W.; Li, H.; Duan, Y.

2026-05-17 molecular biology 10.64898/2026.05.15.725286 medRxiv

Top 0.7%

1.9%

Background: Metazoan adenosine-to-inosine (A-to-I) mRNA editing temporospatially diversifies the neuronal transcriptome and proteome. The limited read length of next-generation sequencing (NGS) constrains the quantification of potentially differential editing levels across different splicing isoforms, restricting our understanding of the extent to which RNA editing contributes to molecular diversity and its interplay with splicing. Methods: We employed reverse transcription nested PCR (RT-nPCR) and developed a novel interfering-Primer PCR (iPrimer PCR) technique to distinguish different transcripts of any gene. We selected multiple essential genes exhibiting RNA editing in coding sequences (CDSs) or untranslated regions (UTRs) for isoform-specific amplification and Sanger sequencing. Results: Nine different Adar isoforms together with pre-mRNA had distinct editing levels at the S>G auto-recoding site, which was predicted to have isoform-specific effects on catalytic activities. Although pre-mRNA editing might exert isoform-dependent promotion/suppression of splicing, closely located editing sites, such as those in neuronal genes qvr and stj, still exhibited high correlation in editing levels due to co-editing. The iPrimer strategy further discovered differential recoding levels between the long/short 3′UTR isoforms of gene jef. Conclusions: We provide the first comprehensive solution for isoform-specific PCR amplification of any gene, enabling quantification of the RNA editing level of different isoforms. Our results offer insights into how RNA editing interplays with splicing, and highlight its complicated role in expanding molecular diversity.
Graphical abstract: We developed isoform-specific PCR followed by Sanger sequencing, and achieved the quantification of differential RNA editing levels in different transcripts of a gene.

16

Evolutionary genomics based on PacBio HiFi long-read sequencing data reveals the importance of structural variants in shaping population-specific differences between Chinese and Indian rhesus macaques (Macaca mulatta)

Maruki, T.; Versoza, C. J.; Jensen, J. D.; Pfeifer, S. P.

2026-05-29 evolutionary biology 10.64898/2026.05.27.728199 medRxiv

Top 0.8%

1.8%

Rhesus macaques (Macaca mulatta) are the most widely used non-human primate model for translational research relevant to human health and disease. Although several genetically distinct populations have been recognized across the species' extensive habitat range in Asia, the majority of biomedical studies in the United States and abroad focus on individuals of either Chinese or Indian descent. Notably, phenotypic differences exist between these two populations which can influence biomedical research outcomes; however, the genetic basis and molecular mechanisms underlying these differences are generally not well understood. Based on novel PacBio HiFi long-read sequencing data from 20 rhesus macaques (ten of Chinese origin and ten of Indian origin), we here characterize the genome-wide landscape of structural variation in these two biomedically relevant populations. Our results highlight differences in the structural variant landscape affecting genes involved in neural communication and signaling pathways, in line with the known differences in temperament between the two populations. Furthermore, while the majority of discovered structural variants were located in intergenic and non-coding regions of the genome, 15 of the discovered population-specific structural variants were predicted to exhibit a high functional effect on genes associated with human disease, indicating that they may play an important role in shaping the differences in disease susceptibility between the populations. Taken together, by providing detailed insights into population-specific structural variation, this genomic resource will aid the design and interpretation of future studies aiming to link genotype, phenotype, and fitness in the context of human health and disease, and facilitate broader comparative analyses of structural variation as a force shaping genome evolution across primates.

17

E-InfertilityTest: An Explainable AI Framework for Male Infertility Assessment

Das, G.; Ghosh, B.; Ghosh, Z.

2026-05-25 bioinformatics 10.64898/2026.05.21.726746 medRxiv

Top 0.9%

1.7%

Male infertility has emerged as a significant concern in modern society, with genetic defects being one of the major underlying causes. This impairment negatively impacts sperm motility and morphology, leading to conditions such as Asthenozoospermia (reduced sperm motility), Teratozoospermia (abnormal sperm morphology) and sometimes Asthenoteratozoospermia (both motility and morphology defects). Assisted reproductive technologies (ART), such as in-vitro fertilization (IVF), offer a potential solution for such cases but with a low success rate. Classical semen analysis provides only a phenotypic snapshot without revealing the fertilizing potential of the sperm. Hence, in order to screen the functional sperm population as well as to get a deeper insight into the reasons underlying the aberrant sperm population, it is important to study their genetic profile. In this work, we have performed a meta-analysis comparing the transcriptomic data of infertile sperm from Asthenozoospermia and Teratozoospermia patients with that of fertile sperm from normal individuals. Thereafter, we screened a signature gene set, which was used to develop a prediction model named Explainable Infertility Test (E-InfertilityTest) to classify fertile versus infertile sperm at the preliminary level. For each prediction, it also provides the set of genes that play a dominant role in that prediction. Thus, it provides a patient-specific dominant gene expression profile responsible for the aberration. This work warrants future validation experiments to substantiate the model's performance in a clinical setting. Users can access the E-InfertilityTest tool as a standalone version on GitHub. GitHub link: https://github.com/zglabDIB/einfertility.git

18

Verification of human nucleotide sequence reagents and cell line identities in original circRNA articles published in high impact factor journals

Pathmendra, P.; Enguita, F. J.; Byrne, J. A.

2026-05-29 genomics 10.64898/2026.05.28.728608 medRxiv

Top 0.9%

1.7%

The number of research articles studying circRNAs has increased rapidly since 2017. Previous analyses of human circRNA articles in two high impact factor cancer research journals identified papers with wrongly identified nucleotide sequence reagents and circRNAs whose identities could not be independently verified. In the present study, verification of human nucleotide sequence reagent and cell line identities in retracted circRNA articles published from 2017 to 2021 in high impact factor journals found wrongly identified nucleotide sequences and/or cell lines in all 13 retracted papers. Similar analyses of human circRNA papers published in high impact factor journals in 2022 found wrongly identified, non-verifiable and/or questionable reagents in 71% (84/118) of papers, where 51% (60/118) of papers described at least one wrongly identified reagent. When individual error types and features of concern were considered, 2022 circRNA papers described wrongly identified nucleotide sequence reagents (52/118, 44%), questionable circRNA probes that did not meet accepted targeting requirements (34/118, 29%), non-verifiable nucleotide sequences (25/118, 21%), wrongly identified cell lines (22/118, 19%), and/or non-verifiable cell line identifiers (6/118, 5%). In summary, wrongly identified, non-verifiable and/or questionable reagents were unexpectedly frequent in human circRNA papers in high impact factor journals, highlighting the need for critical engagement with the circRNA literature.

19

Pharmacogenetic Characterization of Cytochrome P450 Genes involved in Psychotropic Medication Metabolism in a Cohort of Patients with Prader-Willi Syndrome

Moreno-Armengol, A.; Pareja, R.; Hernandez-Lazaro, A.; Capel, L.; Corripio, R.; Caixas, A.; Baena, N.

2026-05-18 pharmacology and therapeutics 10.64898/2026.05.09.26352521 medRxiv

Top 1%

1.7%

Prader-Willi syndrome (PWS) is a rare multisystemic disorder characterized by obesity, endocrine dysfunctions, and psychiatric comorbidities, which entail frequent use of psychotropic medications. Patients with PWS show atypical responses to standard dosages of psychiatric drugs. Pharmacogenetic variation could partly explain this, and pharmacogenetic testing could offer a valuable tool for individualized treatment. This study analyzed allelic and phenotypic frequency distributions of five of the main cytochrome P450 enzymes (CYP2D6, CYP2B6, CYP2C19, CYP2C9, CYP3A4) involved in psychiatric drug metabolism in 47 patients with a genetically confirmed diagnosis of PWS and compared them to reference frequencies in the general European population. Allelic frequency comparisons between the European reference population and the overall PWS cohort revealed a significant global difference for CYP2B6, with CYP2C19 and CYP2D6 showing trends toward significance. Although no global allelic differences remained significant after false discovery rate correction, post-hoc analyses consistently identified an enrichment of the reduced- or non-functional alleles CYP2B6*19 and CYP2D6*10 in patients with PWS. Predicted metabolizer phenotype analyses showed a significant shift toward intermediate metabolizers of CYP3A4 in the PWS cohort, with a corresponding depletion of normal metabolizers. Subgroup analyses indicated that allelic differences from the reference population were more pronounced in the maternal uniparental disomy and non-deletion subtypes, particularly for CYP2B6, although no significant differences were observed between PWS genetic subtypes. Overall, the results imply potential differences in metabolizing activity in PWS patients, with consequent implications for drug efficacy and tolerability. These results support the idea that pharmacogenetic testing may improve therapeutic decision-making for psychiatric treatment in PWS. Larger studies are needed to confirm these preliminary results.

20

New insight into the RNA-chaperon activity of nucleobindin 1

Kostareva, O. S.; Eliseeva, I. A.; Buyan, A. I.; Lyabin, D. N.; Tishchenko, S. V.; Mikhaylina, A. O.

2026-05-22 molecular biology 10.64898/2026.05.22.727093 medRxiv

Top 1%

1.5%

Nucleobindin 1 (NUCB1) is a multifunctional conserved protein found in Golgi luminal, nuclear, extracellular and cytosolic pools. NUCB1 is a multidomain protein comprising a signal peptide, a DNA-binding domain, a leucine zipper and a Ca2+-binding domain. The multiple domains and localizations of NUCB1 enable its interactions with various partners, such as DNA, Gi3 protein, cyclooxygenase 2, LRP10 and RNA, suggesting its importance in the regulation of many cellular events. We show that NUCB1 contains three RNA-binding regions and is able to interact with two RNA fragments. We propose possible ways in which NUCB1 participates in the interaction of two partially complementary RNAs. The RNA-binding properties of NUCB1 were also confirmed in in vivo experiments.