Genome Biology and Evolution

Oxford University Press (OUP)

Preprints posted in the last 30 days, ranked by how well they match Genome Biology and Evolution's content profile, based on 280 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit.
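As a rough illustration of how to read the match scores against the stated 0.08% average, the sketch below divides a listed score by that baseline. The function name `fold_over_baseline` and the idea of expressing a score as a multiple of the average are assumptions made for illustration; the page itself only says that scores above 0.08% are above average.

```python
# Baseline stated in the page header: the average preprint scores 0.08%.
BASELINE_PCT = 0.08


def fold_over_baseline(score_pct: float, baseline_pct: float = BASELINE_PCT) -> float:
    """Return how many times larger a match score is than the average score.

    Hypothetical helper: the ratio is an illustrative way to compare scores,
    not a quantity reported by the listing itself.
    """
    return score_pct / baseline_pct


# Match scores copied from three entries in the listing (rank -> percent).
scores = {1: 10.2, 5: 6.7, 17: 2.0}

for rank, pct in scores.items():
    ratio = fold_over_baseline(pct)
    print(f"rank {rank}: {pct}% is roughly {ratio:.1f} times the average score")
```

Read this way, even the lowest-ranked entry shown here sits well above the average match score.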

1
Eco-evolutionary dynamics of defense systems in mobile genetic elements: Cui bono?

Iranzo, J.; Wolf, Y. I.; Koonin, E. V.

2026-05-26 evolutionary biology 10.64898/2026.05.25.727639 medRxiv
Top 0.1%
10.2%

Background: Mobile genetic elements (MGEs), including viruses, plasmids, and transposons, are major drivers of evolution in bacteria and archaea. Host-parasite conflicts drive the emergence of a broad variety of defense and counter-defense systems. Recent advances in metagenomics and functional annotation have shown that many defense systems are located on MGEs. The fact that MGEs are, essentially, genomic parasites raises an intriguing question: why do these parasites carry defense systems at high prevalence, often even higher than the host chromosome? Results: We developed a simple mathematical model to investigate the factors that promote evolution of defense systems in MGEs and the ecological implications of MGE-encoded defense. Our analysis points to the strength of inter-MGE interference as a key determinant of the evolution of defense systems in MGEs. We identify two qualitatively distinct regimes, depending on the basic reproductive number in mixed coinfections. Weakly interfering MGEs tend to carry low-cost defense systems that enhance the survival of their hosts upon exposure to more damaging MGEs. Although these systems can be occasionally transferred to the host, they typically remain in MGEs. In contrast, strongly interfering MGEs, such as plasmids from the same incompatibility group, can carry high-cost defense systems that are detrimental to the host and the population as a whole, but help their carriers spread by actively replacing their competitors. Conclusions: Analysis of our model shows that the key determinant of the evolution and spread of defense systems in MGEs is the strength of cross-MGE interference. Weakly interfering MGEs would serve as MGE banks, typically carrying low-cost defense systems that can benefit the host by protecting it from more damaging MGEs. In contrast, strongly interfering MGEs would carry costly defense systems that mediate inter-MGE conflicts but are deleterious to the host. These MGEs could serve as proving grounds for emerging defense systems, which might eventually become cost-effective once optimized by selection.

2
Lifestyles of Gypsy-family transposons shape their regulatory mechanisms

Papameletiou, A.-M.; Czech Nicholson, B.; Bornelöv, S.; Hannon, G. J.

2026-05-21 genomics 10.64898/2026.05.19.726053 medRxiv
Top 0.1%
10.1%

Transposable elements are a highly diverse group of selfish genomic elements, prevalent across the tree of life, whose uncontrolled propagation poses a threat to genome stability. Recent studies have explored the evolution of Drosophila melanogaster transposable elements, their co-evolution with the host genome, and mechanisms that regulate their activity. However, little is known about their cross-species evolutionary patterns. Long terminal repeat (LTR) retrotransposons are the most active group of transposable elements in Drosophila. They are broadly separated into retroelements, which are active in the germline, and insect endogenous retroviruses that are active in the soma. Somatic elements are hypothesised to infect the germline through their acquisition of virus-derived proteins such as Envelope and sORF2, thus multiplying through successive generations. In this study, we curated the sequences of LTR retrotransposons in 249 drosophilid genomes, allowing us to study their evolution across these species and highlight their varying degrees of conservation. Furthermore, we reveal multiple instances of Envelope protein loss or inactivation that suggest shifts in the expression pattern of these transposons, likely accompanied by adopting different transcriptional control mechanisms. We contrast this with the evolutionary history of sORF2, which we found to be much more stable. Lastly, we examined variations in transposon LTR regions responsible for transcriptional regulation and use predictive modelling to suggest six transcription factors likely involved in their tissue-specific expression. Altogether, we reveal complex, interspecies evolutionary patterns of Gypsy-family LTR retrotransposons and highlight examples of their co-evolution with their host genome.

3
Evolution of regulatory networks controlling plasticity in gene expression between Saccharomyces cerevisiae and Saccharomyces paradoxus

Redhuis, A. C.; Wittkopp, P. J.

2026-05-20 evolutionary biology 10.64898/2026.05.18.725926 medRxiv
Top 0.1%
9.9%

Organisms cope with environmental changes by modifying gene expression. To understand how regulatory networks controlling expression plasticity evolve, we analyzed RNAseq data from Saccharomyces cerevisiae, Saccharomyces paradoxus, and their F1 hybrids at multiple timepoints after transferring cells from standard laboratory conditions to five environments (low phosphorus, low nitrogen, hydroxyurea shock, heat stress, and cold stress) and during the diauxic shift. In each of the six datasets, we identified genes that changed expression following the transition to the new environment and used hierarchical clustering to identify genes that increased or decreased in expression. We then compared these classifications between orthologs to identify genes with divergent plasticity. For some genes, plasticity was more extreme in one species than the other, and for others, expression of orthologs changed in opposite directions when acclimating to the same environment. Most cases of plasticity divergence were seen only in one environment and were attributable primarily to trans-regulatory divergence. Using environment-specific regulatory networks inferred from data in Yeastract, we found that divergent plasticity of environment-specific transcription factors generally did not predict divergent plasticity of their target genes. We also found that, as a group, genes with conserved plasticity tended to have more regulatory interactions than genes with divergent plasticity. Interesting patterns of expression divergence were also observed for five transcription factors in the pleiotropic drug resistance network and their target genes that might contribute to phenotypic divergence. Together, these findings show how environment-specific trans-regulatory divergence and combinatorial gene regulation shape the evolution of expression plasticity.

4
Origins of eukaryotic metabolism

Santana-Molina, C.; Spang, A.; Snel, B.

2026-05-12 evolutionary biology 10.64898/2026.05.08.723234 medRxiv
Top 0.1%
8.2%

The origin of eukaryotes is a key event in the evolution of cellular life hypothesized to involve a symbiotic integration between a member of the Asgard archaea and the Alphaproteobacteria. Recent work has provided evidence for additional genetic input from other prokaryotes to the eukaryotic proteome, yet the extent and sources of these contributions remain debated. Here we aimed to further resolve the prokaryotic origins of eukaryotic genes to inform our understanding of eukaryogenesis. Specifically, we developed a phylogenetic framework to investigate the origins of eukaryotic gene families associated with metabolism and informational processing for comparison. We found that informational processing genes were predominantly derived from archaea whereas eukaryotic metabolism is highly chimeric in its origin. In contrast to previous studies, we report a substantial number of archaeal origins of diverse metabolic enzymes including key metabolic regulators. This highlights an overlooked participation of archaeal metabolism and pinpoints potential metabolic integrations during eukaryogenesis. Apart from the alphaproteobacterial contributions to the eukaryotic metabolism, we found an additional dominant phylogenetic signal of genes potentially derived from Myxococcota, especially for gene families associated with lipid metabolism. By systematically analysing the origins of eukaryotic metabolism, this research offers novel insights into the origin of eukaryotic membranes and refines our current models for the origin of the eukaryotic cell.

5
A framework for identifying transcript orthologs: the evolution of sex bias in alternative transcript structure in Drosophila

Bankole, K.; McIntyre, L.; Garan, M.; Morse, A. M.; Keil, N.; Hernandez, A.; Barmina, O.; Khan, M.; Kopp, A.; Rogers, R.; Graze, R. M.

2026-05-26 genomics 10.64898/2026.05.25.727716 medRxiv
Top 0.2%
6.7%

Background: Recent advances in long-read technologies provide an unprecedented opportunity to study transcript evolution. However, comparative evolutionary studies, even in Drosophila, are limited by inconsistent and incomplete annotation, and the lack of annotated transcript homology. Results: In this study of five species spanning 28 million years (D. melanogaster, D. simulans, D. yakuba, D. santomea and D. serrata), we infer transcript homology using reciprocal liftover, and orthology using network analyses, with data validation from long-read RNA-seq of male and female head tissue. We build the first genus-level annotation, with 15,996 genes and 56,370 transcripts. Expressed transcripts are conserved: 73% of transcript orthologs are detected in all species. Even the improved annotation underestimates the number of genes with alternative transcripts, with 75% of genes expressing multiple structurally diverse transcripts. In a replicated quantitative evaluation of ~10,000 genes, both male- and female-biased transcripts are expressed in 410 (D. melanogaster), 608 (D. simulans), and 493 (D. serrata) genes and in 118 orthologous genes in the D. melanogaster-D. simulans species pair, indicating greater potential for resolution of sexual conflict by alternative transcription than previously appreciated. We identified 605 transcript orthologs conserved for sex bias in the D. melanogaster-D. simulans species pair and of these, 22 male-biased and 19 female-biased transcripts were conserved in sex bias with the outgroup D. serrata, including transcripts of genes involved in brain development, Sxl target Glutamine synthetase 2 and ciboulot. Conclusions: Conserved alternative transcripts suggest that transcriptional diversity is a pervasive driver of the evolution of functional diversity.

6
Evolutionary rate correlations reveal long-term co-evolutionary interactions in Drosophila melanogaster

Dagilis, A. J.; DiAngelis, B.; Lee, S.; Matute, D. R.

2026-05-23 evolutionary biology 10.64898/2026.05.21.726714 medRxiv
Top 0.3%
4.8%

Co-evolution between genes can occur for a variety of reasons, including co-expression of genes, epistatic interactions between them, physical interactions of gene products and many others. Co-evolutionary partners of a gene are therefore of great interest in identifying potential factors that contribute to any phenotype of interest. State-of-the-art approaches to detect these interactions use correlations of evolutionary rates across a broader phylogeny, and so by necessity identify interactions only among genes that are present across long evolutionary time periods. This makes the methods unwieldy when interest lies in a single focal organism in which the genes of interest may have evolved in the recent evolutionary past. Here, we present a new approach to calculating evolutionary rate correlations which focuses on extracting maximum coverage for a single focal species, while retaining signals of co-evolution across large clades. We show how this approach is able to identify potential interactions even in highly studied species and highly studied genes, with a focus on the D. melanogaster sex-determiner, Sxl, using data from 72 species of Dipterans.

7
Root-level loss of immunoglobulin and B-cell immune genes in clingfishes

Gambon Deza, F.

2026-05-18 evolutionary biology 10.64898/2026.05.16.725622 medRxiv
Top 0.4%
4.0%

Immunoglobulin genes are a central component of jawed-vertebrate adaptive immunity. A previous study showed that the blunt-snouted clingfish Gouania willdenowi lacks immunoglobulin genes and T-cell receptor gamma/delta loci, while retaining T-cell receptor alpha/beta genes, MHC genes, and RAG1/RAG2. Here I extend that observation to the family Gobiesocidae using all seven chromosome-level Gobiesocidae genome assemblies currently available. Manual tblastn and synteny-guided searches found no convincing immunoglobulin heavy-chain or light-chain loci in G. willdenowi, Gouania pigra, Gobiesox punctulatus, Apletodon dentatus, Lepadogaster candolii, Lepadogaster purpurea, or Diplecogaster bimaculata. Thus, the absence of antibody genes is best interpreted as a root-level character of clingfishes. The latest seven-species screen of 40 additional immune-associated genes shifts the broader interpretation in the same direction: the B-cell/adaptive core genes CD79A, CD79B, CIITA, TNFRSF13B, and TNFSF13B lack strong tblastn support in all sampled Gobiesocidae, and 37 of the 40 tested targets show an all-zero binary pattern at the presence threshold. Only IL21R.1, TYROBP, and TNFRSF11A show strong hits in one or more species. I therefore interpret the principal immune-gene erosion as occurring at or near the Gobiesocidae root rather than as a recent Gouania-specific process, while keeping weak, paralog-sensitive, and patchy loci provisional. RAG2 comparisons show a shared Gobiesocidae PHD-domain C-to-S replacement in the zinc-binding motif, with apparently intact RAG2 coding sequence. A family-wide TRG/TRD screen did not recover TRGV V segments or accepted TRDC constant-region exons, but it did detect TRGC-like constant exons in several genomes. These TRGC-like sequences are probably not canonical TRG constant exons without further validation, so I treat the gamma/delta system as eroded or rearranged rather than as a complete root-level loss equivalent to the Ig loss. The RAG2 variant provides a plausible molecular context for antigen-receptor remodeling, but it is not evidence that RAG genes are pseudogenized, because TCR alpha/beta, MHC genes, and RAG1/RAG2 are retained. Gobiesocidae are therefore best described as a vertebrate family with ancestral loss of canonical immunoglobulin genes and associated root-level erosion of B-cell and immune-related genes, not as a lineage lacking adaptive immunity in its entirety. Highlights: Seven chromosome-level Gobiesocidae genomes lack convincing canonical IgH and IgL loci. The strongest non-Ig losses map to the B-cell/adaptive core: CD79A, CD79B, CIITA, TNFRSF13B, and TNFSF13B. TCR alpha/beta, MHC genes, and RAG1/RAG2 are retained, so Gobiesocidae should not be described as lacking adaptive immunity in full. A shared Gobiesocidae RAG2 PHD-domain C-to-S variant provides candidate molecular context for antigen-receptor remodeling.

8
A draft de novo assembly of Diadema antillarum, a keystone herbivore of the Caribbean reefs

Majeske, A. J.; Wong, J.; Farkas Pool, C.; Eirin-Lopez, J.; Wolfsberger, W.; Schizas, N. V.; Diaz-Lameiro, A. M.; Castro-Marquez, S. O.; Hilkert, K.; Mercado Capote, A. J.; Oleksyk, T. K.

2026-05-27 genetics 10.64898/2026.05.24.727502 medRxiv
Top 0.4%
4.0%

We generated the first reference-level nuclear genome assembly of the keystone Caribbean long-spined black sea urchin species, Diadema antillarum (Philippi, 1845). Using whole-genome sequencing data from PacBio HiFi, Oxford Nanopore, and Illumina platforms, we employed multiple assembly strategies to generate a high-quality, near-complete genome. The final assembly spans 1.73 Gbp, consists of 2,964 scaffolds, and has an N50 of 1.56 Mbp. BUSCO analysis (metazoa_odb10) indicates 98.4% completeness. The genome displays a heterozygosity rate of 2.52% and contains 42.85% repetitive elements, of which 29.96% are unclassified. Coverage analysis reveals that while most of the genome was assembled at 11x depth, certain regions exhibit up to 530x coverage. Notably, regions exceeding 33x coverage account for 30.53% of the repetitive content, suggesting localized expansion of repeats. Duplication analysis of the assembled contigs shows that approximately 66% of contigs have duplicated, which supports segmental genome duplication in the past, and is further evidenced by the moderate level of heterozygosity of the assembly. While these characteristics contribute to the complexity of the genome, they do not diminish the quality of our assembly. Despite this complexity, our assembly maintains high completeness and contiguity. Our assembly provides a valuable resource for future genetic studies and serves as a critical framework for conservation, monitoring, and restoration of D. antillarum populations across the Caribbean.

9
Promises and limitations of local ancestry inference in imputed ancient genomes

Bougiouri, K.; Irving-Pease, E. K.; Frantz, L. A. F.; Racimo, F.; Petr, M.

2026-05-20 evolutionary biology 10.64898/2026.05.19.725905 medRxiv
Top 0.4%
3.9%

Recent advances in genome imputation have enabled the application of state-of-the-art statistical methods, originally developed for present-day genomes, to ancient genomes. One class of such methods, known as local ancestry inference (LAI), can model an individual's genome as a mosaic of tracts assigned to different putative ancestral sources, revealing patterns of genetic ancestry across the genome. However, most LAI methods have been designed to study recent admixture events in human history, and they generally assume large panels of present-day genomes. Despite the recent availability of high-quality imputed ancient genomes, it remains unknown to what degree LAI is reliable for such datasets. Ancient DNA is often characterized by heterogeneous geographic and temporal sampling, varying degrees of divergence between ancient source proxies and admixing populations, and complex demographic histories. Here, we performed an extensive set of population genetic simulations to evaluate the accuracy of four popular LAI methods (RFMix, FLARE, MOSAIC and simpLAI) under different demographic scenarios, various temporal sampling schemes, sample sizes, and admixture dates. We quantify the accuracy of these methods as a function of different parameters in practically relevant scenarios, and provide general guidelines for future studies utilizing LAI in ancient DNA research.

10
Inferring the demographic history of Chinese and Indian rhesus macaque (Macaca mulatta) populations from PacBio HiFi long-read sequencing data

Heenkenda, E. J.; Versoza, C. J.; Terbot, J. W.; Soni, V.; Spatola, G. J.; Pfeifer, S. P.; Jensen, J. D.

2026-05-26 evolutionary biology 10.64898/2026.05.25.727731 medRxiv
Top 0.4%
3.9%

The rhesus macaque (Macaca mulatta) is one of the most widely used animal models in biomedical research, both as it resembles humans in key biological aspects and as it is characterized by a broad geographic range. Most of the individuals housed in U.S. research colonies have been sampled from either China or India, though notably the source population of these animals has significantly shifted over time. Given the substantial genetic and immunological differences between these populations, a deeper understanding of the underlying population structure is critically important for biomedical interpretation. Despite this, the demographic histories of these two populations remain poorly resolved. Here, we present an analysis of whole-genome, PacBio HiFi long-read sequencing data from ten unrelated individuals of each population, applying four related model- and non-model based demographic inference approaches, in order to reconstruct their ancestral history. We evaluated the fit of the subsequently estimated models against the empirical data, and incorporated underlying uncertainty in the mutation rates used for scaling. We inferred a well-fitting population history characterized by substantial structure between Chinese and Indian populations, with a split time ~140,000 generations ago from an ancestral population of ~65,000 individuals. We additionally inferred the subsequent history of size change within, and gene flow between, these populations, reaching the current estimated sizes of ~220,000 individuals in the Chinese population and ~14,000 individuals in the Indian population. The robust baseline demographic model established in this study will serve as a valuable resource for future research on this species, including for improved fine-scale recombination mapping, selection inference, and association studies.

11
Genomic Architecture, Differentiation, and Adaptation in Three Large Falcons

Wilcox, J. J. S.; Arca-Ruibal, B.; Boissinot, S.; Idaghdour, Y.

2026-05-24 genomics 10.64898/2026.05.21.726861 medRxiv
Top 0.4%
3.6%

Recent chromosomal rearrangements and divergence in large falcon species make them excellent foci for studies on evolution and genomic architecture. Here, we use high-coverage (44-74X) whole genome resequencing with 10X Genomics Linked-Reads to assess patterns of genomic divergence in peregrine, saker, and gyrfalcons and we link these to chromosomal type and chromosomal rearrangements. We first use admixture analysis and cross-coalescent MSMC2 to demonstrate distinct species boundaries between the large falcons and retrace their demography. We assessed genomic landscapes in terms of recombination rate, nucleotide diversity (π), Tajima's D, autozygosity and Fst between saker and gyrfalcons: π had higher values on smaller chromosomes and Fst had higher values on larger chromosomes. Recombination rate concealed other chromosome type effects on π and Tajima's D but largely explained variation in Fst. We find 39 selective sweeps, some shared, across the falcons. However, only five candidate genes, mostly housekeeping genes, were implicated as targets of balancing selection across all falcons, with four of these shared between Hierofalco and three shared across all the falcons. Occurrence of selective sweeps and balancing selection were not enriched by chromosome type or in the context of chromosome fusions. Overall, our findings provide insights into divergence and adaptation in large falcons, and demonstrate an association of genomic architecture and chromosomal fusions with all population genomic indicators and metrics of differentiation between species. Significance Statement: Falcons are culturally and commercially important birds that have undergone recent chromosomal rearrangement, providing a natural system for studies on chromosomal heterogeneities and evolution. By analyzing genomic variation across three large falcon species, we show that chromosome type and chromosomal fusions structure patterns of recombination, diversity, and divergence. Our findings highlight the importance of underlying genomic architecture to common forms of evolutionary inference and call attention to the role of chromosomal fusions in shaping falcon evolution.

12
Convergent responses to light stress in oligohymenophorean ciliates bearing green algal symbionts

Kelly, J. B.; Futterknecht, N.; Ernst, S.; Becks, L.

2026-05-27 genomics 10.64898/2026.05.24.727488 medRxiv
Top 0.4%
3.6%

Photosymbiosis has evolved multiple times independently in ciliates. However, these associations can be antagonized by shifts in environmental parameters that impose stress on the host, necessitating the evolution of mechanisms to contend with this stress and to control the symbiont population. To investigate whether convergent strategies have evolved among algae-bearing ciliates in the class Oligohymenophorea, we imposed light stress on three host species that represent at least two independent evolutionary origins of photosymbiosis and measured their cellular responses. Under high light, all three species experienced an initial drop in host cell density which recovered to levels commensurate with those under low-light conditions as they decreased their symbiont loads. We then performed a comparative transcriptomic study to investigate whether a core set of genes exists that is involved in this response. Thirty-one gene families possess differentially expressed transcripts across all three species that included the upregulation of C1 and S28 class peptidases, genes involved in ROS mitigation, and a gene with potential involvement in mitochondrial remodeling associated with changes in algal symbiont load. We additionally found downregulation in Dicer, which could mitigate the processing by the host's RNAi machinery of algal transcripts that are freed upon algal digestion, and downregulation of motor proteins that may reflect changes in the hosts' swimming behaviors and transport of intracellular vesicles in response to light. The 31 gene families are present and widespread in non-symbiotic oligohymenophoreans, illustrating that a pre-existing genetic toolkit exists in this clade that helps explain how it is predisposed to evolving photosymbioses.

13
Interpreting GC content differences across populations at polymorphic sites

Chandra, S.; Gao, Z.

2026-05-18 evolutionary biology 10.64898/2026.05.16.725686 medRxiv
Top 0.5%
3.5%

Recent studies have reported consistent inter-population differences in GC content at polymorphic sites in multiple species, including humans. Specifically, populations that experienced recent bottlenecks exhibit lower average GC content (GC%) at common polymorphic sites compared to non-bottlenecked groups, an observation previously interpreted as an indication of rapid evolution of base composition. In this study, we investigate the evolutionary and technical factors driving these patterns across humans, mice, maize, and silkworm. We find that GC% at polymorphic sites is highly sensitive to the allele frequency threshold applied. Relaxing this threshold reduces inter-population differences to negligible levels in humans and significantly attenuates similar signals in other species. We further observe substantial GC% variation across allele frequency bins, a pattern driven by the differential abundance of different mutation types. We demonstrate that these observations are collectively driven by an interaction between demographic history and a universal excess of strong-to-weak mutations relative to weak-to-strong mutations, which is counteracted by GC-biased gene conversion (gBGC) over long evolutionary timescales. Forward-in-time simulations with realistic parameters recapitulate observed patterns of GC% variation across both populations and allele frequency bins. Overall, our findings reveal that the base composition at polymorphic sites is strongly shaped by the interaction between demographic history, mutation bias, and gBGC, and does not represent stable, genome-wide trends. Consequently, inter-population differences in GC content, especially at common variants, should not be interpreted as evidence of ongoing divergence in base composition or shifts in mutation patterns.

14
The impact of long-read sequencing on fungal genome assemblies: progress and disparity

Kroll, E.; Zoclanclounon, Y. A. B.; Urban, M.; Hill, R.; Hammond-Kosack, K. E.

2026-05-14 genomics 10.64898/2026.05.12.724544 medRxiv
Top 0.6%
2.7%

Fungal genomics has expanded rapidly over the past 30 years, and recently the pace and breadth have further increased for many taxa, although many taxonomic gaps persist. With three decades of rapid growth, fungal genomics now merits a re-examination of its history, progress, and unresolved taxonomic gaps. Here, we review the development of fungal genomics from early efforts such as the Fungal Genome Initiative to current progress driven by third-generation long-read sequencing. We have compiled and summarised publicly available fungal genomes to highlight trends in assembly quality, adoption of long-read technologies, and taxonomic representation. Notably, substantial phylogenetic gaps remain, particularly outside Dikarya, and significant challenges persist for unculturable taxa. This review identifies priorities for the fungal community, including: (1) coordinated efforts to close major taxonomic gaps across the fungal tree of life; (2) improved repository metrics to facilitate identification of high-quality assemblies; and (3) improved and standardised genome annotation, which is lacking for most assemblies. Together, these steps will support the development of reliable genomic resources that capture the full breadth of diversity across the fungal kingdom, generating foundational data for comparative genomics, evolutionary biology, functional studies, genetic studies and applied research.

15
Ancient persistence and recurrent emergence of structural variants across divergent Atlantic salmon lineages

Diblasi, C.; Kwak, J. S.; Manousi, D.; Arnyasi, M.; de Leon, A. V.-P.; Barson, N. J.; Saitou, M.

2026-05-28 genomics 10.64898/2026.05.25.727466 medRxiv
Top 0.6%
2.7%

Structural variants (SVs) are a major source of genomic diversity, yet the evolutionary origins of SVs shared across divergent populations remain difficult to resolve. Shared SVs may reflect ancient polymorphism, recurrent mutation, introgression, or subsequent lineage-specific frequency change, but the relative contribution of these processes often remains difficult to distinguish. Here, we investigated SV evolution across four Atlantic salmon (Salmo salar) lineages differing in geography, Europe versus North America, and domestication status, wild versus farmed. Using sensitive SV discovery, stringent genotyping, local PCA, haplotype-distance analyses, and forward simulations, we tested whether broadly shared SVs behave as a single class of variation or separate into distinct evolutionary categories. We generated a high-confidence SV map and found that SVs were enriched in repetitive regions, particularly segmental duplications and LTR retrotransposons, consistent with genome architecture shaping SV formation. Nearly half of high-confidence SVs were shared across all four lineages despite deep continental divergence, and simulations showed that this broad sharing is more consistent with ancient persistence than recurrent mutation alone. In contrast, a small subset of large SVs exhibited complex PCA clustering and multimodal haplotype-distance distributions, consistent with recurrent formation at structurally unstable loci. Large SVs also showed contrasting frequency trajectories between continents, and one immune gene-rich copy-number variable region showed a marked frequency increase in domesticated European salmon. Together, these results show that shared SVs comprise distinct evolutionary categories shaped by ancient persistence, recurrent mutation, and lineage-specific frequency change.

16
Expression-dependent but strand-independent synonymous single-nucleotide polymorphism in the Escherichia coli chromosome

Deka, N.; Beura, P. K.; Sen, P.; Aziz, R.; Kashyap, A.; Keot, D.; Jain, M.; Namsa, N. D.; Deka, R. C.; Feil, E.; Satapathy, S. S.; Ray, S. K.

2026-05-26 evolutionary biology 10.64898/2026.05.22.727198 medRxiv
Top 0.7%
2.2%

Background: Mutation is thought to arise mainly during replication, though transcription is also known to be mutagenic. Despite recent reports of genome-wide transcription-induced mutagenesis, a distinct demonstration of specific mutations being replication-dependent and/or transcription-dependent in genomes is yet to be established. Here, we studied synonymous single-nucleotide polymorphisms (SNPs) in 2091 individual coding sequences (CDS) in the leading strand (LeS) and the lagging strand (LaS) of the Escherichia coli chromosome by comparing across 157 strains. The frequencies of complementary transitions (ti) and complementary transversions (tv) were compared in each CDS to assess parity violation in the strands. Results: C→T and G→A exhibited the maximum frequency as well as the most prominent strand inequality, as these transitions were influenced both by strand and by expression. Interestingly, inequality between T→C and A→G was expression-dependent but strand-independent. A→T and G→T transversions were universally more frequent than their complementary T→A and C→A transversions, respectively. Conclusions: Our study demonstrates strand-independent but expression-dependent synonymous SNP inequality in CDS, supporting a role for transcription-induced mutagenesis in contributing to strand inequality in the E. coli chromosome.

17
Convergent gene erosion in the chemical defensome of marine mammals

Danneels, B.; Oliveira, D. O.; Castro, F. L. C.; Karlsen, O. A.; Ruivo, R.; Goksoyr, A.

2026-05-23 genomics 10.64898/2026.05.21.726804 medRxiv
Top 0.8%
2.0%

To preserve homeostasis in the face of continual chemical insult, animals evolved dedicated molecular systems that detect, detoxify, and eliminate foreign compounds. Collectively, these enzymes, transporters, and regulatory pathways constitute the chemical defensome. In cetaceans, the loss of two key nuclear receptors (NR1I2/PXR and NR1I3/CAR) suggests a profound rearrangement of the chemical defense systems. Therefore, we investigated the gene inventory of the chemical defensome in Cetacea and two other major marine mammal lineages (Pinnipedia and Sirenia), using their closest terrestrial relatives to understand the extent and patterns of chemical defensome remodelling. We demonstrate large-scale gene loss in chemical defensome genes of cetaceans, as well as smaller scale gene loss in the other two marine mammal lineages, indicating possible convergent evolution. Gene loss occurred predominantly in phase I and phase II biotransformation enzymes, including CYPs, FMOs, SULTs, and GSTs. Many of the lost genes in cetaceans are known to be regulated by PXR and/or CAR, while genes lost in multiple marine mammal lineages are often not regulated by these transcription factors. We hypothesize that the transition to aquatic environments, often accompanied by corresponding changes in feeding habits, led to convergent loss of chemical defensome genes, and that loss of PXR and CAR in cetaceans accelerated these losses. These findings reveal systematic erosion of chemical defense capabilities across marine mammal lineages, suggesting that adaptation to marine life involves trade-offs in detoxification capacity that may have significant implications for these species' responses to increasing chemical pollution in present-day ocean environments.

18
Coordinated Evolutionary Rates in Oxidative Phosphorylation Complexes of Papilionoid Legumes: Cytonuclear Coevolution and Relaxed Selection

Tressel, L.; Havird, J. C.; Choi, I.-S.; Ruhlman, T.; Cardoso, D.; Wojciechowski, M.; Jansen, R.

2026-05-26 evolutionary biology 10.64898/2026.05.22.727288 medRxiv
Top 0.9%
1.9%

Across eukaryotes, mitochondrial (mt) and nuclear genomes coordinate the expression and interaction of gene products to maintain cellular functions. While mitonuclear coevolution has been widely explored in animals, it remains understudied in plants, despite their utility as model systems due to relatively slow mitochondrial evolutionary rates and the presence of plastids. Plants rely on oxidative phosphorylation (OXPHOS) for ATP conversion, which requires cofunctionality and likely coevolution of mitochondrial and nuclear gene products. Here, we investigated evolutionary rate covariation (ERC) between mitochondrial- and nuclear-encoded OXPHOS genes in papilionoid legumes, where plastid-nuclear coevolution and an inversion in plastid DNA have been documented previously. Using 50 legume species spanning 15 papilionoid clades, we estimated evolutionary rates for five gene sets: mt-encoded OXPHOS genes, nuclear-encoded mitochondrial-targeted (N-mt) OXPHOS genes, and three control nuclear gene sets that lack mitochondrial interactions (glycolysis, cell cycle, and cytosolic ribosomal genes). Both mt and N-mt OXPHOS genes exhibited significantly elevated nonsynonymous (dN) and synonymous (dS) substitution rates in the 50-kb inversion clade relative to other legumes, suggesting accelerated mitochondrial substitution rates. Moreover, elevated dN/dS ratios in mt and N-mt OXPHOS genes in this clade were driven by relaxed purifying selection, not intensified positive selection. ERCs were highest for OXPHOS complexes and genes with physical mitonuclear interactions, as predicted under mitonuclear coevolution. We discuss how these results compare to other cases of cytonuclear coevolution in plants, including plastid-nuclear coevolution in papilionoids, and why dual-targeted, nuclear-encoded genes that repair mt and plastid DNA may underlie patterns of molecular evolution in both organelles.
Significance Statement: Mitonuclear interactions are essential for cellular energy production, yet the evolutionary dynamics of these interactions remain poorly understood in plants. This study highlights papilionoid legumes as an important system for understanding how coordinated evolution between mitochondrial and nuclear genes shapes plant genomes. By identifying signatures of mitonuclear coevolution in the 50-kb inversion clade, this work demonstrates how shifts in selective pressures on mitochondrial processes can influence nuclear gene evolution. These findings advance our understanding of cytonuclear coordination in plants and provide a foundation for future studies exploring how interactions among genomic compartments contribute to plant evolution, adaptation, and resilience in agriculturally important lineages.

19
Highly contiguous reference genome assembly of the endangered Orces blue whiptail Holcosus orcesi

Pozo, G.; Cisneros-Heredia, D. F.; Barragan-Orbe, D.; Sanchez-Nivicela, J. C.; Arbelaez, E.; Torres, M.

2026-05-16 genomics 10.64898/2026.05.14.725226 medRxiv
Top 0.9%
1.8%

Holcosus orcesi, the Orces Blue Whiptail, is a Critically Endangered lizard endemic to the upper Jubones River basin in southern Ecuador. Restricted to a narrow elevational range within semi-arid Andean shrublands, it represents one of the few montane members of a predominantly lowland lineage. Here we present the first high-quality reference genome for H. orcesi, generated using Oxford Nanopore Technologies long-read sequencing. The assembly spans 1.68 Gb across only 91 contigs, with an N50 of 76.2 Mb and a BUSCO completeness of 96.8%, making it among the most contiguous and complete squamate genomes to date. Structural annotation predicted 25,682 genes, of which 85% showed homology to known proteins and 45% were assigned Gene Ontology terms. Repetitive elements accounted for 46.3% of the genome, with LINEs representing the predominant class. This genome provides a foundational resource for future evolutionary, comparative and conservation-genomic research of H. orcesi and other mountain reptiles, enabling studies of population genomics, local adaptation, and genomic erosion in isolated populations. By expanding the genomic representation of tropical montane reptiles, this work helps address longstanding phylogenetic and geographic gaps in global biodiversity genomics and provides a foundation for evidence-based conservation of H. orcesi and related taxa.
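For readers unfamiliar with the contiguity metric quoted in this abstract, the following is a minimal sketch of how an N50 is conventionally computed from a list of contig lengths. The function name and the toy lengths are hypothetical; they are not taken from the H. orcesi assembly.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    The N50 is the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches
    at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (arbitrary units, not real data).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150 covers half of 290
```

A high N50 relative to total assembly size, as reported above, indicates that most of the sequence sits in a small number of long contigs.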

20
Comparing fine-scale mutation and recombination landscapes in rhesus macaque (Macaca mulatta) populations of Chinese and Indian descent inferred from both short- and long-read sequencing data

Spatola, G. J.; Versoza, C. J.; Soni, V.; Heenkenda, E. J.; Jensen, J. D.; Pfeifer, S. P.

2026-05-26 evolutionary biology 10.64898/2026.05.26.727910 medRxiv
Top 0.9%
1.8%

Genomic diversity amongst primates is fundamentally shaped by species- and population-specific rates of mutation and recombination. In this study, we infer fine-scale mutation and recombination rate maps for the rhesus macaque (Macaca mulatta), the most widely used non-human primate model in biomedical research, leveraging both short- (Illumina) and long- (PacBio HiFi) read sequencing data from two distinct populations of Chinese and Indian descent. Thereby, we draw comparisons between the rates estimated from each dataset, highlighting both biologically meaningful variation between these populations as well as artefactual discrepancies likely arising from systematic biases and differences in the utilized sequencing technologies. Consistent with previous observations in humans, broad-scale features of the recombination landscape are well-conserved between the two populations, but significant differences exist at the finer scales. Notably, we find evidence for a high rate of turnover in recombination hotspots over a short evolutionary time span, resulting in population-specific recombination maps in which the vast majority of the >30,000 identified recombination hotspots in one population are inactive in the other population. Given that mutation and recombination rates are necessary components for the interpretation of other diversity-shaping processes and events, including those characterizing both the underlying demographic and selective histories, the incorporation of these population-specific maps into future models will improve our understanding of the evolutionary genomics of the species. Additionally, these maps will serve as a fundamental component of future genome-wide association and fine-mapping studies of disease traits in this biomedical model system.