
Science

American Association for the Advancement of Science (AAAS)

Preprints posted in the last 90 days, ranked by how well they match Science's content profile, based on 429 papers previously published here. The average preprint has a 1.02% match score for this journal, so anything above that is already an above-average fit.

1
Misleading Success: Genomes Reveal Critical Risks to European Gray Wolves

Ravagni, S.; Battilani, D.; Salado, I.; Lobo, D.; Sarabia, C.; Leiva, C.; Caniglia, R.; Fabbri, E.; Ciucci, P.; Girardi, M.; Santos, F. I.; Kusak, J.; Mattucci, F.; Naderi, M.; Nowak, C.; Sekercioglu, C.; Skrbinsek, T.; Velli, E.; Stronen, A. V.; Vila, C.; Godinho, R.; Leonard, J.; Vernesi, C.

2026-03-23 evolutionary biology 10.64898/2026.03.20.713253 medRxiv
Top 0.1%
44.6%

Have European gray wolves recovered? Despite an increase to ~21,000 wolves (Canis lupus), our genomic analyses reveal significant risks to their long-term viability. We analyzed over 200 whole genomes spanning five major European populations. Rather than a single recovering population, European wolves form a mosaic of isolated, independently evolving lineages, mostly diverging in the late Pleistocene. All lineages have contemporary effective population sizes below the threshold for long-term viability (Ne ≥ 500) and show extensive inbreeding. Runs of homozygosity reveal population-specific inbreeding histories spanning recent to deep timeframes. Most lineages exhibit higher realized than masked genetic load, indicating emerging inbreeding depression. These findings challenge claims that downlisting European wolves is biologically warranted: none of these populations currently meets thresholds associated with favorable conservation status.

2
A deep learning predictor of bindable protein surfaces to guide generative synthetic biology

Almeida-Souza, L.

2026-04-16 synthetic biology 10.64898/2026.04.16.718848 medRxiv
Top 0.1%
41.1%

The advent of generative machine learning models has revolutionized de novo design of protein binders. However, the wide adoption of this revolution is bottlenecked by computational cost. For many targets, binder design commonly requires computationally intensive sampling across structures, often wasting days of GPU time on unwanted or geometrically inviable regions. Here, IARA (Interface Analysis and Recognition Architecture), a deep learning graph neural network designed as a rapid structural filter to triage protein binder generative pipelines, is introduced. IARA is trained entirely on BindCraft trajectories generated against RFdiffusion-generated targets. Based on a slim network with only seven residue features, IARA maps the binder designability of input proteins in a matter of seconds. On validation runs using BindCraft, RFdiffusion and BoltzGen, IARA successfully identified the optimal binding interface for practically all targets. By instantly pinpointing the highest-probability binding pockets, IARA democratizes synthetic biology, drastically reducing the exploratory GPU compute required for successful de novo binder generation.

3
A portable orthogonal replication system enables continuous gene evolution near the biological speed limit

Tian, R.; Rehm, F. B. H.; Kenneth, M.; Jamali, K.; Zhotev, P. S.; Liu, K. C.; Chin, J. W.

2026-02-24 synthetic biology 10.64898/2026.02.20.706958 medRxiv
Top 0.1%
40.2%

Orthogonal DNA replication systems uncouple the mutagenesis of target genes from host viability, enabling target gene hypermutation beyond the genomic critical error threshold and thereby unlocking access to greater sequence space for accelerated evolution. Here we introduce a series of upgrades to the E. coli orthogonal replication system, EcORep. We develop strategies to efficiently establish, engineer, and transform orthogonal replicons. We develop and utilize replicon-REXER to establish a 77 kb replicon, the largest orthogonal replicon reported to date. Directed evolution of the orthogonal DNA polymerase yielded variants with mutation rates of ~10^-4 substitutions per base per generation and best-in-class mutational spectra. These polymerases are three orders of magnitude more mutagenic than the first-generation EcORep system, enable mutagenesis at one million times the genomic levels, and straddle the evolutionary critical error threshold for the mutation of genes tested. Using the highly mutagenic EcORep system, we rapidly evolve an ethanol assimilation pathway for increased performance. Furthermore, we find that the three components sufficient to drive the minimal EcORep system enable O-replication systems to be established in other Gram-negative bacteria. Thus, we establish VinORep in Vibrio natriegens. VinORep combines O-replicon mutation, around the limit for molecular evolution of genes, with the fastest growing organism, to realize gene evolution approaching the biological speed limit. We exemplify the utility of this advance through the rapid evolution of new function - via the accumulation of tens of mutations, and selection - in 16 hours.

4
Migration genetics link excitatory neurons to ancient selection and economic growth

Casten, L. G.; Tener, A.; Elsadany, M.; Yang, J. S.; Strang, J. F.; Michaelson, J.

2026-02-06 genomics 10.64898/2026.02.05.703995 medRxiv
Top 0.1%
39.8%

From ancient nomadic movements to modern urbanization, migration has driven human history through mechanisms that remain poorly understood. We conducted a genome-wide association study of migration distance in 250,000 UK individuals, identifying 20 loci in neurodevelopmental genes with 5% heritability. Migration-related variants are associated with excitatory neuron gene expression and correlate with cognition, risk-tolerance, and reduced interpersonal attachment. Within-family analyses demonstrate genetic effects remain significant after controlling for shared environmental factors between siblings. Our polygenic score predicts inferred mobility in >1,000 ancient individuals spanning 10,000 years, revealing positive selection on migration alleles that increased substantially over millennia. At the population level, each standard deviation increase in county-level migration polygenic score predicts >$4,000 greater income growth per person in the US. These findings establish migration as a heritable trait and suggest biological pathways connecting individual neurodevelopment with regional prosperity across evolutionary and contemporary timescales.

5
Evolutionary instability drives structural diversity and disease susceptibility at the 16p12.2 locus

Smolen, C.; Girirajan, S.

2026-03-05 genomics 10.64898/2026.03.04.709583 medRxiv
Top 0.1%
39.4%

Extensive duplication in the African great ape lineage has led to substantial instability of chromosome 16p. We examined the sequence structure and evolutionary history of the 16p12.2 locus in 570 diverse human haplotypes and seven non-human primates. Human haplotypes vary greatly in size and exhibit ancestry-biased structure. We identify 5-14 clusters of distinct architecture at three segmental duplication (SD) blocks, generating 21 unique haplotype configurations. Two duplicons within these SDs, D5 and D6, mediate the neurodevelopmental disorder-associated 16p12.1 deletion; however, exact breakpoint positions and local sequence architecture vary across families. The region has toggled between orientations over the past 25 million years, and we identify 32 inversions in humans mediated by distinct duplicons. Evolutionary analyses reveal incomplete lineage sorting, interlocus gene conversion, and lineage-specific expansions, including human-specific expansions of D5 and D6. These findings highlight the evolutionary instability at 16p12.2 driving structural diversity and deletion susceptibility in humans.

6
Gothic Identity as Cultural Practice: Paleogenomic Evidence for Multi-Ethnic Assemblages Under Gothic Material Culture in Late Antique Bulgaria (4th-6th centuries CE)

Stamov, S.; Chobanov, T.; Wang, T.; Stoeva, K.; Momchilov, D.; Aladzhov, A.; Chobanov, K.; Nikolov, M.; Nesheva, D.; Heather, P.; Toncheva, D. I.; Zamfirov, M.; Lazaridis, I.; Reich, D. E.

2026-03-05 genomics 10.64898/2026.03.03.709317 medRxiv
Top 0.1%
39.0%

Ethnonyms such as "Goth" in Late Antique sources capture political and cultural affiliations that may not map cleanly onto biological descent. Here we report genome-wide ancient DNA from 38 individuals associated with Gothic-period mortuary contexts at two sites in present-day Bulgaria: the Aquae Calidae necropolis (~320-375 CE) and the Aul of Khan Omurtag necropolis (~350-489 CE). Using PCA, f-statistics, qpAdm, uniparental markers, and IBD/kinship analyses, we find: (i) strong within-site heterogeneity, rejecting a single "Gothic" genetic profile; (ii) a reproducible north-south genetic contrast, with Aquae Calidae individuals shifted toward a Balkan/Anatolian-related ancestry axis and AKO individuals enriched in northern European-related ancestry consistent with Wielbark/Chernyakhiv proxies; and (iii) admixture dating with DATES placing the mixing between northern and southern ancestry poles at ~11-13 generations before burial (point estimates in the 1st century CE, depending on target grouping), based on 23 individuals with sufficient coverage. Together, these results support models in which Gothic material culture in the Balkans was practiced by multi-ethnic communities and illustrate how cultural "Gothic" identity could persist despite substantial genetic diversity. Full f3/qpAdm/DATES outputs, f4 validation, and kinship/IBD summaries are provided in Supplementary Tables S1-S6, Supplementary Notes S2-S4, and the Supplementary IBD Workbook.

7
Pleistocene climatic oscillations impact the diversification of deer mice (Peromyscus maniculatus) and the independent evolution of ecotypes

Boria, R. A.; Wooldridge, B.; Kautt, A. F.; Ashing-Giwa, K. F.; McFadden, S. P.; Kirby, C.; Edwards, S. V.; Hoekstra, H. E.

2026-01-27 evolutionary biology 10.64898/2026.01.26.699144 medRxiv
Top 0.1%
38.7%

A central question in evolutionary biology is whether local adaptation is predictable when a species repeatedly encounters similar environments. The deer mouse, Peromyscus maniculatus, has a range of over 13 million km² in North America and may be found in nearly every terrestrial habitat. Because of their abundance and wide habitat preference, deer mice and closely related Peromyscus, which we refer to as the P. maniculatus species complex, are at the forefront of studies of biogeography and local adaptation. Here, we undertake a comprehensive survey of genome-wide and phenotypic diversity to characterize the recent evolutionary history of this group. We sequenced whole genomes from 232 individuals across their range, representing the most thorough genetic sampling of the P. maniculatus species complex to date. We identify six geographically delineated clades, several of which encompass both classically recognized P. maniculatus subspecies as well as other recognized species. Ecological niche modelling suggests that this geographic structure resulted from rapid post-LGM range expansion and adaptation to emerging habitats. Our morphological measurements of 979 specimens and field data compiled from over 28,000 museum records show that deer mice in forests across the range consistently have longer tails, larger feet, bigger ears, and elongated whiskers. These traits constitute an arboreal ecotype that has evolved at least three times independently, and was likely lost in other parts of the range as populations moved out of forested habitat. Altogether, these results suggest that post-LGM increases in forested habitat drove the parallel evolution of arboreal ecotypes across the deer mouse range.

8
Systematic inference of mutation rates and spectra across the tree of life via a scalable read-based framework

Pinhasi, A.; Yizhak, K.; Maruvka, Y. E.

2026-02-04 evolutionary biology 10.64898/2026.02.02.703326 medRxiv
Top 0.1%
37.0%

The rapid increase in available genome assemblies allows eukaryote-wide analyses of mutation rates and mutational spectra, yet whole-genome alignment remains a major computational bottleneck. We present CORAL, a scalable framework for inferring branch-specific substitutions without a centralized whole-genome alignment. CORAL fragments sister genomes into pseudo-reads, aligns them to an outgroup, and assigns substitutions by parsimony. CORAL achieved high concordance with three independent resources for both mutation rates and 96-category spectra. Applying CORAL to 5,090 species with calibrated divergence times, we generated the largest comparative atlas of mutation rates and spectra across animals, plants, fungi, and protists. Mutation rates vary by orders of magnitude and correlate with life-history traits such as lifespan and body weight. We find that mutation spectra are major determinants of each clade's genomic trinucleotide composition and exhibit strong phylogenetic structure. We identified seven evolutionary mutational signatures, including two novel signatures and three previously observed only in cancer. Signature activities varied widely, and for several processes, tracked life-history covariates, suggesting distinct etiologies. Together, CORAL and this extensive atlas establish a powerful framework for comparative genomics, overcoming alignment bottlenecks to reveal the forces driving molecular evolution.
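
The parsimony step named in this abstract can be illustrated with a toy example. The sketch below is not CORAL's code: it assumes the three sequences are already aligned to equal length (the pseudo-read fragmentation and outgroup alignment are omitted), and the function name `assign_substitutions` is hypothetical. A site where the two sisters differ is assigned to whichever sister disagrees with the outgroup.

```python
from collections import Counter

def assign_substitutions(sister_a, sister_b, outgroup):
    """Toy parsimony assignment on three pre-aligned sequences (an assumption).

    Returns a Counter keyed by (branch, "ancestral>derived").
    """
    if not (len(sister_a) == len(sister_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, b, o in zip(sister_a.upper(), sister_b.upper(), outgroup.upper()):
        if any(base not in "ACGT" for base in (a, b, o)):
            continue  # skip gaps and ambiguous bases
        if a == b:
            continue  # sisters agree: no branch-specific change to assign
        if b == o:
            counts[("A", f"{o}>{a}")] += 1  # change on sister A's branch
        elif a == o:
            counts[("B", f"{o}>{b}")] += 1  # change on sister B's branch
        # all three differ: ambiguous under parsimony, left unassigned
    return counts

example = assign_substitutions("ACGT", "ACTT", "ACTA")
```

Here only the third site differs between the sisters, and sister B matches the outgroup, so the change is counted on branch A as T>G. A 96-category spectrum would additionally need the flanking bases of each site, which this sketch does not track.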

9
High Diversity Gene Libraries Facilitate Machine Learning Guided Exploration of Fluorescent Protein Sequence Space

Benabbas, A.; Kearns, P.; Billo, A.; Chisholm, L. O.; Plesa, C.

2026-03-02 synthetic biology 10.64898/2026.03.01.706892 medRxiv
Top 0.1%
36.6%

While protein language models (PLMs) have shown great promise for protein design, their performance is fundamentally constrained by the diversity and completeness of available training data. In particular, PLMs often struggle to extrapolate to sequences that fall outside the distribution spanned by their training sets, limiting their ability to discover proteins in sparsely sampled regions of sequence space. Here we test the hypothesis that experimentally expanding training diversity can convert extrapolation into interpolation and thereby enable discovery of functional sequences beyond natural protein manifolds. Using large-scale gene synthesis and DNA shuffling, we generate libraries that span a broad region of fluorescent protein sequence space and create chimeric variants that bridge between distant homologs. Functional screening for blue fluorescence yields thousands of active variants distributed across diverse sequence lineages. Fine-tuning ProtGPT2 on this expanded dataset enables generation of diverse fluorescent proteins, including designs that extend beyond the regions occupied by known natural sequences while retaining function. This work illustrates how synthetic approaches can help address key limitations in machine learning-guided protein design, especially for small or sparsely populated protein families, by actively creating novel sequences across unexplored but functional regions of sequence space.

10
Evolution as Active Geometry: The Geometric State Equation of the Tree of Life

Fenn, R.; Fenn, A.

2026-03-13 evolutionary biology 10.64898/2026.03.09.710612 medRxiv
Top 0.1%
35.9%

Any process that generates information at a constant rate into a branching hierarchy faces a geometric packing problem: the number of distinguishable lineages grows exponentially, but Euclidean space grows only polynomially. We show that this tension forces a unique resolution. By deriving a geometric state equation from three physical postulates--information flux, hierarchical topology, and geometric fidelity--we prove that any such system must embed into a hyperbolic manifold of curvature κ = (h ln 2/(n - 1))^2, where h is the entropy rate and n the embedding dimension. The equation has zero adjustable parameters, a unique positive solution, and a globally stable equilibrium. For the tree of life, back-solving across all systems tested--from decade-old viral outbreaks to 3.8-billion-year cellular lineages--yields a universal embedding dimension of n = 2.00 ± 0.05 despite orders-of-magnitude variation in mutation rate and timescale. This topological invariant, combined with the effective entropy of the genetic code (h ≈ 1.61 bits), predicts a curvature of κ = 1.245. Five independent neural networks trained on 5,550 genomes from all domains of life, receiving no phylogenetic supervision, converge to κ = 1.247 ± 0.003 (CV = 0.24%), confirming the prediction within 0.2%. Independent validation across 15 viral families spanning 10^1 to 10^8 years of divergence yields Pearson r = 0.996 between predicted and measured curvatures. Extending the test to the 20-letter amino acid alphabet, we embed 15 protein family phylogenies into H^2 and measure κ_protein = 3.80 ± 0.60, confirming the predicted 3.1x curvature increase (κ = 3.90) to within 2.6%, while recovering n = 2.03 ± 0.10 across alphabets. The curvature of the tree of life is not a historical accident but a geometric constraint imposed by the information capacity of the genetic code.
Graphical abstract (figure not shown): The tree of life embeds optimally into 2D hyperbolic space with curvature κ = 1.247 ± 0.003, matching the prediction κ = (h ln 2)^2 = 1.245 from the geometric state equation to within 0.2%. Top: Voronoi tessellation of 5,550 genome embeddings in the Poincaré disk, colored by domain (Bacteria, Archaea, Eukarya). LUCA occupies the center; cell boundaries are hyperbolic geodesics (circular arcs orthogonal to the disk boundary). Bottom: Five independent neural networks converge to the same curvature (CV = 0.24%), the state equation predicts curvature across both DNA and protein alphabets (3.1x curvature increase) with zero adjustable parameters, and cross-system validation confirms the curvature-entropy relationship (r = 0.996).
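
The arithmetic behind the headline number in this abstract can be checked in a few lines. This is only a sketch that plugs the abstract's own values (h ≈ 1.61 bits, n = 2) into the stated equation, curvature = (h ln 2/(n - 1)) squared; the function name `predicted_curvature` is made up for illustration, and nothing here tests the preprint's claims.

```python
import math

def predicted_curvature(h_bits, n):
    """Evaluate (h * ln 2 / (n - 1)) ** 2 as quoted in the abstract."""
    if n <= 1:
        raise ValueError("embedding dimension n must exceed 1")
    return (h_bits * math.log(2) / (n - 1)) ** 2

# Values taken from the abstract: h ~ 1.61 bits, n = 2.
kappa = predicted_curvature(1.61, 2)

# Relative gap to the reported measured value of 1.247.
relative_gap = abs(1.247 - kappa) / 1.247
```

With these inputs the result rounds to 1.245, and the gap to the reported 1.247 is under the 0.2% quoted in the abstract.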

11
Phylogenomics and the origins of sharks

Brownstein, C.; Near, T. J.

2026-02-15 evolutionary biology 10.64898/2026.02.13.705779 medRxiv
Top 0.1%
33.7%

Genomes have the capacity to drastically modify hypotheses about the relationships of species. Despite the growing availability of non-model organism genome sequences, historically contentious portions of the Tree of Life remain untested using genomic data. Here, we infer the phylogeny of sharks, skates, rays, and chimaeras using the genomes of 48 species, targeting different genomic marker types. Although phylogenetic relationships of chondrichthyans are relatively consistent across analyses, different molecular markers yield conflicting results about shark monophyly. Exons support the traditional view that sharks are monophyletic, whereas ultraconserved elements and legacy nuclear markers instead suggest that the frilled and cow sharks (Hexanchiformes), which retain the ancestral jaw structure of cartilaginous fishes, are the sister lineage of all other sharks and rays. The resolution of sharks as monophyletic or paraphyletic has little effect on inferences of the timescale of shark evolution or the origins of key traits, such as their ancestral ecology and genome size. We tie the diversification of living cartilaginous fishes to the transformation of marine ecosystems during the middle Mesozoic Era, confirming that living shark diversity is the product of rapid ancient diversification. Consequently, our results suggest that despite uncertainty around whether sharks are monophyletic, consensus can still be reached about major evolutionary events in this iconic vertebrate lineage.
Significance Statement: Living sharks, skates, rays, and chimaeras form one of the three principal groups of vertebrates. These iconic animals, which include over 1200 species, are key components of marine ecosystems and have helped us reconstruct the evolution of vertebrate genomes and phenotypes. However, much work on this group has assumed that sharks are a natural group. Here, for the first time, we leverage genome-scale data to test this hypothesis. Surprisingly, we show that different genome regions reject or support the hypothesis that sharks form a natural group to the exclusion of skates and rays. This throws an unexpected wrench into our understanding of the relationships of some of the oldest living vertebrate clades.

12
Global comparison of influenza A and B epidemiology identifies consistent geographic and socio-demographic predictors

Gunning, C. E.; Rezaeimalek, S.; Rohani, P.

2026-03-16 epidemiology 10.64898/2026.03.14.26348363 medRxiv
Top 0.1%
32.7%

Seasonal influenza outbreaks are caused by types A and B that together account for an estimated 3-5 million severe cases each year. Most attention has focused on influenza A viruses (IAVs) due to their rapid evolutionary dynamics and high disease burden, and has been concentrated in well-observed high-income regions. Here, we use a macroecological approach to compare and contrast the global epidemiology of IAVs and influenza B viruses (IBVs) across 111 countries and 15 influenza seasons (2010-2024). We first show how temporal correlations between countries depend on both distance and geographic region. For both IAV and IBV, we find high overall synchrony among northern temperate countries, whereas tropical countries display marked heterogeneity. At the longer time scale of influenza seasons, we next quantify sampling intensity, positivity, seasonality, fade-out dynamics and the timing and variability of epidemic peaks. We then describe how these long-term epidemiological outcomes change in association with a suite of 17 geographic, climatic, and socio-economic variables. In addition, we document persistent surveillance gaps, particularly in Africa, and highlight ongoing but spatially variable impacts of the SARS-CoV-2 pandemic era on sampling. Overall, we find strong correspondence between the macroscopic features of IAV and IBV epidemiology, with critical roles played by geography and climate (especially latitude and temperature), economics (per capita GDP) and demographics (population size and per capita birth rate).
Significance Statement: The global circulation of seasonal influenza A and B viruses (IAV and IBV) imposes major human health impacts each year that vary widely across space and time. An improved understanding of these dynamics could improve public health preparedness, response, and intervention efforts. Here we offer a comprehensive comparison of IAV and IBV dynamics across 15 seasons, 111 countries, and six continents. We demonstrate the impact of distance and region on temporal correlation, quantify how measures of influenza seasonality change with geographic and socioeconomic factors, and predict how frequently influenza cases are absent from countries. Our study finds widespread similarities between IAV and IBV (along with key differences), documents notable geographic clusters of countries with shared dynamics, and highlights persistent gaps in global influenza surveillance.

13
The historical domestication of a Clostridium botulinum strain used for the industrial production of botulinum neurotoxin

Keim, P.; Nottingham, R.; Guevara, M. A.; Miller, E. F.; Vogler, A. J.; Williamson, C. H. D.; Smith, T.; Posner, R. G.; Pellett, S.; Lenski, R. E.; Sahl, J.

2026-02-26 microbiology 10.64898/2026.02.26.708219 medRxiv
Top 0.1%
32.5%

Laboratory production of botulinum neurotoxin (BoNT) began more than 90 years ago for medical, military, and later pharmaceutical applications, creating one of the longest-running examples of microbial cultivation under sustained human control. A single Clostridium botulinum Group I lineage, Army Hall A (AHA), was established by the U.S. military in 1942 and later gave rise to Hall A-hyper (HAH), the strain widely used for pharmaceutical BoNT production. We analyzed more than 1,000 C. botulinum genomes to identify AHA's closest relatives and to infer the most recent common ancestor (MRCA) shared with its nearest wild lineage. Relative to this MRCA, AHA accumulated nearly 100 genetic changes, including 46 single-nucleotide substitutions, 44 small insertions or deletions, and several large deletions and structural variants that led to the loss of more than 80 genes. A nonsense mutation in mutS generated a hypermutator phenotype that accelerated mutation rates and increased genetic diversity. This event occurred early in the lineage's domestication and appears to have facilitated laboratory-adaptive traits, including the lack of sporulation and increased BoNT yield. Competition experiments under standard growth conditions confirmed substantial laboratory adaptation, with AHA exhibiting a strong fitness advantage over a close wild relative. Together, these results reconstruct the genomic trajectory of a bacterium evolving under prolonged human-mediated selection and provide a genome-resolved example of microbial domestication. The findings show how laboratory conditions, industrial selection, and changing mutation rates can jointly shape bacterial evolution, and they offer a general framework for understanding domestication and laboratory adaptation across microbial systems.
Significance: Human-mediated species domestication has been central to the development of civilization, with well-known examples in plants and animals. Microbes have also been domesticated, often unintentionally and before the advent of modern microbiology. The military and pharmaceutical applications of botulinum neurotoxin led to the sustained cultivation and eventual domestication of a high toxin-producing Clostridium botulinum strain that remains widely used today. We reconstruct the genomic changes along this trajectory and identify a key early mutation that generated a hypermutator phenotype, increasing genetic variation available for human-directed selection. These findings reveal fundamental evolutionary processes shaping bacterial genomes under long-term human control. This genome-resolved example of microbial domestication offers a general framework for understanding laboratory adaptation in both evolutionary and applied microbiology.

14
Asymmetric gene flow across a desert contact zone in a riparian songbird

Gyllenhaal, E. F.; Johnson, A. B.; Klicka, L. B.; Bauernfeind, S. M.; Baumann, M. J.; Brady, M. L.; Burns, K. J.; Witt, C. C.; Andersen, M. J.

2026-02-06 evolutionary biology 10.64898/2026.02.04.703846 medRxiv
Top 0.1%
32.4%

Secondary contact is a key point in the speciation process, and fine-scale geography can shape its outcomes. This is especially true for species restricted to fragmented habitats, such as riparian corridors through arid regions. Here we examine the role of disjunct riparian habitat in shaping secondary contact in Bell's Vireo (Vireo bellii), a North American songbird species that contains distinct eastern and western forms. We recovered a unique, discontinuous contact zone along the Rio Grande in New Mexico, where two populations with greater nuclear and mitochondrial genetic affinity for the eastern lineage are separated by a population with an affinity for the western lineage. This point of primarily western ancestry on the Rio Grande corresponded with a stretch where several intermittently flowing tributaries join from the west and may have acted as gene flow corridors. Using a combination of empirical analyses of divergence and diversity across the genome and population genetic simulations, we uncovered evidence of neutral, genome-wide admixture driving the genomic architecture of divergence, rather than evidence for local adaptation or selective sweeps. In sum, this genomic study showed us how fine-scale dispersal corridors can cause idiosyncratic patterns of admixture when habitat is limiting in zones of secondary contact.

15
A haplotype-resolved bluethroat (Luscinia s. svecica) genome assembly uncovers the complex MHC region

Strand, M. A.; Enevoldsen, E. L. G.; Toerresen, O. K.; Skage, M.; Ferrari, G.; Tooming-Klunderud, A.; Leder, E. H.; Lifjeld, J. T.; Johnsen, A.; Jakobsen, K. S.

2026-03-30 genomics 10.64898/2026.03.26.714473 medRxiv
Top 0.1%
32.2%

We describe a chromosome-level, haplotype-resolved genome assembly from a female bluethroat (Luscinia s. svecica). The assembly comprises two pseudo-haplotypes of 1461 Mb and 1171 Mb, with 77.4% and 88.4% scaffolded into 40 autosomal chromosomes and the W and Z sex chromosomes (haplotype one). Assembly completeness is high (BUSCO 99.2% and 94.9%), with 22,462 and 18,769 annotated protein-coding genes for haplotypes one and two, respectively. The use of Oxford Nanopore Technologies sequencing enables resolution of genomic regions that are often fragmented in genome assemblies, including the hypervariable Major Histocompatibility Complex (MHC). We find that MHC loci include both the canonical organization of tandemly duplicated MHCIIβ genes with a single MHCIIA, and a distinct arrangement in which MHCI and MHCIIβ loci are interspersed in intermixed arrays, and that substantial structural differences between haplotypes are directly resolved in the assembly.

16
The Neanderthal population history and the introgression landscape inferred from the UK Biobank

Morez Jacobs, A.; Soltantouyeh, A.; Zeloni, R.; Carollo, F.; Mezzavilla, M.; Marnetto, D.; Pagani, L.

2026-04-04 evolutionary biology 10.64898/2026.04.03.716297 medRxiv
Top 0.1%
32.2%

Neanderthal haplotypes in present-day Eurasians are unevenly distributed across the genome, forming introgression deserts and high-frequency segments consistent with adaptive introgression, with additional random variation affected by genetic drift. However, current estimates are limited by modest sample sizes and analyses restricted to subsets of the genome, given that any individual carries only 1-2% Neanderthal ancestry. Here we extract and analyse Neanderthal haplotypes from 45,000 imputed and phased genomes in the UK Biobank. Even at this scale, the number of sites overlapping Neanderthal haplotypes approaches, but does not reach, saturation, with rare haplotypes still being discovered. Using the derived allele frequency spectrum within the surviving Neanderthal segments, we infer a divergence time of 2,061 generations between the introgressed lineage and the Vindija Neanderthal, and estimate the effective population size of the introgressed lineage at Ne = 6,564. Individual-level resolution allows identification of 545 independent loci with excess Neanderthal homozygosity, consistent with ongoing selection. Despite the extensive dataset, a substantial portion of the genome remains a Neanderthal desert. Within these regions, we detect seven Human Accelerated Regions affected by recent human selective sweeps (TMRCA <650 kya), four located within introns of cerebellum-expressed genes, providing further support for their potential as modern human-specific adaptations.

17
Scaling laws of genome composition and the transition to complex multicellularity

de la Fuente, R.; Diaz-Villanueva, W.; Arnau, V.; Moya, A.

2026-03-03 genomics 10.64898/2026.03.02.708964 medRxiv
Top 0.1%
32.1%

Genome architecture reorganizes over evolutionary time to support complex multicellularity without a proportional expansion of coding DNA. We conducted a cross-kingdom comparative analysis using high-quality RefSeq assemblies annotated by the NCBI Genome Annotation Pipeline, restricting the dataset to chromosome-level or complete genomes. Scaling relationships among genome size, gene content, and coding DNA content reveal compositional transitions that distinguish prokaryotic, unicellular eukaryotic, and multicellular lineages. Beyond ~40 Mb of genic content, coding expansion slows and saturates, indicating compositional constraints that shaped the rise of multicellularity. These results establish scaling laws that quantify how noncoding sequence expansion dominates genome growth in complex eukaryotes.

18
The Landscape of Stop Codon-Free Regions in Primates: A Reservoir of Proto-Genes

Soman, A. S.; Shreyasree, G.; Dwivedi, A.; Pramod, G. S.; Sakarkar, C.; Bhattacharya, D.; Vijay, N.

2026-03-02 genomics 10.64898/2026.02.27.708503 medRxiv
Top 0.1%
32.0%

Gene duplication has long been viewed as the primary source of new genes, yet growing evidence suggests that de novo emergence from non-coding DNA may be more common than previously assumed, requiring unbiased genome-wide strategies to identify its structural precursors. New protein-coding genes can arise from non-coding DNA, but the sequence features enabling this transition remain unclear. Here, we systematically identify and characterise stop-codon-free regions (SCFRs) across telomere-to-telomere assemblies of human and six other primates. Short SCFRs are abundant and widely distributed, whereas long SCFRs are rare and increasingly associated with coding overlap, moderate GC enrichment, and structured exon-intron contexts. We define exon shadows as in-frame SCFR extensions beyond annotated exon boundaries that lack stop codons, revealing latent coding-compatible sequence adjacent to established exons. We also detect introns fully spanned by single SCFRs, consistent with exitron-like architectures. Repeat composition, codon usage, and Fourier spectral analyses show that length filtering enriches for gene-like features and identifies a subset of long SCFRs with codon-scale periodicity. Together, these findings provide a framework for identifying extended ORF-like regions that may serve as substrates for de novo gene emergence in primates.

19
Convergent natural selection at both ends of Eurasia during parallel radical lifestyle shifts in the last ten millennia

Barton, A. R.; Rohland, N.; Mallick, S.; Pinhasi, R.; Akbari, A.; Reich, D.

2026-04-04 evolutionary biology 10.64898/2026.04.03.716344 medRxiv
Top 0.1%
31.8%

Ancient DNA-based studies of natural selection have focused on West Eurasia due to the availability of large sample sizes, but rich insights are expected to come from comparative studies that can reveal which patterns are shared and which are region-specific. We test around seven million variants for selection in 1,862 ancient East Eurasians (867 with new data) distributed over the last ten millennia. Using a generalized linear mixed model to control for population structure, we identify 40 genome-wide significant signals of selection, which have a particularly strong impact on immune and cardiometabolic traits, just as in West Eurasia. East and West Eurasia show highly correlated signals of adaptation both for individual alleles and for complex traits, showing how these geographically separate groups experienced convergent evolution in response to parallel transitions to food-producing economies and the accompanying lifestyle changes. An exception is the genetic determinants of light skin color: West Eurasians depigmented in the last 10,000 years, but most skin lightening in East Asians arose prior to the Holocene.

20
Coenzyme A is bound to tafazzin - a paradigm change for transacylation

Rosas Jimenez, J. G.; Schiller, J.; Vonck, J.; Hummer, G.; Zickermann, V.

2026-02-07 biochemistry 10.64898/2026.02.05.703992 medRxiv
Top 0.1%
31.8%

Cardiolipin (CL) is the signature phospholipid of mitochondria. In an obligatory remodeling process, the mitochondrial transacylase tafazzin exchanges its acyl chains to create the highly unsaturated, mature form of CL. Tafazzin dysfunction causes Barth syndrome, a severe multisystem disorder. We determined the structure of tafazzin at a resolution of 2.2 Å using cryo-electron microscopy (cryo-EM). Until now, the tafazzin reaction has been thought to be independent of coenzyme A (CoA). However, our structure clearly shows an acyl-CoA molecule bound to tafazzin. To decipher how substrates bind to the active site, we combine cryo-EM with structure predictions and molecular dynamics simulations, giving us detailed insights into a transacylation mechanism mediated by CoA. By providing molecular explanations of tafazzin dysfunction caused by pathogenic mutations, we gain a molecular understanding of Barth syndrome.