Science

American Association for the Advancement of Science (AAAS)

Preprints posted in the last 30 days, ranked by how well they match Science's content profile, based on 429 papers previously published here. The average preprint has a 1.05% match score for this journal, so anything above that is already an above-average fit.

1
Genomic surveillance of a deeply sampled local population reveals age-specific drivers of RSV transmission

Kwon, J.; de Vries, E. M.; Lemey, P.; Li, K.; Breban, M.; Laing, K.; Ferguson, D.; Schulz, W. L.; Oliveira, C. R.; Bont, L. J.; Pitzer, V. E.; Weinberger, D. M.; Grubaugh, N. D.; Hill, V.; Redmond, S.

2026-05-18 epidemiology 10.64898/2026.05.07.26350887 medRxiv
Top 0.1%
37.3%

Respiratory syncytial virus (RSV) disproportionately causes severe infections among infants and older adults, yet the key age group responsible for viral spread to other age groups remains poorly defined. While current immunization approaches effectively reduce disease severity among the most vulnerable, identifying the core drivers of infection is essential to effectively disrupt population-level transmission. By generating 910 whole-genome viral sequences of RSV from all age groups (<1 to 65+ years) in Connecticut, we identified that children aged 12-35 months are the primary drivers of viral transmission to other age groups. This group significantly shapes the genetic diversity of circulating strains. Furthermore, we found that RSV is introduced into the community through frequent and independent entries from other US regions throughout the year, rather than through a single explosive seasonal introduction or long-term local persistence. Ultimately, our findings justify prevention strategies that expand beyond reducing disease burden to actively prioritizing the reduction of transmission and infection.

2
Whole-genome phylogenomics and synteny resolve a single origin of body-plan asymmetry in flatfishes

Gallego-Garcia, J.; Hays, D.; Tongboonkua, P.; Minich, J. J.; Hilgers, L.; Michael, T. P.; Hiller, M.; Zhang, C.; Orti, G.; Arcila, D.; Pfeiffer, W.; Duarte-Ribeiro, E.; Mirarab, S.; Betancur-R., R.

2026-05-26 evolutionary biology 10.64898/2026.05.25.727411 medRxiv
Top 0.1%
36.8%

Flatfishes display the most dramatic asymmetric body plan in vertebrates, yet whether this rare innovation evolved once (flatfish monophyly, FM) or multiple times (flatfish polyphyly, FP) has remained contentious. A recent genome-wide study supported FP by placing Psettodes, the earliest-diverging flatfish lineage, among symmetric relatives within Carangaria, the clade that also includes billfishes, jacks, mahi-mahi, and barracudas. Subsequent work traced this to base-composition artifacts and inadequate substitution modeling. Here we revisit the question using whole-genome phylogenomic and synteny data from 17 carangarian species spanning flatfishes and carangarian relatives. We contribute three new chromosome-level assemblies, including the first for Psettodes. Nucleotide-based coalescent analyses (e.g., ROADIES, CASTER) yield strong support for FM, with Psettodes sister to all other flatfishes. Microsynteny analyses built from conserved gene-order blocks corroborate this result: topology tests, cluster-profile counts, and rearrangement-based trees favor FM over two competing FP topologies. Macrosynteny, based on chromosome-scale rearrangements, yields a more mixed signal, with support for FM depending on the metric and taxon-sampling scheme. We interpret this scale-dependent pattern in the context of the explosive post-Cretaceous radiation of Carangaria. The short intervals between speciation events that characterize rapid radiations appear to have left sufficient signal in fine-grained microsyntenic rearrangements, while chromosome-scale rearrangements were too rare to consistently resolve these closely spaced splits. When integrated with evidence from conserved developmental mechanisms active during metamorphosis, the stage at which flatfish asymmetry first emerges, and from the exceptionally complete fossil record, our multi-scale genomic evidence supports a single evolutionary origin of flatfish asymmetry.

3
Integrated surveillance resolves Darien paradox of Oropouche virus emergence in Panama migration corridor

Rodriguez, X.; Perez-Jimenez, J. G.; Alexander, L. W.; Lezcano-Coba, C.; Galue, J.; Juarez, Y.; Beltran, D.; Smith, D. R.; Kadir, M.; Ali, D. W.; Corrales, R.; Trujillo Rodriguez, L.; Valdiviezo, G. E.; Thomas, Q. K.; Cicalo, A.; Fitzpatrick, M. C.; Luquette, A. E.; Cameron Sayer, L.; Cer, R. Z.; Malagon, F.; Grajales, I. A.; Rivera, L. F.; Gonzalez-R, Z.; Antioco, J.; Walters-Valdes, E.; Meneghello-Ponce, N.; Vittor, A. Y.; Escobar-Lee, K.; Abouganem-Shaw, A.; Rodriguez, F.; Aguirre, E.; Loyola, S.; Tinoco, Y.; Moreno, B.; Chen-German, M.; Ampuero, S.; Gomez-Angelo, A.; Correa-Duarte, S.; Ace

2026-06-01 epidemiology 10.64898/2026.05.28.26354376 medRxiv
Top 0.1%
36.4%

Oropouche virus (OROV) spread across the Americas in 2024, yet Panama's Darien migration corridor saw no outbreak until nearly a year after Brazil's January 2024 peak, raising two hypotheses: cryptic circulation masked by diagnostic gaps, or recent introduction under permissive climatic conditions. Here we resolve this paradox using integrated clinical, genomic, and climate-informed surveillance. Among 1,040 individuals tested, 43% were OROV-positive and showed a clinical signature distinct from co-circulating arboviruses, including headache more frequent than in dengue (RR 2.38, 95% CI 1.74-3.24). The household secondary attack rate was 56%, and waste burning independently predicted infection. Phylogeographic reconstruction identified a single recent introduction in October 2024 with no evidence of adaptive evolution, excluding prolonged cryptic persistence. Climate-informed models indicate broad outbreak susceptibility across Panama, with Bocas del Toro and Los Santos as the next highest-risk provinces. These findings identify a Central American foothold for OROV with potential for further northward spread.

4
From receptor binding to biogeography: Multi-scale prediction of filovirus hosts in bats

Castellanos, A. A.; Anthony, S. J.; Chandran, K.; Lasso, G.; Wells, H. L.; Han, B. A.

2026-05-19 ecology 10.64898/2026.05.18.726005 medRxiv
Top 0.1%
36.3%

Forecasting zoonotic risk requires identifying which host species are biologically susceptible to infection, yet susceptibility is rarely predicted using frameworks that integrate molecular mechanisms with macroecology. Filoviruses, a diverse group of bat-associated viruses that include Ebola and Marburg viruses, illustrate this challenge: viral entry depends on interactions between viral glycoproteins and the host receptor NPC1, and host ecology and distribution determine opportunity of viral entry. Additionally, receptor sequence data used for informing viral entry are available for only a small fraction of bat species. Here, we extend virus-specific susceptibility prediction across the global diversity of bats by integrating experimentally measured and physicochemically inferred virus-receptor binding strengths with phylogenetic, ecological, and environmental data. Using boosted regression models trained on binding assay labels, we generate predictions of NPC1-mediated binding strength for more than 1,300 bat species. Predicted susceptibility is strongly structured by evolutionary relationships, with high binding concentrated in particular bat lineages, but is further differentiated within clades by morphology, life-history strategy, and environmental context. Strikingly, macroevolutionary structure alone recovers interaction patterns originally derived from amino acid-level physicochemistry, indicating that information about receptor-mediated compatibility is recoverable from host evolutionary history and ecological traits. Predicted high binding strength extends well beyond historically recognized outbreak regions, suggesting that the fundamental host range of filoviruses may be substantially broader than their currently realized distribution. By scaling receptor biology to global host diversity, this multi-scale framework expands mechanistic susceptibility forecasting beyond species with available molecular data and provides a generalizable approach for integrating molecular and ecological information in zoonotic prediction.

5
MrtR of Mesorhizobium tianshanense reveals both activation and inhibition mechanisms of a LuxR-type quorum sensing receptor

Stoutland, I. M.; Blackwell, H. E.

2026-05-29 biochemistry 10.64898/2026.05.28.728602 medRxiv
Top 0.1%
32.8%

Quorum sensing (QS) enables common gram-negative bacteria to coordinate collective behaviors through small molecule signals, yet how these signals tune receptor activity remains incompletely understood. Here, we define a mechanism by which ligand structure controls function in a LuxR-type QS receptor. Using structural and biochemical analyses, we investigate MrtR from Mesorhizobium tianshanense and show that ligand acyl-chain length governs receptor assembly and activity. We present full-length structures of MrtR bound to activating and inhibitory ligands, revealing a switch in oligomeric state. Long-chain (C14) N-acyl L-homoserine lactones (AHLs) act as agonists by promoting intra- and inter-subunit interactions that lead to homodimerization and DNA binding. In contrast, shorter (C8) AHLs fail to promote these contacts, favoring a monomeric, inactive state. Ligands of intermediate length produce graded responses consistent with partial dimer stabilization. Biochemical measurements of DNA binding, thermostability, and oligomerization, together with targeted mutagenesis, support this model and establish the functional importance of key structural contacts. These findings provide the first structural comparison of a full-length LuxR-type receptor bound to both agonist and antagonist. Our findings expand the known structural and mechanistic diversity of the LuxR family and suggest mechanistic similarities between structurally distinct receptors.
Significance: Quorum sensing (QS) regulates diverse bacterial behaviors, and LuxR-type receptors are attractive targets for applications ranging from antivirulence strategies to synthetic biology and agriculture. Despite intense interest in developing chemical modulators of these systems, the molecular basis by which small molecules agonize or antagonize LuxR-type receptors remains poorly understood. Here, we investigate the LuxR-type receptor MrtR and report crystal structures of the full-length receptor bound to an agonist and an antagonist, revealing how structurally similar compounds produce opposing outcomes. Notably, MrtR exhibits an unprecedented dimerization interface mediated by a ligand-responsive loop that undergoes large conformational changes. These findings establish a new structural framework for understanding signal discrimination in LuxR-type receptors and may enable rational reprogramming of QS in natural and engineered systems.

6
A domesticated totivirus-like tandem array undergoes interspecific transfer and asymmetric evolution

Taylor, D.; Tringali, D. A.

2026-05-25 evolutionary biology 10.64898/2026.05.24.726934 medRxiv
Top 0.1%
32.3%

RNA paleoviruses are expected to evolve more slowly than their exogenous viral progenitors. We show that a four-gene tandem array (STORM, Scheffersomyces Totivirus-like Responsive Module; genes TLC1-TLC4) in wood-associated yeasts violates this expectation, evolving faster at the protein level than its exogenous totiviral relatives while persisting for over 15 million years. STORM has accumulated greater amino-acid divergence than its exogenous totiviral relatives over a much shorter host phylogenetic window (~54 MY of Scheffersomyces history versus ~225 MY for exogenous totivirus diversification), under significant relaxation of selective constraint (RELAX K < 1). Tandem duplication resulted in asymmetric evolution within the array. For example, TLC4 alone has retained the predicted decapping loop motif (lost from TLC1, TLC2, and TLC3) and a totivirus-like capsid fold. Other copies remain more constrained in structure and sequence, indicating functional partitioning. All four genes are transcriptionally active, embedded in host antiviral and RNA-decay regulatory neighborhoods, with condition-dependent expression. Hundreds of reference gene trees for Scheffersomyces are concordant with the species tree, with only two unrelated singleton exceptions; the STORM array is the only locus where all paralogs share a well-supported, locus-coherent discordance. Distance-based tests are inconsistent with incomplete lineage sorting, and shared discordance with an adjacent ATP10 pseudogene and a transposase (Tc1/mariner superfamily) implicates transposon-mediated co-mobilization. We infer at least two interspecific transfers of STORM. Our results reveal how hosts can domesticate a mobile virus-like module whose paralogs escape strong purifying selection and explore sequence space while the core fold is conserved.

7
Rapid centromere turnover and the adaptive radiation of lemurs

Trivedi, M.; Gianfrate, F.; de Gennaro, L.; Ayllon, M.; Munson, K. M.; Hoekzema, K.; Yoo, D.; Ehmke, E.; Yoder, A. D.; Chang, S.; Lalgudi, C.; Krasnow, M. A.; Ventura, M.; Eichler, E. E.

2026-05-19 genomics 10.64898/2026.05.16.725662 medRxiv
Top 0.1%
32.1%

Centromeres represent essential chromosomal structures required for faithful chromosome segregation during cell division but are paradoxically hypermutable, leading to centromere drive and reproductive isolation in closely related species. Using long-read sequencing, we generate nearly complete genomes (2.1-2.5 Gbp) from eight lemur species and characterize the sequence, epigenetic and cytogenetic structure of 223 strepsirrhini centromeres providing an alternative primate perspective of centromere evolution. No lemur centromere consists of α-satellite DNA that typifies the haplorhine lineage; instead, each species evolved its own distinct higher-order centromeric repeat sequence, varying substantially in both monomer length (ranging from 41-548 bp) and primary sequence composition (GC percentages 28.7-67.9%) including centromere cooption of telomeric repeats in brown lemurs. Most centromeres show characteristic hypomethylation dip regions (110-300 kbp) as candidates for kinetochore attachment. The centromere sequence motif shows no apparent sequence homology among lemur genera, even for species separated by less than 15 million years (Lemur and Eulemur). We estimate a >6-fold increased rate in primary centromeric motif turnover in strepsirrhines when compared to haplorhines and this occurred in conjunction with positive selection of the CENP-B protein in lemur lineages. We propose that lemur radiation and centromere diversification are linked, whereby accelerated motif turnover provides a stasipatric barrier contributing to rapid chromosomal evolution.

8
An ancient retrotransposon provides species-specific tuning of IL-18 inflammatory signaling

Ordonez, A. D.; Allen, H.; Sanford, L.; Ivancevic, A.; Agyepong, A.; Chaw, M.; Bridges, J. P.; Schountz, T.; Chuong, E. B.

2026-05-26 evolutionary biology 10.64898/2026.05.25.727421 medRxiv
Top 0.1%
28.4%

Species differ markedly in how they regulate immune responses, yet the molecular basis of this variation remains incompletely understood. Here, we characterize an underexplored regulatory mechanism of the pro-inflammatory IL-18 pathway, centered on a truncated IL18R1 isoform with striking species-specific expression differences. This isoform, IL18R1-Short, derives from an ancient LINE2 retrotransposon insertion that provides an intronic polyadenylation signal, producing a receptor that lacks the intracellular signaling domain. Using RNA sequencing across nine mammals, we find that although this cis-regulatory element is broadly conserved, robust IL18R1-Short expression is species- and tissue-restricted: bats and mice express it at high levels, particularly in barrier tissues (lung, skin, intestine), whereas most other species, including humans, show little or no expression. Functionally, IL18R1-Short dampens IL-18-induced NFκB signaling in human, mouse, and bat systems, and knockdown of the mouse ortholog enhances IL-18-driven immune and inflammatory gene expression in mouse T cells. Together, these results identify IL18R1-Short as a transposable element-derived decoy receptor and highlight alternative transcription as a source of species-specific immune regulation.

9
Fast pairwise coalescence enables gene-resolution scans for recent selection in diverse human populations

Korfmann, K.; Mathieson, S.

2026-05-22 evolutionary biology 10.64898/2026.05.21.726777 medRxiv
Top 0.1%
28.2%

Identifying the genetic changes that shaped recent human adaptation depends on our ability to detect selection from genomic data. Summary statistics from haplotype scans have been widely used for that purpose, aggregating genetic signal over windows, though resolution is limited by linkage and their power may diminish as sweeps approach fixation, as in the case of the integrated haplotype score (iHS). Ancient DNA based scans recover signal by analysing time-series trajectories, but the majority of human populations fall outside the geographic range of any existing ancient DNA dataset. Pairwise coalescence times provide a way to complement statistics and can be applied to any modern cohort, yet computing them densely enough at cohort scale poses a computational challenge due to the quadratic growth in the number of haplotype pairs. We introduce gamma_smc_cu, a GPU implementation of the Gamma-SMC algorithm (Schweiger and Durbin, 2023) for pairwise time-to-the-most-recent-common-ancestor (TMRCA) inference. Applied to the 1000 Genomes Project (3,202 phased samples, corresponding to 6,404 haplotypes; 829,638 within-population pairs across 26 populations and five different continental ancestries; ~10^12 per-site posterior evaluations), it yields a gene-level TMRCA landscape of 17,823 autosomal protein-coding genes after masking for segmental duplications. The scan recovers well-known sweeps (LCT, SLC24A5, EDAR, FADS1, HERC2, ABCC11) and, combined with a depleted-to-enriched variant-class profile, resolves haplotype-block signals down to the gene level. Of seven case studies, two are developed in the main text -- GRK2/ADRBK1 (chr11q13.2; SAS+EUR) and TREML1/TREM2 (chr6p21.1) -- and the remaining five (IFIH1 chr2q24/IBS, CCDC92 chr12q24/CDX, SLC6A15 chr12q21/CHS, BPIFA2 chr20q11/GIH, CLEC6A chr12p13/CDX) are presented in the Supplementary Information (SI). Notably, TREML1/TREM2 is a shared out-of-Africa signal, ranked below the within-population 1% tail in 16 of 19 non-African 1000 Genomes panels, that PopHumanScan and five landmark haplotype-based scans miss. A previous 10 kb-windowed-mean iHS scan dilutes the cluster of extreme sites packed inside the ~5 kb gene bodies, while our own gene-level iHS independently recovers the locus in three South Asian panels (BEB, STU, ITU; top 0.4% genome-wide). We cross-validate the seven cases against the 9.7 million per-variant selection posteriors from a recent West-Eurasian ancient DNA scan. BPIFA2 is detected concordantly (s ≈ 1.8% per generation). GRK2 and CCDC92 reach detection threshold in flanking variants but not within their own gene bodies, while the TREML1/TREM2 cluster falls below it. To calibrate novelty, we review the candidate landscape against an expanded eight-catalog set spanning curated haplotype scans, the largest current West-Eurasian ancient-DNA leads, and a recent 26-population iHS refinement; the vast majority of our loci overlap at least one prior entry, and only a handful -- including TREML1/TREM2 -- remain unflagged. The contributions of this work are gene-level resolution, systematic ancient DNA cross-validation, and a reusable TMRCA landscape that complements aDNA panels.
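
The quadratic growth in haplotype pairs mentioned in this abstract can be illustrated with a short calculation. This is a minimal sketch, not the authors' code: the function name `haplotype_pairs` is hypothetical, and the all-pairs total it computes is an illustration only (the abstract itself reports 829,638 within-population pairs, a much smaller subset).

```python
def haplotype_pairs(n: int) -> int:
    """Number of unordered pairs among n haplotypes: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# 3,202 phased diploid samples correspond to 6,404 haplotypes (from the abstract).
n_haplotypes = 2 * 3202

# All unordered pairs across the whole cohort (illustrative, not a figure from the paper).
all_pairs = haplotype_pairs(n_haplotypes)

# Doubling the cohort roughly quadruples the number of pairs.
growth = haplotype_pairs(2 * n_haplotypes) / all_pairs

print(f"{n_haplotypes} haplotypes -> {all_pairs:,} unordered pairs")
print(f"doubling the cohort multiplies the pair count by {growth:.2f}")
```

Restricting the comparison to within-population pairs, as the abstract describes, keeps the workload far below the all-pairs total, but each population's pair count still grows quadratically with its sample size.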

10
First Lithic Age Caribbean genomes document pre-Ceramic genetic continuity and affinities to Central America and northern South America

Sirak, K.; Lopez Belando, A. J.; Shelley, D.; Arevalo, M. G.; Shelley, D.; Mallick, S.; Rohland, N.; Reich, D. E.

2026-05-13 genetics 10.64898/2026.05.12.724636 medRxiv
Top 0.1%
27.3%

The population history of the Caribbean's first inhabitants has been challenging to reconstruct because few human remains are known from the region's earliest occupation, which began around 6,000 years ago in Hispaniola, Cuba, and Puerto Rico. We generated genome-wide data from 19 individuals from Hispaniola's Samana Peninsula and focused on four who lived during the earliest pre-Ceramic "Lithic Age". Extending the Caribbean genetic record by more than a millennium to ~4,400 calBP, we show that pre-Ceramic Age populations across Hispaniola and Cuba derive from a single ancestry source and document long-term genetic continuity across islands, with some local genetic structure within Hispaniola. Pre-Ceramic Age Caribbean ancestry shares most drift with populations from Central America and northern South America, although no sampled mainland group provides an adequate proxy. We infer very small effective community sizes, consistent with locally structured mating pools and little evidence of close-kin mating. These findings extend our understanding of Caribbean population history into its earliest phase.
Teaser: Early Caribbean people shared ancestry across islands and lived in small, locally structured communities.

11
A mosaic of genomic architectures underpins parasitism loss in a jawless vertebrate

Jacobs, A.; Decanter, N.; Torresen, O. K.; Garmann-Aarhus, B.; Capstick, M.; Rougemont, Q.; Guillaume, F.; Normand, R.; Tremblay, J.; Destouches, J.-P.; Besnard, A.-L.; Souissi, A.; Lassalle, G.; Stoeckel, S.; Petit, E.; Hoff, S. N. K.; Park, D.; Pope, B.; Jentoft, S.; Vollestad, L. A.; Jakobsen, K. S.; Evanno, G.

2026-05-13 evolutionary biology 10.64898/2026.05.11.724254 medRxiv
Top 0.1%
27.2%

Lampreys are the only ancestrally parasitic vertebrate lineage, yet parasitism has been repeatedly lost alongside a suite of life-history changes, such as loss of migration and juvenile feeding and accelerated maturation. Combining whole-genome resequencing, haplotype-resolved assemblies, hybrid-zone genotyping, multi-tissue transcriptomics, and sperm phenotyping, we map this life-history syndrome in European Lampetra to six chromosomes spanning a mosaic of genomic architectures: a ~20 Mb low-recombination region on chromosome 1 lacking chromosomal rearrangements within Lampetra but involving inter-specific rearrangements across deep lamprey lineages; a translocated inversion with ecotype-dependent sperm-velocity effects; and ecotype-divergent deletions overlapping genes crucial for nervous system (CNTNAP2) and reproductive development (FSHR). However, this genomic basis is not shared with a convergent sister lineage, pointing to independent routes to a recurring life-history transition in lampreys.

12
Pangenome reference assemblies reveal the variation and recent activity of human LINE-1 retrotransposons

Yang, L.; Nematbakhsh, S.; Norseen, A.; McLaughlin, R. N.

2026-05-16 genomics 10.64898/2026.05.14.725010 medRxiv
Top 0.1%
25.9%

LINE-1 retrotransposons are the only autonomous mobile elements still active in human genomes and remain a potent source of mutation, genome remodeling, and disease risk. However, young, full-length, potentially active copies (the elements most likely to shape present-day genomes) have been largely inaccessible to population-scale analysis because they are long, repetitive, and poorly resolved by short-read sequencing. Here, we use 47 phased long-read assemblies from the Human Pangenome Reference Consortium, representing 94 haplotypes, to build an allele-resolved view of recent human LINE-1 evolution. We identify 13,617 LINE-1 alleles with intact ORF1 and ORF2 across 683 unique insertion sites, revealing that every genome carries a distinct repertoire of potentially active source elements. These intact LINE-1 profiles recapitulate broad human population structure while exposing a large, rare, and population-enriched reservoir of mobile-element diversity missed by single-reference approaches. We also resolve a structurally variable chromosome 11 LINE-1 array, demonstrating that local duplication and rearrangement can amplify LINE-1 sequence independently of canonical retrotransposition. By comparing full-length LINE-1 sequences, we define activity signatures that separate ancient remnants from recently expanding lineages and uncover young LINE-1 groups whose activity is not fully explained by canonical subfamily labels. Sequence-network analyses further reveal a dynamic history of lineage turnover, in which successful source elements rise, seed new insertions, and are replaced by descendants marked by specific nucleotide changes. Together, these data transform human LINE-1s from a repetitive background into a resolved evolutionary system, linking insertion polymorphism, coding potential, population history, and recent retrotransposon adaptation. Our findings establish the human pangenome as a framework for discovering active source elements and for testing how mobile DNA continues to shape genome evolution, host defense, and disease risk.

13
Structure tokens sharpen the feature vocabulary of protein language models

Steenwyk, J. L.

2026-05-14 bioinformatics 10.64898/2026.05.12.724593 medRxiv
Top 0.1%
25.9%

Protein language models predict structure and function from amino acid sequences, but the internal computations that produce these predictions remain opaque. We applied sparse autoencoders to ESM-2 (650M parameters, sequence-only) and ESM-3 (1.4B parameters, multimodal) and found that 78% of learned features converge between the two architectures (permutation null: 14.2%, p < 0.001). These convergent features account for nearly all functional knowledge encoded by the models (functional site AUROC 0.925 versus 0.661 for architecture-unique features). Structure tokens in ESM-3 do not create a new feature vocabulary. Instead, the 15.2% of features most activated by structure tokens are more convergent with sequence-only ESM-2 than structure-invariant features are (r = 0.54 versus 0.45) and carry richer biological annotation (134 versus 29 enriched GO terms). Attention analysis identified a single geometric head (L0H7) as the bottleneck through which structural information enters the network; ablating this head alone changed secondary structure predictions at 40% of residues, while ablating random layer-0 heads altered fewer than 17%. Steering vectors, attribution patching, and sparse feature circuits confirmed that these features sit within the models' causal pathway. Two architecturally distinct models, trained on different objectives and input modalities, converge on a shared biological vocabulary -- and explicit structure tokens sharpen that vocabulary rather than rewriting it.

14
A comparative brain atlas of Mexican cavefish identifies naturally-occurring changes in cellular composition and gene expression

Gallman, K.; Ricemeyer, E.; Rastogi, A.; X, M.; Mendez Scolari, E.; Duboue, E. R.; Rohner, N.; Iyer, H. S.; Warren, W. C.; Keene, A. C.

2026-05-14 evolutionary biology 10.64898/2026.05.12.723532 medRxiv
Top 0.1%
25.9%

Understanding how naturally occurring genetic variation shapes human health and disease is critical for improving diagnosis and treatment strategies. The Mexican cavefish, Astyanax mexicanus, represents a powerful system for evolutionary medicine, enabling investigation of naturally evolved mechanisms of resilience to disease-related traits including diabetes, obesity, insomnia, and eye loss. Larval A. mexicanus, like zebrafish, are transparent, allowing whole-brain imaging, circuit mapping, and the generation of computationally derived atlases that precisely quantify neuroanatomical differences between surface and cave populations. Developing a molecular map of brain cell types provides a foundation for identifying evolved differences in neural circuits and physiology. Here, we present a single-cell atlas of the larval cavefish brain that reveals widespread divergence in the abundance and molecular signatures of neurons and glia. Our cell type map validates known neuroanatomical differences, including a reduction of the optic tectum and expansion of the pineal gland in cavefish. We uncover substantial changes in multiple glial cell classes that are linked to neural regulation of behavior, including microglia. Analysis of differential gene expression between surface and cavefish microglia revealed enhanced genes associated with synaptic pruning and clearance of neural debris, suggesting cavefish increased microglia activity to shape brain development. We also analyzed cell types that did not classify as canonical neurons or glia and identified notable divergence in transcriptomes and cell composition, including reduced meningeal fibroblasts in cavefish and substantial transcriptional changes related to phototransduction in non-visual photoreceptors within the pineal gland. Together, these findings provide a comprehensive atlas of cell type-specific gene expression differences between A. mexicanus surface and cavefish, establishing a platform for dissecting the molecular and cellular basis of evolved disease resilience in cavefish.

15
Structure and inhibition of the sperm TMEM95-FIMP complex in mammalian fertilization

Liu, P.; Castelino, R. E.; Gierke, T. R.; Wood, A. J.; Lu, Y.; Tang, S.

2026-05-15 biochemistry 10.64898/2026.05.14.724122 medRxiv
Top 0.1%
25.8%

TMEM95 is a sperm acrosomal membrane protein essential for mammalian fertilization. How TMEM95 facilitates sperm-egg interaction has largely remained unknown. Analogous sperm fertilization proteins function as complexes, leading us to hypothesize that TMEM95 may have a binding partner on sperm. Here, we surveyed interactions between TMEM95 and individual proteins in a curated library of testis-expressed proteins using AlphaFold3. We identify FIMP, a fertilization-essential acrosomal membrane protein, as a high-confidence interaction partner of TMEM95. These two proteins form a high-affinity complex through their ectodomains. Using single-particle cryo-EM, we determine the structure of the human TMEM95-FIMP ectodomain complex at high resolution. An aromatic motif of FIMP binds to a conserved surface of TMEM95, and amino acid substitutions within this motif ablate the TMEM95-binding activity of FIMP. We isolate an anti-TMEM95 antibody, termed 3A02, that binds to human and murine TMEM95 and disrupts the interaction between TMEM95 and FIMP. By determining the cryo-EM structure of human TMEM95 bound to the 3A02 fragment antigen-binding region, we find that 3A02 recognizes the FIMP-binding site on TMEM95. 3A02 inhibits fusion of murine sperm with eggs, independent of antibody size, suggesting that the TMEM95-FIMP interface is critical for sperm-egg interaction. Together, these results establish the human sperm TMEM95-FIMP complex and suggest that a FIMP-mediated interaction of TMEM95 facilitates membrane fusion during mammalian fertilization.

16
Tryptophan became part of the universal genetic code post-LUCA

Wehbi, S.; Ly-Trong, N.; Wheeler, A.; Minh, B. Q.; Lauretta, D.; Masel, J.

2026-05-31 evolutionary biology 10.64898/2026.05.28.728509 medRxiv
Top 0.1%
25.8%

We evaluate whether tryptophan (W), widely thought to be the last of the 20 canonical amino acids added to the genetic code, was already present in the Last Universal Common Ancestor (LUCA). We reconstruct the evolutionary history of tryptophanyl-tRNA synthetase (WRS), the enzyme that attaches W to its tRNA, and the related tyrosyl-tRNA synthetase (YRS). We identify and exclude sequences derived from ancient recombination between archaeal and bacterial YRSs. Diverse rooting methods, including a novel approach exploiting time non-reversible evolution, all place the root between bacterial and archaeal YRS rather than between YRS and WRS. This supports post-LUCA WRS origination in Archaea, followed by its horizontal transfer to Bacteria. However, ancestral sequence reconstruction suggests that Archaea were depleted for W while Bacteria were not, and enzymes essential for W biosynthesis emerged in Bacteria. This suggests that W usage originated in Bacteria, with later WRS emergence in Archaea allowing the archaeal genetic code to converge with the bacterial code. The universality of the genetic code is usually attributed to common descent from LUCA, but the final step making the code universal was instead achieved by horizontal gene transfer. This gives credence to similar mechanisms for earlier steps in genetic code evolution.

17
Repeated Marine-To-Freshwater Fish Transitions Reveal Paleoenvironmental Modulation Of Adaptive Radiation

Medeiros, A. P. M.; Rincon-Sandoval, M.; Davis, A.; Santaquiteria, A.; Thacker, C. E.; Egan, J. P.; Kim, J.; Arcila, D.; Ludt, W. B.; Hughes, L. C.; Bloom, D.; Betancur-R., R.

2026-05-29 evolutionary biology 10.64898/2026.05.26.728014 medRxiv
Top 0.1%
25.6%

Tropical rivers in Australia and New Guinea (Sahul) provide a rare natural experiment in vertebrate evolution: unlike other continental systems, their freshwater ichthyofaunas are composed almost entirely of marine-derived lineages rather than primary freshwater fishes. This unique biogeographic setting enables replicated tests of why some marine-to-freshwater transitions give rise to extensive adaptive radiations whereas others remain species-poor, and whether these outcomes reflect ecological opportunity or temporally structured paleoenvironmental constraints. Using a densely sampled, time-calibrated phylogenomic framework spanning 2,303 teleost species, we identified a likely range of 27-34 marine-to-freshwater transitions during the Cenozoic, including a pronounced Middle Miocene peak (16-11 Ma). Although ecological opportunity in Sahul rivers enabled repeated colonization in the absence of dominant primary freshwater incumbents, younger freshwater lineages nevertheless diversify faster than older ones, contradicting the expectation that early arrivers should undergo elevated diversification when accessing vacant niche space. Although some colonizations coincide with bursts of speciation consistent with adaptive radiation, many yielded few species despite long residence times. Functional trait analyses likewise revealed no consistent relationship between colonization timing, ecological breadth, or diversification rate, although expanded functional space characterizes previously proposed Sahul adaptive radiations. Comparisons with paleoenvironmental curves indicate that colonization success correlates with sea-level minima and low-oxygen conditions, suggesting that Earth history dynamics modulated when ecological opportunity was accessible. 
Our results show that although ecological opportunity enabled repeated freshwater invasions into the Sahul region, diversification outcomes are governed by the interaction of paleoenvironmental dynamics and possibly lineage-specific traits, generating stark asymmetries in freshwater radiations.
Significance statement: Tropical rivers in Australia and New Guinea host one of the most unusual continental freshwater fish assemblages on Earth, composed almost entirely of marine-derived lineages. This system makes it possible to ask why some colonizing lineages diversify dramatically while others remain species-poor on a continental scale. Using large-scale phylogenomic and functional trait data, we show that early arrival alone does not predict diversification success. Instead, the lineages that radiate most successfully are those whose arrival coincides with windows of paleoenvironmental opportunity created by sea-level and oxygen fluctuations. These results reveal that the fates of colonizing lineages are shaped not only by ecological opportunity, but also by Earth-history dynamics that govern when, where, and how species can invade and diversify.

18
The free-living wellspring of symbiotic nitrogen fixation in Bradyrhizobium

LING, L.; Wang, S.; Tao, J.; Pervent, M.; Ho, K. E.; Sciallano, C.; Camuel, A.; Nouwen, N.; Giraud, E.; Luo, H.

2026-05-29 microbiology 10.64898/2026.05.28.728359 medRxiv
Top 0.1%
25.4%

The evolutionary origin of nitrogen-fixing symbiosis has been a long-standing question. To address this, we focused on Bradyrhizobium, a globally abundant bacterial genus that includes classic symbiotic lineages, which rely on the common Nod factor signaling pathway to form nodules, and close relatives capable of fixing nitrogen in a free-living state. We isolated 88 strains carrying the key genes for nitrogen fixation (nif) from non-legume environments and analyzed them alongside 586 public Bradyrhizobium genomes harboring these genes to reconstruct a robust phylogeny of nif genes. Analysis reveals that the earliest-diverging nif lineages are members capable of free-living nitrogen fixation, establishing this as the ancestral state. The Nod factor-dependent symbiotic lineages are polyphyletic, demonstrating at least three independent origins via horizontal acquisition of symbiosis islands. This evolutionary history is reflected in a genomic dichotomy: lineages capable of free-living nitrogen fixation possess a conserved nif island architecture that consistently includes the oxygen-protective gene glbO, whereas the symbiotic nif-associated regions are highly variable and universally lack glbO. Using both loss-of-function and gain-of-function genetic approaches, we show that glbO contributes significantly to nitrogenase activity under free-living conditions, whereas it is dispensable within the protected nodule environment. This work establishes a new framework for the evolution of symbiosis, identifying free-living ancestors as the source from which nitrogen-fixing symbiosis repeatedly and independently evolved in Bradyrhizobium.
Significance: For over a century, nitrogen-fixing bacteria called rhizobia have been celebrated for their symbiotic partnership with legumes like soybeans and peas. This study overturns the view that this symbiotic lifestyle was the original, advanced state. By discovering and analyzing diverse Bradyrhizobium bacteria from ordinary soils and non-legume plants, we show that the ancestor was capable of free-living nitrogen fixation. Root nodule nitrogen fixation evolved independently multiple times from free-living ancestors. We identify glbO, an oxygen-protective gene located immediately next to the nitrogen-fixing genes, as critical for nitrogen fixation in variable soil conditions but dispensable and lost in Bradyrhizobium specialized for life inside nodules. This repositions free-living Bradyrhizobium as a major potential nitrogen source beyond legumes, with promise for sustainable agriculture.

19
Epistatic evolution drives HLA-dependent CD8+ T Cell escape risk in diverse populations

Hamelin, D. J.; Grenier, J.-C.; Poujol, R.; Bourdin, B.; Pare, B.; Simpson, S.; Smith, M.; Decaluwe, H.; Caron, E.; Hussin, J.

2026-05-26 genomics 10.64898/2026.05.22.727291 medRxiv
Top 0.1%
25.4%

Understanding how viral evolution shapes HLA-dependent T cell escape is crucial to identify individuals at risk of reduced cellular immunity to emerging variants. Nevertheless, we lack frameworks to model HLA diversity and the evolutionary feasibility of T cell-evading mutations. Here, we construct an HLA map capturing variation in epitope specificity across HLA-typed cohorts. Enhancing this framework with SARS-CoV-2 CD8 T cell escape reveals heterogeneous escape across HLA-defined groups, with clusters enriched for HLA-B*07:02, HLA-A*03:01 and HLA-A*02:01 showing higher epitope loss. To assess the evolutionary plausibility of escape, we model viral sequence fitness using an epistasis-aware protein language approach trained on coronaviruses to systematically score mutations across viral lineages. We find that the fitness effect of mutations dynamically changes with evolving sequence context, and that T cell-evading mutations become fitter with additional escape mutations. This study links host HLA diversity to viral fitness landscapes for surveillance and vaccine design.

20
Beyond copy number: The regulatory architecture of mitochondrial DNA gene expression

Riahi, P.; Le, B.; M R, S.; Taylor-Brill, S.; Taylor, D.; McCoy, R. C.; Ramdas, S.; Zaidi, A. A.

2026-05-30 genomics 10.64898/2026.05.27.728277 medRxiv
Top 0.2%
25.2%

Mitochondrial DNA (mtDNA) copy number is widely used as a biomarker for mitochondrial function and disease risk, yet its relationship to mtDNA gene expression, one important functional output, remains poorly understood. Prior studies examining this relationship have largely relied on heterogeneous tissue samples, where confounding by cell-type composition obscures the underlying biology. We rigorously tested this relationship in lymphoblastoid cell lines (LCLs), where we find no correlation between mtDNA copy number and gene expression across 731 individuals, and minimal association across 49 GTEx tissues except whole blood. Using population genetic modeling of heteroplasmy drift between DNA and RNA, we estimate that effectively ~50 out of 813 mtDNA templates are transcriptionally active in LCLs, indicating low mtDNA accessibility. This confirms, using an independent method and a different cell type, previous observations in HeLa cells, where mtDNA is largely compacted into nucleoids. Together, our results demonstrate that mtDNA copy number and expression are largely decoupled, and that above a certain rate-limiting threshold, mtDNA accessibility, rather than absolute copy number, is the more relevant quantity for explaining inter-individual differences in gene expression. This challenges the interpretation of mtDNA copy number as a proxy for mitochondrial transcriptional output and highlights the need for a more mechanistic understanding of mtDNA copy number associations with disease-relevant traits. Cis- and trans-eQTL mapping further reveals that genetic regulation of mtDNA gene expression operates primarily through post-transcriptional mechanisms rather than transcription initiation, yet despite the high inter-individual variance in mtDNA gene expression, genetic variation underlying mtDNA regulation appears to be under strong selective constraint.