Science

American Association for the Advancement of Science (AAAS)

Preprints posted in the last 30 days, ranked by how well they match Science's content profile, based on 429 papers previously published here. The average preprint has a 1.02% match score for this journal, so anything above that is already an above-average fit.

1
A deep learning predictor of bindable protein surfaces to guide generative synthetic biology

Almeida-Souza, L.

2026-04-16 synthetic biology 10.64898/2026.04.16.718848 medRxiv
Top 0.1%
41.1%

The advent of generative machine learning models has revolutionized de novo design of protein binders. However, the wide adoption of this revolution is bottlenecked by computational cost. For many targets, binder design commonly requires computationally intensive sampling across structures, often wasting days of GPU time on unwanted or geometrically inviable regions. Here, IARA (Interface Analysis and Recognition Architecture) is introduced, a deep learning Graph Neural Network designed as a rapid structural filter to triage protein binder generative pipelines. IARA is trained entirely on BindCraft trajectories generated against RFdiffusion-generated targets. Based on a slim network with only seven residue features, IARA maps the binder designability of input proteins in a matter of seconds. On validation runs using BindCraft, RFdiffusion and BoltzGen, IARA successfully identified the optimal binding interface for practically all targets. By instantly pinpointing the highest-probability binding pockets, IARA democratizes synthetic biology, drastically reducing the exploratory GPU compute required for successful de novo binder generation.

2
A haplotype-resolved bluethroat (Luscinia s. svecica) genome assembly uncovers the complex MHC region

Strand, M. A.; Enevoldsen, E. L. G.; Toerresen, O. K.; Skage, M.; Ferrari, G.; Tooming-Klunderud, A.; Leder, E. H.; Lifjeld, J. T.; Johnsen, A.; Jakobsen, K. S.

2026-03-30 genomics 10.64898/2026.03.26.714473 medRxiv
Top 0.1%
32.2%

We describe a chromosome-level, haplotype-resolved genome assembly from a female bluethroat (Luscinia s. svecica). The assembly comprises two pseudo-haplotypes of 1461 Mb and 1171 Mb, with 77.4% and 88.4% scaffolded into 40 autosomal chromosomes and the W and Z sex chromosomes (haplotype one). Assembly completeness is high (BUSCO 99.2% and 94.9%), with 22,462 and 18,769 annotated protein-coding genes for haplotypes one and two, respectively. The use of Oxford Nanopore Technologies sequencing enables resolution of genomic regions that are often fragmented in genome assemblies, including the hypervariable Major Histocompatibility Complex (MHC). We find that MHC loci include both the canonical organization of tandemly duplicated MHCIIβ genes with a single MHCIIA, and a distinct arrangement in which MHCI and MHCIIβ loci are interspersed in intermixed arrays, and that substantial structural differences between haplotypes are directly resolved in the assembly.

3
The Neanderthal population history and the introgression landscape inferred from the UK Biobank

Morez Jacobs, A.; Soltantouyeh, A.; Zeloni, R.; Carollo, F.; Mezzavilla, M.; Marnetto, D.; Pagani, L.

2026-04-04 evolutionary biology 10.64898/2026.04.03.716297 medRxiv
Top 0.1%
32.2%

Neanderthal haplotypes in present-day Eurasians are unevenly distributed across the genome, forming introgression deserts and high-frequency segments consistent with adaptive introgression, with additional random variation affected by genetic drift. However, current estimates are limited by modest sample sizes and analyses restricted to subsets of the genome, given that any individual carries only 1-2% Neanderthal ancestry. Here we extract and analyse Neanderthal haplotypes from 45,000 imputed and phased genomes in the UK Biobank. Even at this scale, the number of sites overlapping Neanderthal haplotypes approaches--but does not reach--saturation, with rare haplotypes still being discovered. Using the derived allele frequency spectrum within the surviving Neanderthal segments, we infer a divergence time of 2,061 generations between the introgressed lineage and the Vindija Neanderthal, and estimate the effective population size of the introgressed lineage at Ne = 6,564. Individual-level resolution allows identification of 545 independent loci with excess Neanderthal homozygosity, consistent with ongoing selection. Despite the extensive dataset, a substantial portion of the genome remains a Neanderthal desert. Within these regions, we detect seven Human Accelerated Regions affected by recent human selective sweeps (TMRCA <650 kya), four located within introns of cerebellum-expressed genes, providing further support for their potential as modern human-specific adaptations.

4
Convergent natural selection at both ends of Eurasia during parallel radical lifestyle shifts in the last ten millennia

Barton, A. R.; Rohland, N.; Mallick, S.; Pinhasi, R.; Akbari, A.; Reich, D.

2026-04-04 evolutionary biology 10.64898/2026.04.03.716344 medRxiv
Top 0.1%
31.8%

Ancient DNA-based studies of natural selection have focused on West Eurasia due to the availability of large sample sizes, but rich insights are expected to come from comparative studies that can reveal which patterns are shared and which region-specific. We test around seven million variants for selection in 1,862 ancient East Eurasians (867 with new data) distributed over the last ten millennia. Using a generalized linear mixed model to control for population structure, we identify 40 genome-wide significant signals of selection, which have a particularly strong impact on immune and cardiometabolic traits just as in West Eurasia. East and West Eurasia show highly correlated signals of adaptation both for individual alleles and for complex traits, showing how these geographically separate groups experienced convergent evolution in response to parallel transitions to food producing economies and the accompanying lifestyle changes. An exception is the genetic determinants of light skin color: West Eurasians depigmented in the last 10,000 years, but most skin lightening in East Asians arose prior to the Holocene.

5
Punctuated Evolution of Endomembrane Compartments in Proto-Eukaryotes

Shridhar, S.; Kumari, K.; Thattai, M.

2026-04-14 evolutionary biology 10.64898/2026.04.13.718263 medRxiv
Top 0.1%
31.7%

Eukaryotic cells are defined by their endomembranes: compartments such as the endoplasmic reticulum (ER), Golgi and endosomes, exchanging cargo via vesicles. The evolutionary origins of endomembrane compartments remain unclear. Here we construct molecular-evolutionary trajectories for the stepwise addition of compartments after the emergence of the proto-ER in an ancestral eukaryote. We represent compartments and vesicles as nodes and edges of a directed graph. Vesicle budding and fusion regulators such as coats and SNAREs control cargo flows and determine compartment compositions. We computationally sample billions of possible graphs, and enumerate how duplication, deletion and mutation of regulators drive graph transitions. We find that evolutionary trajectories display punctuated shifts in compartment composition and number, interspersed with thousands of neutral mutations. The first added compartment inherits functions from the proto-ER or plasma membrane, or gains novel functions. Our results show how, given a billion years, simple molecular steps can generate complex endomembrane systems.

Significance Statement: Eukaryotic cells contain a system of endomembrane compartments that sort, process and deliver molecules to precise cellular destinations. This endomembrane system is a defining feature of all complex life, yet its evolutionary origins remain obscure. How did a proto-eukaryote with a single ancestral endomembrane compartment evolve into a cell with a Golgi, endosomes, lysosomes and other compartments characteristic of modern eukaryotes? We model this process from first principles, connecting the duplication, deletion and mutation of molecular regulators to compartment gain or loss. We find a punctuated pattern of endomembrane elaboration: a long phase of neutral exploration, driven by the mutation of duplicate gene copies, precedes the emergence of new compartments and functions.

6
Effects of introgressed Neanderthal alleles on present-day brain morphology

Zeloni, R.; Amaolo, A.; Morez Jacobs, A.; Zapparoli, E.; Akl, Y.; Shafie, M.; Huerta-Sanchez, E.; Pizzagalli, F.; Provero, P.; Pagani, L.; Marnetto, D.

2026-04-14 genomics 10.64898/2026.04.14.718380 medRxiv
Top 0.1%
31.7%

Neanderthal introgression contributed a small fraction of genetic variants to present-day non-African genomes. While differences in cranial globularity between Neanderthal and modern humans are well documented from endocasts, the phenotypic consequences of these introgressed alleles can illuminate otherwise inaccessible genetically divergent brain structures. We analyzed 370 MRI-derived brain traits--including cortical and subcortical regional measurements, cortical folding metrics, diffusion tracts--in nearly 40,000 UK Biobank participants. To quantify the impact of Neanderthal ancestry, we intersected trait-associated loci with Neanderthal-derived variants identified from introgressed segments imputed in the same subjects. Low-frequency introgressed variants were depleted for detectable effects on brain phenotypes, whereas common introgressed variants showed no comparable depletion. Conversely, Neanderthal deserts were consistently enriched for functional effects. Eight associations were fine-mapped to Neanderthal-derived variants: one locus near the gene DAAM1 was especially prominent across multiple traits, including opposite effects in the cuneus and precuneus mediated by introgressed regulatory variants. Genome-wide directional alignment of Neanderthal effects was limited but became evident when focusing on suggestive loci: frontal and parietal areas were the most consistently affected traits, though not in a direction that obviously mirrors known modern-archaic morphology divergence. Several of these loci also influenced neuropsychiatric traits, with detectable polygenic consequences against schizophrenia and towards major depression, linking neuroanatomical and neuropsychiatric impact of Neanderthal introgression. These findings suggest that while introgressed alleles affecting divergent neuroanatomy between modern humans and Neanderthals were largely purged, a subset of tolerated alleles continues to shape human brain morphology and mental health.

7
Genome-wide genealogies reveal deep admixtures forming modern humans

Loya, H.; Gupta Hinch, A.; Palamara, P. F.; Speidel, L.; Myers, S. R.

2026-04-17 evolutionary biology 10.64898/2026.04.17.719197 medRxiv
Top 0.1%
31.5%

Over the past decade, genomic modelling has revealed a rich tapestry of admixtures shaping present-day human populations. These have largely focused on the past few thousand years, when ancestral populations are either well characterised by present-day genomic diversity or directly observed through ancient DNA. Genomic modelling and fossil evidence have so far only provided a fragmented picture of the coexistence and mixing of human groups in the deeper past. Here, we propose a new method, GhostBuster, that leverages inferred genome-wide genealogies to detect admixture events of unsampled ghost populations, while simultaneously inferring accurate local ancestry. Local ancestry enables us to identify ancestry-specific genomic signatures that independently corroborate the events. We identify at least three waves of "back-to-Africa" migrations starting ~14,000 years ago. Applying GhostBuster to deeper timescales reveals that modern humans were shaped by repeated episodes of mixture. Around 50,000 years ago, we identify a human lineage that expanded to form present-day non-Africans, while also expanding within Africa, mixing with the other local African group in varying proportions. These ancient groups help explain polygenic score portability differences within Africa, and exhibit differences in population size and recombination landscapes. Extending our analysis further back to between 300,000 and 1 million years ago reveals two deeply diverged ancestral lineages. These lineages evolved profoundly different recombination landscapes, with different PRDM9 alleles (PRDM9-A and C) and recombination hotspots. We demonstrate that both Neanderthals and ancestral modern humans are formed through a mixture of these two lineages, with no evidence of gene flow from the PRDM9-A-carrying group into Denisovans.

8
Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life

Copeland, M.; McConnell, M.; Barboza, A.; Abraham, H. M.; Alfieri, J.; Arackal, S.; Bernard, C. E.; Bryant, K.; Cast, S.; Chien, S.; Clark, E.; Cruz, C. E.; Diaz, A. Y.; Deiterman, O.; Girish, R.; Harper, K.; Hjelmen, C. E.; Thompson, M. J.; Koehl, R.; Koneru, T.; Laird, K.; Lee, Y.; Lopez, V. R.; Murphy, M.; Perez, N.; Schmalz, S.; Sylvester, T.; Blackmon, H.

2026-04-16 evolutionary biology 10.64898/2026.04.14.718287 medRxiv
Top 0.1%
31.3%

Chromosome number shapes genome organization, recombination, and speciation, yet how fast it evolves across the tree of life has never been measured. We analyzed 63,682 karyotypes across 55 eukaryotic clades and found that dysploidy rates vary by 844-fold, from approximately 0.0008 to 0.7 events per million years. This variation does not follow kingdom boundaries or deep phylogeny; intraclade variance exceeds interclade differences by more than an order of magnitude. Even birds, the textbook example of chromosomal stasis, exceed the global median rate once microchromosome dynamics are resolved. Contrasting the stasis of Odonata with the volatility of Orchidaceae reveals that life history and population structure, rather than deep phylogenetic constraints, govern the tempo of karyotypic change.

9
The Synthetic Epitope Atlas: High-Throughput Design and Validation of De Novo Antibody-Antigen Complexes

Altieri, N.; Harman, J. L.; Noble, D.; Murakowska, N.; Eng, A.; McGowan, K. L.; Goodnight, D.; DiPeso, L.; Shikany, C.; Engelhart, E.; Homad, L. J.; Lahman, M. C.; Gandhi, S.; Goodwin, M.; Herbst, K.; Lin, C.; McMurray, M.; Barrett, J.; Agarwal, A. A.; Harrang, J.; Emerson, R. O.; Lopez, R. M.; Younger, D. A.; Lange, A. W.

2026-04-18 synthetic biology 10.64898/2026.04.17.719295 medRxiv
Top 0.1%
28.3%

De novo antibody design models lack sufficient training data to reliably generalize. We demonstrate scalable generation of structural training data for machine learning-driven antibody design by linking in silico designs of antibody-antigen complexes to high-throughput experimental binding validation. Using AlphaSeq, a yeast-based platform for measuring protein binding affinities, we measure the affinity and specificity of thousands of de novo "synthetic epitope proteins" (SEPs) designed to bind to VHHs. The resulting Synthetic Epitope Atlas (SEPIA) pairs over 26 million on- and off-target affinity measurements with computationally designed VHH-SEP "pseudo-structures." We validate strong, specific binding for 1,161 pseudo-structures and >75,000 VHH and SEP mutational variants. We show that these pseudo-structures complement existing structural databases and enable ML models to outperform confidence metrics commonly used to rank de novo antibody designs. Taken together, SEPIA establishes a scalable framework for improving de novo antibody design by augmenting sparse structural data with large-scale experimental binding data.

10
Structural Insights into the Integration of Temperature and pH by Sperm Calcium Channel CatSper

Zhao, B.; Bhagwat, S.; Ferreira, J.; Swain, D. K.; Santi, C.; Fu, Z.; Lishko, P. V.

2026-04-14 physiology 10.64898/2026.04.10.717635 medRxiv
Top 0.1%
27.8%

The cation channel of sperm (CatSper) is a sperm-specific calcium channel essential for male fertility across metazoans. Its activation is tightly restricted to defined physiological contexts, including intracellular alkalinization, membrane depolarization, and elevated temperature. The structural and evolutionary mechanisms underlying this polymodal integration remain poorly understood, in part due to the architectural complexity of CatSper, a ~15-subunit assembly organized in zigzag arrays along the sperm flagellum. Here we combine comparative genomics across 47 species with AlphaFold3-based modeling and evolutionary sequence-structure analyses to uncover a mechanism for temperature and pH integration. We identify the pore-forming subunit CatSper1 as an evolutionary hotspot exhibiting exceptional divergence in its N-terminal domain. Phylogenetic analysis shows that N-terminal length and histidine enrichment scale with species-specific fertilization temperatures, suggesting adaptive tuning of physicochemical sensitivity. Structural modeling indicates that conserved surface-exposed histidine clusters form inter-complex coupling interfaces between adjacent CatSper assemblies positioned near the dominant voltage-sensing module. Functional validation using electrophysiology and calcium imaging in mouse sperm shows that capacitation-associated partial removal of the CatSper1 N-terminus selectively impairs temperature-dependent activation. Together, these findings support a model in which temperature-dependent histidine deprotonation modulates supramolecular CatSper assembly to coordinate channel activation.

11
Convergent genome streamlining accompanies independent miniaturization in the world's smallest fishes

Sudasinghe, H.; Matschiner, M.; Britz, R.; Conway, K. W.; Tan, H. H.; Salzburger, W.; Peichel, C.; Rueber, L.

2026-04-21 evolutionary biology 10.64898/2026.04.20.719654 medRxiv
Top 0.1%
27.6%

Miniaturization, the reduction of adult body size to an extreme degree, has evolved repeatedly across vertebrates. Yet its genomic underpinnings remain poorly understood. Cypriniformes, the most species-rich order of freshwater fishes, contains multiple miniaturized lineages that have evolved contrasting developmental processes. Proportioned dwarfs are tiny-bodied but otherwise morphologically similar to larger relatives, while progenetic miniatures exhibit developmental truncation thus retaining larval-like anatomical features into adulthood. Using a new time-calibrated phylogeny of 309 cypriniform species and comparative genomic analyses of 33 high-quality genome assemblies, we investigated the evolutionary history and genomic correlates of miniaturization across this order. Ancestral state reconstruction revealed multiple independent origins of both miniature types, with transitions predominantly unidirectional and non-randomly distributed across the phylogeny. The origins of the two types of miniatures differed in their timing. Progenetic miniatures arose predominantly as early as the Eocene while proportioned dwarfs arose mainly within the Miocene period. Genome size variation across Cypriniformes has been overwhelmingly driven by polyploidy. However, progenetic miniatures but not proportioned dwarfs showed consistent genome size reduction. Comparative genomic analyses revealed that all three independently-evolved progenetic miniature lineages share convergent signatures of repeat loss alongside genome-wide intron shortening, patterns absent in proportioned dwarfs. Our study provides the broadest evidence to date that progenetic miniaturization, despite independent origins, is underpinned by predictable structural genomic changes, revealing a fundamental link between developmental truncation and genome architecture in vertebrates.

12
Global genomic diversity of the selfing nematode Caenorhabditis tropicalis correlates with geography

Wang, B.; Moya, N. D.; Tanny, R. E.; Sauria, M. E. G.; O'Connor, L. M.; Khorshidian, A.; McKeown, R.; Stevens, L.; Buchanan, C.; Crombie, T. A.; Dilks, C. M.; Evans, K. S.; Cook, D. E.; Zhang, G.; Stinson, L. A.; Roberto, N. M.; Lee, D.; Zdraljevic, S.; Gosse, C.; Gimond, C.; Chen, M.-E.; Dang, V. D.; Wang, J.; Cutter, A. D.; Rockman, M. V.; Felix, M.-A.; Braendle, C.; Andersen, E. C.

2026-04-08 genomics 10.64898/2026.04.05.716573 medRxiv
Top 0.1%
27.6%

Self-fertilization reduces genetic diversity compared to outcrossing and hypothetically decreases the ability to adapt to diverse environments. Among Caenorhabditis nematodes, self-fertilization evolved three times independently in Caenorhabditis elegans, Caenorhabditis briggsae, and the more recently discovered Caenorhabditis tropicalis. To survey C. tropicalis genetic relatedness, the influence of geography and niche on species-wide variation, and the signatures of selection, we collected 785 wild strains, sequenced their genomes, and identified 622 distinct genotypes (isotypes). In contrast to C. elegans and C. briggsae, C. tropicalis relatedness shows substantial association with geography and no transcontinental selective sweeps or broadly sampled isotypes. Populations from the Hawaiian Islands or Taiwan harbor more genetic variation than populations from the Caribbean or Americas, suggesting a Pacific species origin similar to other members of the Elegans subclade. Punctuated genomic regions of extreme genetic variation pervade the genome. These hyper-divergent regions (HDRs) comprise less than 6% of the reference genome in any given strain despite harboring 73% of all variant sites and are enriched for genes likely involved in environmental adaptation. HDRs represent a shared genomic feature of self-fertilizing Caenorhabditis nematodes despite their independent evolutionary origins and suggest a mechanism to explain worldwide distributions despite low species-wide levels of genetic variation.

13
Widespread Genomic Islands in Giant Viruses Shape Genome Plasticity and Mosaicism

Minch, B.; Moniruzzaman, M.

2026-04-15 ecology 10.64898/2026.04.13.718226 medRxiv
Top 0.1%
27.6%

Giant viruses in the phylum Nucleocytoviricota possess exceptionally large and mosaic genomes, yet the mechanisms underlying their remarkable genomic plasticity remain poorly understood. Genomic islands are large dynamic genomic regions that are major drivers of genome diversification and adaptation in bacteria. However, their contribution to genome evolution in giant viruses remains largely unexplored. Here, we systematically characterize the genomic island landscape of giant viruses using 369 high-quality genomes spanning cultured isolates and long-read metagenome-assembled genomes. We identify 307 genomic islands across >50% of the genomes, demonstrating that these regions are pervasive across Nucleocytoviricota diversity. Giant virus genomic islands are frequently associated with genomic hypervariability and enriched in genes involved in host interaction, particularly surface adhesion proteins, suggesting key roles in host adaptation during virus-host arms race. Comparative analyses further reveal these islands to be major hotspots of genome diversification, exhibiting frequent gain, loss, and rearrangement even among highly similar genomes, with evidence that entire island regions can be exchanged among closely related viral populations. Notably, 37% of genomic islands are enriched in bacterial homologs, and several exhibit striking synteny with genomic regions recovered from environmentally co-occurring bacterial genomes, supporting large-scale genetic exchange between bacteria and giant viruses. Together, these findings identify genomic islands as pervasive and dynamic drivers of giant virus genome evolution, providing a mechanistic framework for genome plasticity, mosaicism, and adaptive potential of giant viruses.

14
Expanding the genetic code with diverse backbone structures across diverse sequence contexts

Piedrafita, C.; Dickson, A.; Richter, D.; Weber, C.; Elliott, T. S.; Liu, Z.; Zhang, F.; Li, Y.; Dunkelmann, D. L.; Morgan, T.; Liu, K. C.; Chin, J. W.

2026-04-17 synthetic biology 10.64898/2026.04.16.718949 medRxiv
Top 0.1%
27.5%

Expanding the genetic code to enable the selective and specific incorporation of non-canonical monomers (ncMs), beyond α-L-amino acids with variant sidechains, is a key outstanding challenge. Here we discover orthogonal aminoacyl-tRNA synthetases that selectively and specifically acylate their cognate orthogonal tRNA in vivo with eleven new ncMs spanning five different chemical classes: α,α-disubstituted amino acids, malonic acids, carboxylic acids, β2-amino acids and N-cyclic amino acids. We demonstrate that co-translational incorporation of α,α-disubstituted amino acids, β2-amino acids, β3-amino acids and N-cyclic amino acids is strongly dependent on the codons either side of the codon used to direct ncM incorporation, with several ncMs incorporated at less than 1% of sequence contexts. We evolve orthogonal tRNAs that enable the incorporation of previously unincorporated ncMs, enable the incorporation of ncMs at >95% of sequence contexts, and increase the incorporation efficiency at challenging sequence contexts up to 40-fold. We demonstrate the encoded cellular synthesis of proteins and macrocycles containing ncMs and explicitly demonstrate that our evolved tRNAs provide direct access to a wider range of genetically encoded macrocyclic sequences containing ncMs. Our results provide a foundation for composing, discovering and manufacturing proteins and peptides with functions augmented by ncMs.

15
Signal, noise, and bias in phylogenetic inference: potential and limits to the resolution of phylogenetic trees in the phylogenomic era

Dornburg, A.; Su, Z. T.; Jin, Y.; Fisk, N.; Townsend, J. P.

2026-04-01 evolutionary biology 10.64898/2026.03.30.714540 medRxiv
Top 0.1%
26.9%

Phylogenomic datasets assembled to resolve the Tree of Life now routinely span thousands of loci comprising millions of characters. Yet the persistence of incongruent topologies across such datasets reveals a fundamental truth of phylogenetics: not all data are equally informative. Here we derive analytical approaches that predict the relative impacts of phylogenetic signal, stochastic noise, and systematic bias on phylogenetic inference. We show that these three components exhibit divergent scaling properties with character sampling: signal and bias accumulate linearly, while noise accumulates nonlinearly with a concave trajectory. For some phylogenetic problems, substantial amounts of phylogenetic noise may eventually be overwhelmed by signal. For other phylogenetic problems--especially those involving deep divergences, short internodes, or constrained character-state space--the slope of signal accumulation can be so shallow that even signal from genome-scale data may never practically exceed noise. Moreover, linear accumulation of phylogenetic bias can in principle continuously overwhelm accumulation of signal at a lower slope with additional characters, regardless of dataset size. Applying our theory to empirical datasets, we show that anchored hybrid enrichment and ultraconserved element loci, like any loci, can exhibit signal that is overwhelmed by noise, and that character acquisition biases in some loci can further confound inference. Given the pervasive nature of incongruence in the phylogenomic era, our work provides a theoretical foundation for understanding the limits of inference, improving experimental design, and guiding efficient and accurate resolution of the Tree of Life.

16
Ancient DNA reveals that natural selection has upregulated the immune system over the last 10,000 years

Maravall-Lopez, J.; Truong, B.; Kerner, G.; Zhao, Y.; Hou, K.; Perry, A.; Akbari, A.; Reich, D. E.; Price, A. L.

2026-04-14 evolutionary biology 10.64898/2026.04.14.718409 medRxiv
Top 0.1%
26.3%

The specific mechanisms through which human biology and disease susceptibility evolved with major shifts in West Eurasian environments and societies over the last 10,000 years(1)--particularly rising infectious burden(2)--remain poorly characterized, despite ancient DNA studies(3-6) identifying hundreds of candidate loci under positive selection(6). Here, we identify specific immune diseases/traits, genes/variants, pathways, and tissues/cell types impacted by natural selection by systematically integrating variant-level selection statistics with genome-wide association study (GWAS), quantitative trait locus (QTL), and molecular bulk/single-cell and gene pathway data. Genome-wide, positively-selected alleles are associated with reduced susceptibility to infectious diseases like tuberculosis (TB), influenza, and intestinal infections; consistent with selection-signal enrichments in immune cells within barrier tissues such as the respiratory tract and gut mucosa. In contrast, positively-selected alleles increase risk of intestinal inflammatory disease and autoimmune hypothyroidism, supportive of a tradeoff between infection and immune-mediated pathology, and consistent with adaptive alleles being QTLs for genes upregulating inflammation and other host-defense pathways. We reveal many novel adaptive loci with convergent signals from selection, infectious disease GWAS and immune-gene QTLs (including at FUT6 for intestinal infections; at ASAP1 for TB; and at LYZ, an antimicrobial enzyme), fine-mapping selection onto likely causal variants. Surprisingly, adaptive alleles had a protective effect on allergic conditions like asthma and dermatitis, challenging a common view that these conditions arose through evolutionary mismatch of present-day hygienic contexts relative to past, pathogen-rich environments(7).

17
PALINCODE: Recording cell lineage with ternary palindromic CRISPR bits

Fathi, M.; Cook, A.; Meisam, B.; Curiel, T.; McKenna, A.

2026-04-19 genomics 10.64898/2026.04.16.718941 medRxiv
Top 0.1%
25.5%

Reconstructing complete and accurate lineage trees remains a long-standing challenge in biology. Here, we introduce PALINCODE (Palindromic Coding and Decoding), a system that utilizes ternary CRISPR bits (cBits) to stochastically write one of three possible states over time, permanently embedding lineage relationships in the genome. We demonstrate PALINCODE's lineage-recording potential through simulations and establish palindromic CRISPR editing in cell culture models. We show that truncated Cas9 guide sequences yield ternary outcomes at high efficiency when compared to conventional guides. Using PALINCODE, we derived lineage-recording cell lines with a theoretical coding capacity of up to 10^25 bits, enabling the generation of lineage trees 32 cell divisions deep in single-cell sequencing of 293T cells. Furthermore, we applied PALINCODE using an in vivo melanoma model to jointly read out lineage history and gene expression, enabling in vivo reconstruction of clonal evolution within tumor cell clonal populations. PALINCODE circumvents several limitations of prior CRISPR-based systems while increasing the information potential at individual CRISPR sites, creating a lineage-recording platform with higher density than many competing approaches.

18
Metabolic inequality in microbial communities

Mueller, E. A.; Lennon, J. T.

2026-04-17 ecology 10.64898/2026.04.14.718602 medRxiv
Top 0.1%
25.2%

How metabolic activity is distributed among individuals determines the scaling of cellular physiology to higher levels of biological organization. Yet the mechanisms that generate this heterogeneity and shape its distribution remain largely unresolved. We quantified single-cell metabolism in microbial communities spanning aquatic, terrestrial, and host-associated ecosystems. Across more than one million cells, metabolic activity followed a long-tailed distribution best described by a lognormal model, with a small subset of individuals contributing disproportionately to community metabolism. In some cases, the most active 20% of cells accounted for over 90% of metabolic output, although this pattern became less pronounced in more productive environments. To assess the consequences of metabolic inequality, we developed a model linking single-cell activity to community respiration. Because respiration responds nonlinearly to enzyme activity, variation among cells does not translate proportionally into ecosystem-level fluxes. As a result, ignoring metabolic heterogeneity can bias estimates of community respiration by up to 60%. Our findings reveal a general pattern of metabolic inequality across microbial communities in disparate habitats. Accounting for this structure is critical for understanding how microorganisms shape ecosystem processes and for improving predictions of large-scale biogeochemical dynamics.

Significance: Inequality is a common feature of social, economic, and physical systems. It also arises in nature, where a small fraction of individuals accounts for an outsized share of biological output, including reproduction, immunity, and diversity. Here, we show that metabolic activity in microbial communities follows a characteristic long-tailed distribution that consistently emerges across diverse ecosystems, including lakes, soils, ocean plankton, marine sediments, and mammalian guts. Rather than a rich-get-richer dynamic, metabolism becomes more evenly distributed among individuals in more productive environments. An explicit representation of metabolic inequality can improve predictions of how microbial communities, and the processes they support, respond to environmental change.
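
To give a feel for how a lognormal model can produce the kind of concentration described in the abstract above, here is a minimal sketch. It is not the authors' method. It uses the standard Lorenz-curve result for a lognormal distribution, where the share of the total held by the top fraction p is Φ(σ − Φ⁻¹(1 − p)). The function name `top_share` and the σ values are illustrative assumptions, not figures from the preprint.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def top_share(p: float, sigma: float) -> float:
    """Share of the total contributed by the top fraction p of individuals
    when the quantity is lognormal with log-scale standard deviation sigma.

    Derived from the lognormal Lorenz curve L(F) = Phi(Phi^-1(F) - sigma),
    so the top-p share is 1 - L(1 - p) = Phi(sigma - Phi^-1(1 - p)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    return _STD_NORMAL.cdf(sigma - _STD_NORMAL.inv_cdf(1.0 - p))


if __name__ == "__main__":
    # Hypothetical sigma values, chosen only to show the trend.
    for sigma in (0.0, 0.5, 1.0, 2.12):
        print(f"sigma={sigma:4.2f}  top-20% share={top_share(0.2, sigma):.3f}")
```

Under this idealized model, a top-20% share near 90% corresponds to a log-scale standard deviation of roughly 2.1, and smaller values of σ give a more even distribution. The share does not depend on the lognormal location parameter.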

19
The diploid reference genome of a human embryonic stem cell line

Pacar, I.; Ungaro, M. T.; Chen, Y.; Dallali, H.; Medico, J. A.; Hebbar, P.; Diekhaus, M.; Di Tommaso, E.; Geleta, M.; Chan, P. P.; Lowe, T. M.; Balacco, J.; Jain, N.; Ackerman, F.; Mochi, M.; Ioannidis, A. G.; Sawarkar, N.; Diaz, K.; Krishna Sudhakar, K.; Powell, J. E.; Jain, M.; Rosa, A.; Croft, G. F.; Tanzer, A.; Jarvis, E. D.; Formenti, G.; Salama, S. R.; Giunta, S.

2026-03-30 genomics 10.64898/2026.03.26.714432 medRxiv
Top 0.2%
23.2%

Advances in DNA sequencing and assembly technologies are spurring a shift from haploid reference genomes to sample-specific diploid assemblies. Here, we generated the first telomere-to-telomere (T2T) diploid reference for the widely used human embryonic stem cell (hESC) line, H9 (WAe009-A). This haplotype-resolved assembly is highly accurate with comprehensive annotation of genes, segmental duplications, methylation, and chromatin conformation. Pangenomic and phased-locus inference point to H9's mixed ancestry with a predominant European component. H9-specific genomic features include near-perfect telomeres ~1.65-fold longer than other T2T assemblies, consistent with telomerase activity during pluripotency; chromosome 17 inversions that can predispose offspring to neurological syndromes; and expansions of ncRNA clusters, with overall genomic stability maintained despite extensive culturing. Mapping multi-omic datasets to the genome, we demonstrate the power of this resource for allele-specific, high-precision transcriptomic, genetic, and epigenetic analyses, with far-reaching implications for human development and disease.

20
Birth order and disease risk across the human phenome: evidence from 10 million siblings

Kramer, B. K.; Kushner, S. A.; Rzhetsky, A.

2026-03-27 epidemiology 10.64898/2026.03.26.26349438 medRxiv
Top 0.2%
22.8%

Birth order has been implicated in the etiology of individual diseases, but has never been systematically assessed at phenome-wide scale with large administrative claims data and complementary epidemiological designs. Here we use two complementary approaches: a between-family matched cohort of 1.6 million pairs and a within-family sibling comparison that includes 5.1 million families and 10.3 million individuals. Both were applied to 569 diseases defined by ICD-9-CM/ICD-10-CM codes in the Merative MarketScan commercial claims data. Of 418 diseases with adequate case counts, 150 show Bonferroni-significant birth-order associations. All odds ratios compare second-borns with first-borns, so OR < 1 indicates first-born excess. First-borns are at excess risk for neurodevelopmental conditions (autism OR = 0.74, ADHD OR = 0.93) and immune-allergic diseases consistent with the hygiene hypothesis (food allergy OR = 0.80, allergic rhinitis OR = 0.91), while second-borns are at excess risk for substance abuse (OR = 1.19) and gastrointestinal conditions. Between-family and within-family estimates agree in direction for 84.7% of significant diseases (Pearson r = 0.65), and results are robust to state fixed effects (r = 0.997) and full-sibling restriction. Prespecified validation controls were broadly consistent with expectations. These findings provide a comprehensive map of birth-order effects across the human disease phenome.