GENETICS

Oxford University Press (OUP)

Preprints posted in the last 90 days, ranked by how well they match GENETICS's content profile, based on 189 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Parameterizing the genetic architecture under stabilizing selection

Lee, H.; Terhorst, J.

2026-03-27 genetics 10.64898/2026.03.27.714826 medRxiv
Top 0.1%
18.3%

Across many complex traits, genetic variants with larger effect sizes tend to occur at lower frequencies, which is often interpreted as a signature of stabilizing selection. In statistical genetics, the so-called α-model captures this relationship by assuming that effect size variance is inversely proportional to heterozygosity raised to a power 0 ≤ α ≤ 1. Although empirically useful, the α-model is phenomenological rather than mechanistic and lacks a direct population-genetic interpretation. In this paper, we derive an alternative to the α-model based on evolutionary theory. Our approach yields a linear mixed model in which the frequency dependence of effect size emerges naturally as a function of interpretable evolutionary quantities describing mutational variance, selection intensity, and coupling between the focal and selected traits. These quantities enter through two identifiable variance components that can be estimated by restricted maximum likelihood (REML). The resulting framework links a fitness-landscape model to standard mixed-model methodology, enabling both inference on evolutionary parameters and downstream prediction by best linear unbiased prediction (BLUP). In forward simulations, the model accurately recovers the focal-trait variance and generally improves genetic prediction relative to conventional α-model baselines.
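
The frequency dependence described in this abstract (effect-size variance inversely proportional to heterozygosity raised to a power between 0 and 1) can be illustrated with a minimal sketch. It assumes heterozygosity is taken as 2p(1 - p); the function name, the `scale` constant, and the exponent name `alpha` are illustrative choices and do not come from the preprint.

```python
def alpha_model_variance(freqs, alpha, scale=1.0):
    """Per-variant effect-size variance, scale / [2p(1-p)]**alpha.

    Assumption: heterozygosity is 2p(1-p); `scale` is a hypothetical
    proportionality constant.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    variances = []
    for p in freqs:
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly in (0, 1)")
        heterozygosity = 2.0 * p * (1.0 - p)
        variances.append(scale / heterozygosity ** alpha)
    return variances


# Toy frequencies: a rare and a common variant.
rare, common = alpha_model_variance([0.01, 0.5], alpha=0.5)
flat = alpha_model_variance([0.01, 0.5], alpha=0.0)
```

With an exponent of 0 every variant has the same effect-size variance; as the exponent grows, rarer variants are assigned larger variance.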

2
Beyond single-slope Mendelian randomization: structural representation of genetic heterogeneity in joint effect space

Hao, H.; Chen, D.; Qian, C.; Zhou, X.; Huang, H.; Zuo, J.; Wang, G.; Peng, X.; Liu, H.-X.

2026-03-14 genetic and genomic medicine 10.64898/2026.03.12.26348288 medRxiv
Top 0.1%
17.2%

Causal effects in complex traits are typically represented by a single linear slope. While conventional Mendelian randomization (MR) provides efficient scalar estimates, projection-based summaries do not explicitly capture structural organisation in joint effect space under genetic heterogeneity. We introduce MR-UBRA (Mendelian randomization-Unified Bayesian Risk Architecture), a probabilistic framework that decomposes instrumental variants into genetic risk fragments (GRFs) and quantifies extreme deviations using tail-risk metrics defined on the standardised residual magnitude |e|. MR-UBRA preserves the classical MR estimand while offering a structurally resolved representation of genetic heterogeneity. Across stroke subtypes, AF→CES, smoking→lung cancer, and BMI→T2D, effect-space distributions exhibit reproducible asymmetry, amplitude stratification, and multi-modal structure. MR-UBRA resolves component-level organisation, separating tail-dominant contributions from the causal core while maintaining consistency with the classical MR estimand. Simulations and boundary regimes demonstrate adaptive model complexity: MR-UBRA selects K>1 when multi-component structure is present and collapses to K=1 under homogeneous conditions, avoiding spurious stratification. These results support viewing causal effects in complex traits as structured distributions in joint effect space, enhancing causal representation without altering the MR estimand. Graphical Abstract: Mendelian randomization (MR) typically represents causal effects with a single linear slope. Under genetic heterogeneity, instrumental effects in joint (βX, βY) space may exhibit multi-component structure and amplitude stratification that cannot be captured by a scalar summary.
MR-UBRA fits a standard error-weighted mixture model to decompose instruments into genetic risk fragments (GRFs), estimates GRF-specific effects using posterior-weighted soft-IVW, and quantifies extreme deviations through tail-risk metrics (VaR/CVaR). Across empirical analyses and boundary regimes, MR-UBRA adapts model complexity (K) to structural signal, collapsing to K=1 under homogeneous conditions.
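
For orientation, the single-slope summary that this abstract contrasts itself with is commonly computed as a fixed-effect inverse-variance-weighted (IVW) slope. The sketch below is that textbook estimator only; it is not MR-UBRA's posterior-weighted soft-IVW, and the function name and toy summary statistics are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW slope of outcome effects on exposure effects.

    slope = sum(w * bx * by) / sum(w * bx**2), with w = 1 / se_y**2.
    Returns (slope, standard_error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must have the same length")
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_x, beta_y, se_y):
        weight = 1.0 / se ** 2
        numerator += weight * bx * by
        denominator += weight * bx * bx
    if denominator == 0.0:
        raise ValueError("all exposure effects are zero")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Toy instruments whose outcome effects are exactly half the exposure effects.
toy_bx = [0.10, 0.20, 0.30]
toy_by = [0.05, 0.10, 0.15]
toy_se = [0.01, 0.02, 0.01]
slope, slope_se = ivw_estimate(toy_bx, toy_by, toy_se)
```

A scalar of this kind averages over all instruments, which is why heterogeneous or multi-component structure in (βX, βY) space is invisible to it.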

3
Ubiquitous functional synergy partially explains why most transcription factor binding is non-functional

Mateusiak, C.; Jia, E.; Plaggenberg, J. N.; Erdenebaatar, Z.; Wang, Y.; Shively, C.; Liao, G.; Mitra, R. D.; Brent, M. R.

2026-01-20 genomics 10.64898/2026.01.19.700460 medRxiv
Top 0.1%
15.1%

Most genes in whose promoter a transcription factor (TF) binds do not change in expression when the concentration of the TF is perturbed. No existing model can predict which bound promoters will respond and which will not. We hypothesized that a gene's response to perturbation of a TF bound in its promoter can depend on which other TFs are bound there, a phenomenon we call functional synergy. This is distinct from cooperative binding, which is already accounted for in the binding location data. To investigate functional synergy, we created a comprehensive dataset on TF binding locations in yeast using a method that is orthogonal to chromatin immunoprecipitation. We then used mathematical modeling to identify high-confidence instances of functional synergy. We found that such synergies are surprisingly common. Responses to perturbations of 44 different TFs were modified by the presence of other TFs. 48 TFs served as modifiers, but some modified responses to many TFs. We conclude that (1) measuring the binding locations of a single TF will not, in general, reveal which genes the TF regulates, and (2) traditional networks linking TFs to their targets must be made substantially more expressive, allowing some TFs to modify the effects of others.

4
Elevated mutation in haploid yeast driven by translesion synthesis

Fredette-Roman, J.; Smith, D. R.; Omari, S. B.; Sharp, N.

2026-01-23 evolutionary biology 10.64898/2026.01.22.701062 medRxiv
Top 0.1%
12.9%

The impact of selection versus genetic drift on the evolution of mutation patterns is unclear. In Saccharomyces cerevisiae, which is predominantly diploid in nature, there is evidence that haploid cells have a higher mutation rate than diploids, suggesting that a haploid-specific mutator phenotype may have evolved due to the limited opportunity for selection to act on this rare cell type. Mutation in haploids was primarily elevated in late-replicating regions of the genome, implicating error-prone translesion synthesis (TLS) repair. Additional research has demonstrated that removing REV1, a gene responsible for initiating TLS, causes a reduction in haploid mutation rate. To assess whether the preferential use of this error-prone repair pathway by haploids explains the difference in genome-wide mutation patterns between cell types, we deleted REV1 in both diploid and haploid S. cerevisiae and estimated their mutation rates using a mutation accumulation experiment. Consistent with a previous study, we found a 50% higher single nucleotide mutation rate in REV1+ haploids than in REV1+ diploids. Deleting the REV1 gene caused this difference to vanish, with mutation rates in haploid and diploid rev1Δ lines converging on 2.4 × 10^-10. Our results suggest that the mutagenic effect of translesion synthesis is much stronger in haploids, reflecting a limited opportunity for selection to act on mutation rates in rarer cells or smaller populations. We also find evidence that REV1 plays an important role in mitochondrial genome maintenance in both cell types.

5
Omitted familial extrinsic risk inflates inferred intrinsic lifespan heritability

Kornilov, S. A.

2026-04-06 genetics 10.64898/2026.04.02.716222 medRxiv
Top 0.1%
12.6%

Shenhar et al. (2026) report 50% "intrinsic" lifespan heritability after calibrating a one-component correlated-frailty survival model to Scandinavian twin lifespans. Their framework is mathematically coherent, but the intrinsic component is not identified if heritable, mortality-relevant extrinsic susceptibility is omitted at calibration. We show that one-component calibration absorbs omitted familial extrinsic structure into the intrinsic frailty scale parameter σ_θ, and that this variance absorption is visible through separate diagnostics. (1) Variance absorption. Under misspecification, σ_θ is inflated by +22.1% (95% CI: 21.5-22.7%), corresponding to +49% inflation in [Formula]. Falconer h² is downstream of calibration and inherits a +9.2 pp bias (95% CI: 8.7-9.7). The σ_θ inflation is model-general: +22% (GM), +18% (MGG), +14% (SR); any dependence summary that is strictly increasing in σ_θ inherits this inflation, so Falconer h² is one affected downstream quantity among many (Corollary B3). (2) Structural fingerprint. In the joint twin survival surface S(t1, t2), misspecification produces systematic dependence errors (ISE 48× that of the recovery model). Conditional twin dependence is inflated at all ages, peaking at age 80 (Δr = 0.048). (3) Specificity. The bias requires an omitted component that is both heritable and mortality-relevant. Three negative controls, a boundary check (ρ = 0), and a two-component recovery refit (σ_θ restored to within -3.2%) establish specificity. ACE decomposition yields C ≈ 0 throughout: the omitted extrinsic component loads onto A (because it is shared 1.0/0.5 in MZ/DZ), so switching summary statistics does not restore identification. (4) Sensitivity and falsifiability. Over an empirically anchored regime (σ_γ ∈ [0.30, 0.65], ρ ∈ [0.20, 0.50]), Falconer bias ranges from +2.8 to +18.9 pp (mean 9 pp).
If ρ is sufficiently negative, the bias reverses sign in all three model families (Corollary B4). A full-likelihood robustness check shows that this upward pull is partly structural and partly estimator-specific: in the same misspecified one-component model, ML still inflates σ_θ (+3%), whereas matching only r_MZ inflates it much more (+21%). These results do not resolve true intrinsic heritability but establish that the 50% estimate of Shenhar et al. carries a structured, model-general upward bias originating in the fitted latent variance σ_θ.
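
The point that a component shared 1.0/0.5 in MZ/DZ twins loads onto A rather than C follows from the classical twin formulas, h² = 2(r_MZ - r_DZ) and c² = 2·r_DZ - r_MZ. The sketch below works through that arithmetic with made-up correlations; the numbers are illustrative only and are not taken from the preprint or from Shenhar et al.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer heritability estimate from twin correlations."""
    return 2.0 * (r_mz - r_dz)


def falconer_c2(r_mz, r_dz):
    """Shared-environment (C) estimate from twin correlations."""
    return 2.0 * r_dz - r_mz


# Hypothetical baseline correlations produced by the intrinsic component alone.
base_mz, base_dz = 0.25, 0.125

# Hypothetical omitted familial component: its contribution to the twin
# correlation is shared fully by MZ pairs and half as much by DZ pairs.
extra = 0.125
obs_mz = base_mz + 1.0 * extra
obs_dz = base_dz + 0.5 * extra

h2_before = falconer_h2(base_mz, base_dz)  # 0.25
h2_after = falconer_h2(obs_mz, obs_dz)     # 0.375
c2_after = falconer_c2(obs_mz, obs_dz)     # 0.0
```

In this toy case the whole extra contribution appears in h² while c² stays at zero, so moving from Falconer's formula to an ACE-style summary would not separate the two sources.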

6
Model selection in ADMIXTURE can be inconsistent: proof of the K=2 phenomenon

Do, D.; Terhorst, J.

2026-03-02 evolutionary biology 10.64898/2026.02.27.708651 medRxiv
Top 0.1%
12.3%

STRUCTURE and ADMIXTURE are two popular methods for detecting population structure in genetic data. They model observed genotypes as mixtures of latent ancestral populations, and the inferred admixture proportions can be used to visualize and summarize population structure. A key parameter in these models is the number of ancestral populations, K. Selecting K is a challenging problem. Perhaps the most widely used method is Evanno's ΔK, which selects K based on the second-order change in log-likelihood as K increases. However, practitioners have often noted that ΔK favors overly small K, frequently returning K = 2 even when more meaningful substructure is present. In this paper, we provide a theoretical explanation for this phenomenon: we prove that, under certain conditions, the ΔK method can be inconsistent, meaning that it can fail to identify the true number of populations even with infinite data.
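
The second-order criterion mentioned in this abstract can be sketched as follows. This uses a commonly implemented form of the statistic, the absolute second difference of the mean log-likelihood across replicate runs divided by the standard deviation of the runs at K; that exact form, the function name, and the toy log-likelihoods are assumptions for illustration and are not taken from the preprint.

```python
from statistics import mean, stdev


def evanno_delta_k(loglik_runs):
    """Delta-K for each interior K.

    loglik_runs maps K to a list of log-likelihoods from replicate runs
    (at least two runs per K, with nonzero spread).
    """
    means = {k: mean(v) for k, v in loglik_runs.items()}
    delta = {}
    for k in sorted(loglik_runs):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / stdev(loglik_runs[k])
    return delta


# Hypothetical runs: the likelihood keeps improving up to K = 3,
# but the largest change in slope sits at K = 2.
toy_runs = {
    1: [-1000.0, -1001.0, -999.0],
    2: [-800.0, -801.0, -799.0],
    3: [-700.0, -701.0, -699.0],
    4: [-690.0, -691.0, -689.0],
    5: [-688.0, -689.0, -687.0],
}
delta = evanno_delta_k(toy_runs)
best_k = max(delta, key=delta.get)  # 2 for these toy values
```

Note that the statistic is undefined at the smallest and largest K tried, so it can never select K = 1.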

7
The performance of genetic-constraint metrics varies significantly across the human noncoding genome

McHale, P.; Goldberg, M. E.; Quinlan, A. R.

2026-01-28 genomics 10.64898/2026.01.28.701168 medRxiv
Top 0.1%
11.9%

A longstanding goal in human genetics is to prioritize noncoding loci that, when disrupted, lead to developmental disorders and other Mendelian traits. In pursuit of this goal, multiple metrics have been developed to distinguish neutrally evolving sequences from those subjected to purifying selection. These metrics are commonly evaluated genome-wide, e.g., by computing a precision-recall curve on windows tiling the entire noncoding genome. Here, we identify parts of the noncoding genome where these metrics significantly underperform relative to their genome-wide performance due to "bias" in the underlying models of neutral genetic variation and/or a low "signal-to-noise ratio" in the genetic data. The most extreme effects are found for Gnocchi (Chen et al. 2024), the performance of which declines as GC content increases. We suggest annotating constraint scores of noncoding genomic intervals with robust measures of the bias of the corresponding model, allowing users to gauge confidence in those scores.

8
Allelic Association Analyses: Estimation Recommendations

Weir, B. S.; Goudet, J.

2026-01-30 genomics 10.64898/2026.01.26.701864 medRxiv
Top 0.1%
9.9%

We review the rich literature on the estimation of measures of inbreeding, relatedness and population structure, beginning with Sewall Wright's F-statistics and moving on to the descriptive statistics of Masatoshi Nei and Clark Cockerham. The current availability of genome-level single nucleotide variant data is allowing for sophisticated treatments of inferred identity by descent segments and inferred ancestral recombination graphs. Underlying such disparate methods is an emphasis on characterizing the descent status of alleles within and between individuals and populations, and we have found allele-sharing statistics a convenient framework for examining the differences and similarities among different estimators. We have been able to resolve some long-standing reported differences among estimators, especially those involving the work of Nei. In the course of our algebraic and empirical treatment of descent measure estimation we have been able to formulate a set of five recommendations. Following the early work of Sewall Wright, we recommend 1. State that descent measures for pairs of alleles are relative to values in a reference set of allele pairs. With this view, we recommend 2. Use estimators that preserve descent measure rankings over different reference sets. Allele-sharing estimators satisfy this recommendation. Reducing genotypic data to allelic data has the benefit of reducing dimensionality, but we recommend 3. If genotypic data are available, avoid having to assume Hardy-Weinberg equilibrium by not reducing them to allelic data. Partly as a consequence of working with genotypic data, we recommend 4. Recognize that allele frequencies do not need to be estimated. Not estimating allele frequencies prevents the confounding of descent estimates for target pairs of alleles by the status of all pairs in a reference set. On the basis of both theoretical and empirical results, finally we recommend 5. Consider both inbreeding and kinship when estimating either one. 
It is difficult to envisage a natural population with relatedness but no inbreeding, or vice versa.

9
Environment-dependent and often antagonistic effects of dominance and epistasis on heterosis in crosses between natural populations

Rojas-Gutierrez, J. D.; Mantel, S. J.; Oakley, C. G.

2026-02-12 evolutionary biology 10.64898/2026.02.10.705147 medRxiv
Top 0.1%
8.0%

Genetic drift in natural populations reduces the efficacy of selection, promoting the fixation of deleterious recessive alleles with consequences for maladaptation and population persistence. Heterosis, or increased F1 fitness relative to the parental mean, has been proposed as a tool for investigating the role of drift on genetic variation in fitness, but its genetic basis and environmental dependence remain unclear in natural populations. We used heterozygous near-isogenic lines (NILs) derived from a cross between locally adapted Arabidopsis thaliana ecotypes to assess how specific genomic regions influence heterosis. Cumulative fitness, estimated as fruits per seedling, was evaluated in a greenhouse and two simulated native environments. F1s showed strong heterosis in the greenhouse and one simulated environment. Non-additive effects in heterozygous NILs were highly environment- and background-dependent, varying in magnitude and sign, and no NIL had consistent effects across environments. The relative fitness of NILs was not correlated with gene number or genomic load in the introgressed regions. Small heterozygous regions often had large effects, indicating that complementation of mildly deleterious alleles alone does not fully explain heterosis and suggesting that overdominance or pseudo-overdominance may play a role. Evidence of epistasis was also observed, including outbreeding depression in some NILs, likely due to negative additive-by-dominance interactions. Summed effects of NILs often exceeded the fitness increase of the F1, suggesting dominance-by-dominance epistasis, but the direction of these epistatic effects depended on both genetic background and environment. Our results demonstrate that F1 fitness reflects both positive dominance and different epistatic interactions that are environment- and background-dependent.

10
Altering dosage of meiotic crossover-associated RING finger proteins affects crossover number and interference in Drosophila

Frantz, E.; Santa Rosa, P.; McMahan, S.; Sekelsky, J.

2026-02-19 genetics 10.64898/2026.02.18.706578 medRxiv
Top 0.1%
7.2%

Crossovers play a critical role in ensuring correct reductional segregation of homologous chromosomes in the first meiotic division. Crossing over is initiated by formation of DNA double-strand breaks (DSBs), but the number of DSBs is greater than the number of crossovers. Which recombination sites become crossovers, versus being repaired as non-crossovers, is not random, but is subject to several crossover patterning phenomena, including crossover assurance and crossover interference. One current model for crossover designation proposes that crossover-associated RING finger proteins (CORs) undergo the biophysical process of coarsening, in which larger accumulations continue to get larger and smaller accumulations go away. Genetic and cytological studies of the three CORs in Drosophila melanogaster, Vilya, Narya, and Nenya, are consistent with this model. In females heterozygous for a deletion of vilya, fewer double crossovers are observed. Conversely, crossovers are elevated in females carrying a duplication of vilya and in females coordinately overexpressing Vilya, Narya, and Nenya. These findings support a model in which crossover designation occurs through coarsening of COR proteins within the synaptonemal complex.

11
Laboratory yeast crosses reveal limited epistasis in the genetic basis of complex traits

Gupta, M.; Holmes, C. M.; Belousova, J.; Gopalakrishnan, S.; Rego-Costa, A.; Desai, M. M.

2026-04-06 genetics 10.64898/2026.04.04.716439 medRxiv
Top 0.1%
6.5%

Mapping the genetic basis of complex traits is complicated by the presence of epistatic interactions between loci. While work in molecular genetics identifies numerous specific genetic interactions, statistical analyses of quantitative traits frequently conclude that additive (nonepistatic) models explain most heritable variation. However, these conclusions are typically limited by the narrow range of genetic relatedness (e.g., in F1 offspring of a biparental or circular cross). Here, we use a barcoded panel of Saccharomyces cerevisiae genotypes with a broad range of relatedness to quantify the effects of epistasis on the genetic architecture of seven complex traits. We find limited contributions of epistasis to the genetic basis of these traits. These results indicate that epistasis beyond that detected in standard yeast crosses may exist, yet it contributes little to phenotypic variance in these systems.

12
Epistatic fitness landscapes emerge from parallel adaptive walks in breeding network metapopulations

Monyak, T.; Morris, G.

2026-03-20 genetics 10.64898/2026.03.18.712732 medRxiv
Top 0.1%
6.3%

Global networks of crop breeding programs leverage diverse germplasm, but diversity increases the complexity of maintaining stability in their elite genepools. To characterize genetic heterogeneity in breeding metapopulations and develop insights on how to manage it, we simulated the evolution of breeding populations on fitness landscapes. We revealed the geometric decrease in the average effect size of alleles segregating as standing variation that become fixed along an adaptive walk. We also demonstrated how independent adaptive walks of subpopulations are influenced by genetic drift, leading to cryptic genetic heterogeneity among elite genepools. This variation is released when elite lines derived from independent subpopulations are crossed, leading to segregation for 2-4× more major QTL in admixed families than in unadmixed families, and 2-4× more epistatic interactions. The emergent property of fitness epistasis for traits under stabilizing selection is well understood in evolutionary genetics, but under-appreciated in crop quantitative genetics. To highlight the importance of this phenomenon, we constructed an empirical genotype-to-fitness landscape from the sorghum NAM, a global admixed prebreeding resource, demonstrating the utility of fitness landscapes for inferring genetic compatibilities within metapopulations. Our findings suggest that in breeding networks, strategies for effective germplasm exchange must account for epistasis in the oligogenic component of the genetic architecture of locally adapted traits. Article summary: Modern public sector crop improvement happens in networks of breeding programs that routinely exchange genetic information. Traditional models for understanding quantitative traits have limited predictiveness in situations with such genetic heterogeneity.
This study uses breeding simulations and empirical data to show the utility of the fitness landscape framework for characterizing the genetic architecture of complex traits in breeding metapopulations. By simulating the evolution of breeding programs and integration into networks, it demonstrates how epistatic interactions between large-effect alleles are a fundamental property that must be accounted for when exchanging germplasm.

13
The Age of Selection-Duality Mutation under Fluctuating Selection among Individuals (FSI)

Gu, X.

2026-02-02 evolutionary biology 10.64898/2026.01.30.701161 medRxiv
Top 0.1%
6.1%

Our recent work on molecular evolution and population genetics postulated that individuals carrying a specific mutation fluctuate in fitness, a process termed FSI (fluctuating selection among individuals), whereas the fitness of the wildtype remains constant. An intriguing phenomenon called selection-duality emerges: a slightly beneficial mutation can be under negative selection (its substitution rate is less than the mutation rate). Selection-duality appears to lie between two bounds: generic neutrality, where the mutation is neutral in terms of mean fitness, and substitution neutrality, where the substitution rate equals the mutation rate. The midpoint between generic neutrality and substitution neutrality is called FSI-neutrality. An important problem concerns the age profile of allele frequency, i.e., the time at which a mutation arose given its frequency in the current population (the allele-age problem for short). Solving this problem under selection-duality would help extend standard coalescent theory, which is based on strict neutrality, to a more general form. In this paper, we studied the allele-age problem under selection-duality using the first-arrival-time approach and the mean-age approach. Since a general solution of the allele-age problem under selection-duality is not available, we focused on solving the problem at substitution neutrality (the upper bound of selection-duality), FSI-neutrality (the midpoint), and generic neutrality (the lower bound). Our analysis yields an overall picture in which the mean first-arrival age of a mutation at substitution neutrality is theoretically identical to that at FSI-neutrality, which in turn is numerically close to that at generic neutrality. 
For illustration, we calculated the mean age of nonsynonymous mutations in the human population and demonstrated that the estimated allele-age could be overestimated considerably when the effect of FSI was neglected.

14
The genetic architecture of local adaptation is historically contingent

Duan, T.; Whitlock, M. C.; Booker, T. R.

2026-02-03 genetics 10.64898/2026.02.01.703099 medRxiv
Top 0.1%
6.1%

Revealing the genetic basis of local adaptation is a common goal of evolutionary biology, but despite theoretical progress, general expectations for the genetic architecture of local adaptation are still unclear. Theoretical analyses usually model simplified ecologies or simplified genetic architectures of adaptive traits, so the interplay of these factors is missing from our understanding. In this study, we use simulations to explore how the interplay of ecological and genetic parameters influences the evolution and genetic architecture of local adaptation. With these simulations, we ask: i) What are the features of alleles that made the largest contribution to local adaptation, and how are they affected by polygenicity of adaptive traits, migration rates, demographic history, and the spatial pattern of the environment? And ii) does allele age moderate the confounding effect from population structure in genotype-environmental associations (GEA)? We find that the frequency, number, and phenotypic effect size of locally adaptive alleles are sensitive to trait polygenicity and demographic history, and that these factors shape the evolutionary dynamics of local adaptation. We find that population expansions can leave legacies in the genetic architecture of local adaptation, reducing the expected number of adaptive alleles relative to models with constant population size, and this effect is long-lasting. Compared to range expansion, other ecological variables known to affect the genetic basis of local adaptation had limited effects. Finally, allele age moderated the confounding effect of population structure and modified the causal effect of environmental variables on genotypes. Alleles that arose around the time of environmental changes often made large contributions to local adaptation, but young alleles often had the highest false positive rates and were the most common age category. 
We describe how incorporating allele age and its interactions with population structure and environmental variables may increase the sensitivity and specificity of GEA analysis. Overall, this work demonstrates the critical importance that a species' demographic history can have on its genetic architecture of local adaptation.

15
Seasonal fluctuations in fitness result in severe reductions in effective population size

Johnson, O. L.; Tobler, R.; Schmidt, J. M.; Huber, C. D.

2026-04-01 evolutionary biology 10.64898/2026.03.30.715388 medRxiv
Top 0.1%
6.0%

Genetic evidence for fluctuating selection has begun to accumulate for different species over the past few decades, especially for the Drosophila genus where studies have reported hundreds of loci undergoing putatively adaptive oscillations across successive seasons. However, most theoretical and simulation studies of fluctuating selection have relied on abstract or weakly parameterized models, making it difficult to assess their relevance for natural populations. In this study, we simulate multilocus seasonally fluctuating selection under a recently developed model and examine its effect on the variance effective population size (Ne) at a genome-wide scale. By recapitulating genomic, demographic, and evolutionary parameters from natural Drosophila populations in our simulations, we were able to reproduce allele frequency oscillations reported in recent studies and show that these lead to ~50% genome-wide reductions in Ne. We also demonstrate that Ne reductions are well predicted by the maximum frequency amplitude among all adaptively fluctuating loci, and that the frequency amplitudes are largely determined by the number of adaptively fluctuating loci and the strength of their epistatic interactions. Our results demonstrate that fluctuating selection can substantially reduce effective population size and underscore the importance of temporally variable selection in shaping genome-wide patterns of variation beyond classical models. Article Summary: Genetic studies of fluctuating selection in natural populations have grown steadily over the past decade, with reports suggesting that hundreds of loci undergo adaptive oscillations over seasonal timescales in cosmopolitan Drosophila populations.
By simulating seasonally fluctuating selection under a recently developed model and ecological scenarios informed by published studies, the authors show that this mode of selection can reduce effective population size by ~50%, with the magnitude of the reduction correlated with the locus exhibiting the largest allele frequency fluctuations. These findings highlight fluctuating selection as an important factor shaping genome-wide patterns of genetic variation and effective population size.

16
Pitfalls in estimating and interpreting the contribution of ultra-rare genetic variants to the heritability of complex traits

Wang, H.; Wainschtein, P.; Sidorenko, J.; Fikere, M.; Zhang, Y.; Kemper, K. E.; Zheng, Z.; Hivert, V.; Zeng, J.; Goddard, M. E.; Visscher, P. M.; Yengo, L.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350278 medRxiv
Top 0.1%
5.0%

Assessing the contribution of ultra-rare variants (minor allele frequency <0.01%) to the heritability of complex traits remains challenging due to limited understanding of potential biases. Here, we focus on singletons (that is, variants observed only once in the study sample), the most abundant class of ultra-rare variants, to showcase various confounders of heritability estimates and underline pitfalls in their interpretation. We show through theory, simulations, and analysis of 5,330,210 exome-sequenced singletons in 305,813 unrelated European-ancestry individuals in the UK Biobank that (i) population stratification induces both upward and downward biases in singleton-based heritability estimates, (ii) these estimates capture non-additive genetic effects, and (iii) asymptotic standard errors of estimates from likelihood-based procedures are generally mis-calibrated when traits are not normally distributed. We further showcase these biases in real-data analyses of 22 quantitative phenotypes and report, after accounting for these pitfalls, significant estimates for number of children (3.4%), peak expiratory flow (1.9%), red blood cell count (2.5%), white blood cell count (1.9%) and heel bone mineral density (2.4%). Overall, our study provides recommendations for robust inference of heritability from ultra-rare variants and underscores that reliable estimates for ordinal and binary traits will require far larger sample sizes and improved methods, given that confounding in these traits remains difficult to detect and correct.

17
Measurement strategy alters inferred age-dependent accumulation and mortality risk of mosaic Y loss

Ware, A.; Weyrich, M.; Fatima, S.; Xu, T.; Radhakrishnan, S.; Kapfer, P.; Yang, X.; Schiethe, L.; Zanders, L.; Cremer, S.; Mas-Peiro, S.; Dimmeler, S.; Speer, T.; Zeiher, A.; Abplanalp, W.

2026-03-10 health informatics 10.64898/2026.03.09.26347951 medRxiv
Top 0.1%
4.9%

Mosaic loss of Y chromosome (mLOY) is a widely used biomarker of biological aging, yet whether its inferred age-dependent accumulation and associated clinical risk are invariant to measurement strategy remains unclear. We compared intensity-based and phase-based quantification approaches in 223,251 men from the UK Biobank to determine how analytic definitions influence estimates of mLOY burden, risk thresholds and population prevalence. Phase-based quantification revealed a steeper and more stable age-dependent accumulation of mLOY and identified excess mortality risk at lower mosaic burdens than intensity-based metrics. These differences shifted the inferred onset of biological risk and expanded the proportion of individuals classified as affected from 5.3% to 19.2%. Conventional thresholding preferentially excluded low-burden mosaicism, compressing risk gradients and reducing statistical resolution for downstream associations. These findings show that analytic definitions materially alter inferred accumulation dynamics, risk thresholds and population prevalence of mosaic Y loss.

18
Mount Fuji's stubby peak: the genotypic density of additive landscapes near maximal fitness

Kinney, J. B.

2026-04-06 evolutionary biology 10.64898/2026.04.02.716185 medRxiv
Top 0.1%
4.9%

Additive fitness landscapes--also called Mount Fuji landscapes--are the simplest and most widely used models of sequence-function relationships. As such, they play essential roles across multiple areas of biology, including evolutionary theory, quantitative genetics, gene regulation, and protein science. One of the most basic properties of any fitness landscape is its genotypic density--the number of sequences near a given fitness value. Understanding this density is especially important near fitness peaks, as it quantifies the supply of high-fitness genotypes. Here I study the genotypic density of additive landscapes near fitness peaks. Although this density is well known to be approximately Gaussian near the middle of the fitness range, its behavior near maximal fitness has not been reported. I begin by deriving a saddle-point approximation that accurately describes the genotypic density of additive landscapes over virtually the entire fitness range. I then show that the log density follows a power law near maximal fitness, with an exponent determined by how much the best allele at each position outperforms its nearest competitor. This power-law behavior holds over a substantial fraction of fitness values, besting the Gaussian approximation on both simulated and empirical landscapes across roughly a quarter to a third of the fitness range. Under certain conditions this behavior also extends to globally epistatic landscapes (defined as nonlinear functions over one or more additive traits), though with a reduced range of validity. These findings advance our understanding of one of the most fundamental models of sequence-function relationships. In particular, they reveal that the uppermost reaches of Mount Fuji landscapes, rather than being sharply peaked, are actually quite stubby.
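To make "genotypic density" concrete, the following is a minimal sketch that counts genotypes near the peak of a tiny additive landscape by brute-force enumeration. It is not the saddle-point approximation described in the abstract. The per-position allele effects, the landscape size, and the function names are all hypothetical choices made for illustration.

```python
import itertools

def additive_fitness_values(effects):
    """Fitness of every genotype: one allele effect per position, summed."""
    return [sum(choice) for choice in itertools.product(*effects)]

def count_near_peak(effects, delta):
    """Number of genotypes whose fitness is within `delta` of the maximum."""
    f_max = sum(max(position) for position in effects)
    return sum(1 for f in additive_fitness_values(effects) if f >= f_max - delta)

# Hypothetical landscape: 6 positions, 4 alleles each, integer effects.
# The best allele (4) beats its nearest competitor (2) by a gap of 2.
effects = [[0, 1, 2, 4] for _ in range(6)]
values = additive_fitness_values(effects)

# Cumulative genotype counts as the fitness window below the peak widens.
near_peak_counts = {delta: count_near_peak(effects, delta) for delta in range(5)}
for delta, count in near_peak_counts.items():
    print(f"within {delta} of peak: {count} of {len(values)} genotypes")
```

In this toy case no genotype other than the peak itself appears until the window reaches the gap between the best allele and its nearest competitor, which is the quantity the abstract says determines the near-peak behavior. Enumeration scales exponentially with sequence length, so it is only useful as a check on small examples.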

19
WINDEX: A hierarchical integration of site- and window-based statistics for characterizing the footprint of positive selection in genome-wide population genetic data

Snell, H.; McCallum, S.; Raghavan, D.; Singh, R.; Ramachandran, S.; Sugden, L.

2026-03-26 evolutionary biology 10.64898/2026.03.26.714384 medRxiv
Top 0.1%
4.8%

Adaptive mutations, or mutations that confer a fitness benefit, can leave behind distinct signals in genetic data. Computational methods have improved the localization of adaptive mutations in genetic samples using a range of statistical and machine learning classification techniques. However, these methods miss the opportunity to jointly integrate statistics at both the site and window levels, thus failing to harness all available statistical evidence to detect selection. Our method, WINDEX, combines these different resolutions of statistics to improve the detection of adaptive mutations among hitchhiking signals. Our model simultaneously integrates emissions at different resolutions by defining site-based and window-based latent states corresponding to neutral, linked, and sweep regions, with the site-based states and transition models nested within the window-based states. Using evolutionary simulations with varying selection parameters, we validate the ability of WINDEX to classify positive selective sweeps. Using data from the 1000 Genomes Project, we show that WINDEX is able to identify regions harboring signals of selective sweeps, and provides improved localization within those regions over existing methods. In addition, using WINDEX genome-wide allows for estimation of the proportion of whole genomes that are under positive selective pressures; our estimates of 9.7-10.5% across different populations provide support for other preliminary estimates of these quantities.

Author summary: Population geneticists often seek evidence for positive selective sweeps, evolutionary events in which a beneficial allele increases in frequency over time in a population, resulting in increased fitness of the individuals that carry that allele. Positive selective sweeps, however, are difficult to detect due to varying patterns of linkage disequilibrium (LD), or the nonrandom association of alleles, and detecting these signals reliably among differing LD structures remains a challenge in the field. In this work, we present WINDEX, a probabilistic framework designed to leverage signals of positive selective sweeps at both the site and window levels in the form of a hierarchical hidden Markov model (HHMM), to localize regions of positive selective sweeps in aligned haplotype data. We validate WINDEX in evolutionary simulations over varying positive selective sweep scenarios, showcasing the improved resolution that the HHMM structure provides. We apply WINDEX in comparative genomic scans of canonical sites of positive selection as well as whole-genome scans to demonstrate the tool's power in localizing functionally validated signals of selection and to offer insights into the proportion of the human genome currently under positive selective pressures. WINDEX is publicly available and easy to apply to many cases of human genetic data.

20
General moment closure for the neutral two-locus Wright-Fisher dynamics

Kundagrami, R.; Yetter, S.; Steinruecken, M.

2026-01-20 genetics 10.64898/2026.01.16.700021 medRxiv
Top 0.1%
4.8%

The Wright-Fisher diffusion and its dual, the coalescent process, are at the core of many results and methods in population genetics. Approaches have been developed to study the dynamics of its moments under genetic drift, mutation, and recombination using ordinary differential equations. The dynamics of these moments can be used to study population genetic processes and are key building blocks of efficient methods to infer population genetic parameters, like demographic histories or fine-scale recombination rates. However, the system of equations does not close under recombination; that is, computing moments of a certain order requires knowledge of moments of higher order. By applying a coordinate transformation to the diffusion generator, we show that the canonical moments in these alternative coordinates yield a closed system, enabling more accurate numerical computations. Compared to previous approaches in the literature, we believe that this approach can be more readily extended to general scenarios. Through simulations, we verify that the derived system of differential equations can accurately capture the dynamics of the moments, and can be used to efficiently compute expected diversity and linkage statistics in population genetic samples.
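As a point of contrast for what "closure" means, here is a minimal sketch of the simpler single-locus case under pure drift, where the moment equations do close. It is not the two-locus system or the coordinate transformation from the abstract. It assumes the standard neutral diffusion generator (1/2) x (1 - x) d²/dx², with time in diffusion units, which gives d/dt E[x^n] = n(n-1)/2 (E[x^(n-1)] - E[x^n]). The function names, starting frequency, and Euler step count are hypothetical choices.

```python
import math

def moment_derivatives(m):
    """Time derivatives of m[n] = E[x^n] under neutral single-locus drift.

    Each derivative depends only on moments of the same or lower order,
    so truncating at any order gives a closed system.
    """
    return [0.0 if n < 2 else 0.5 * n * (n - 1) * (m[n - 1] - m[n])
            for n in range(len(m))]

def integrate_moments(x0, order, t_end, steps=20000):
    """Forward-Euler integration starting from a fixed frequency x0."""
    m = [x0 ** n for n in range(order + 1)]
    dt = t_end / steps
    for _ in range(steps):
        d = moment_derivatives(m)
        m = [mi + dt * di for mi, di in zip(m, d)]
    return m

x0 = 0.3
m = integrate_moments(x0, order=4, t_end=1.0)

# Expected heterozygosity 2 E[x(1 - x)] decays exponentially in this model.
heterozygosity = 2.0 * (m[1] - m[2])
expected = 2.0 * x0 * (1.0 - x0) * math.exp(-1.0)
print(heterozygosity, expected)
```

In the two-locus setting with recombination, the abstract notes that the analogous equations involve higher-order moments and so do not close without further work.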