GENETICS
Oxford University Press (OUP)
All preprints, ranked by how well they match GENETICS's content profile, based on 189 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Shen, X.; Song, S.; Li, C.; Zhang, J.
We recently measured the fitness effects of a large number of coding mutations in yeast under four laboratory conditions, finding that most synonymous mutations are strongly deleterious although they are overall significantly less detrimental than nonsynonymous mutations. Kruglyak et al. believe that most nonsynonymous and nearly all synonymous mutations have no detectable fitness effects, and so hypothesize that our results largely reflect the fitness effects of CRISPR/Cas9 off-target edits and secondary mutations that occurred in mutant construction. Dhindsa et al. argue that our findings contradict other yeast and human mutagenesis studies, human allele frequency distributions, and disease gene mapping results. We find Kruglyak et al.'s hypothesis unsupported by prior yeast genome editing studies and mutation rate estimates. Furthermore, their hypothesis makes a series of predictions that are falsified by our published and newly collected data. Hence, their hypothesis cannot explain our observations. Dhindsa et al.'s comparisons between synonymous and nonsynonymous mutations in prior mutagenesis studies and in contributions to disease are unfair, and human allele frequency distributions can be compatible with our fitness estimates when multiple complicating factors are considered. While our fitness estimates of yeast synonymous mutants overturn the (nearly) neutral assumption of synonymous mutations, they are not inconsistent with various existing data.
Park, Y.; Metzger, B. P. H.; Thornton, J. W.
We recently reanalyzed 20 combinatorial mutagenesis datasets using a novel reference-free analysis (RFA) method and showed that high-order epistasis contributes negligibly to protein sequence-function relationships in every case. Dupic, Phillips, and Desai (DPD) commented on a preprint of our work. In our published paper, we addressed all the major issues they raised, but we respond directly to them here. 1) DPD's claim that RFA is equivalent to estimating reference-based analysis (RBA) models by regression neglects fundamental differences in how the two formalisms dissect the causal relationship between sequence and function. It also misinterprets the observation that using regression to estimate any truncated model of genetic architecture will always yield the same predicted phenotypes and variance partition; the resulting estimates correspond to those of the RFA formalism but are inaccurate representations of the true RBA model. 2) DPD's claim that high-order epistasis is widespread and significant while somehow explaining little phenotypic variance is an artifact of two strong biases in the use of regression to estimate RBA models: this procedure underestimates the phenotypic variance explained by RBA epistatic terms while at the same time inflating the magnitude of individual terms. 3) DPD erroneously claim that RFA is "exactly equivalent" to Fourier analysis (FA) and background-averaged analysis (BA). This error arises because DPD used an incorrect mathematical definition of RFA and were misled by a simple numerical relationship among the models that holds only for the simplest kinds of datasets. 4) DPD argue that using a nonlinear transformation to account for global nonlinearities in sequence-function relationships is often unnecessary and may artifactually absorb specific epistatic interactions.
We show that nonspecific epistasis caused by a limited dynamic range affects datasets of all types, even when the phenotype is represented on a free-energy scale. Moreover, using a nonlinear transformation in a joint fitting procedure does not underestimate specific epistasis under realistic conditions, even if the data are not affected by nonspecific epistasis. The conclusions of our work therefore hold: the genetic architecture of all 20 protein datasets we analyzed can be efficiently and accurately described in an RFA framework by first-order amino acid effects and pairwise interactions with a simple model of global nonlinearity. We are grateful for DPD's commentary, which helped us improve our paper.
Tashman, K.; Cui, R.; O'Connor, L. J.; Neale, B.; Finucane, H. K.
S-LDSC is a widely used heritability enrichment method that has helped gain biological insights into numerous complex traits. It has primarily been used to analyze large annotations that contain approximately 0.5% of SNPs or more. Here, we show in simulation that, when applied to small annotations, the block jackknife-based significance testing used in S-LDSC does not always control type 1 error. We show that the inflation of type 1 error for small annotations is due both to the noisiness of the jackknife estimate of the standard error and to the non-normality of the regression coefficient estimates. We use the percent of 0.01 centimorgan blocks in the genome overlapped by the annotation to quantify the size of an annotation and the extent to which the SNPs in the annotation cluster together, and we find thresholds on this value above which type 1 error is controlled. We have implemented a test in the LDSC software that informs users when they compute LD scores for an annotation if the annotation does not pass the threshold for producing controlled type 1 error. Author Summary: Genetics is a rapidly evolving field that allows us to link our genetic code to the physiological manifestations of disease. A key part of this work is finding regions of the genome that contribute disproportionately to the genetic underpinnings of a disease. A commonly used tool to provide such insight is stratified LD score regression (S-LDSC). S-LDSC allows us to estimate how much a set of genomic regions contributes to the overall heritability of a phenotype, and to test whether this is more than we would expect by chance. Here we show that when we apply S-LDSC to a small set of genomic regions, it does not give an accurate test of whether this set of genomic regions contributes more than we would expect by chance to the phenotype.
We characterize what it means to be a "small" set of genomic regions, and we set thresholds to restrict which annotations we test to prevent false positive results. This helps to ensure that as we continue to pursue genetic analyses at scale, we report only truly significant results that will help us further understand the etiology of many of the traits we study.
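The block jackknife mentioned in this abstract can be illustrated with a minimal, generic sketch. It is not the LDSC implementation: the function names `block_jackknife_se` and `mean`, the toy data, and the use of an unweighted delete-one-block scheme with equal-sized blocks are all assumptions made for illustration. The statistic is re-estimated with each block left out, and the spread of those leave-one-out estimates gives the standard error.

```python
import math


def block_jackknife_se(blocks, estimator):
    """Delete-one-block jackknife standard error of estimator(data).

    blocks: list of lists, each inner list holding the observations of one block.
    estimator: function mapping a flat list of observations to a number.
    Hypothetical helper for illustration; assumes roughly equal-sized blocks.
    """
    n_blocks = len(blocks)
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    # Re-estimate the statistic with each block left out in turn.
    leave_one_out = []
    for b in range(n_blocks):
        kept = [x for i, blk in enumerate(blocks) if i != b for x in blk]
        leave_one_out.append(estimator(kept))
    center = sum(leave_one_out) / n_blocks
    # Jackknife variance: (B - 1) / B times the sum of squared deviations.
    var = (n_blocks - 1) / n_blocks * sum((t - center) ** 2 for t in leave_one_out)
    return math.sqrt(var)


def mean(values):
    return sum(values) / len(values)


# Toy data: four blocks of two observations each (made-up numbers).
blocks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
se = block_jackknife_se(blocks, mean)
print(round(se, 4))
```

When only a few blocks carry any signal, as can happen for a small annotation, the leave-one-out estimates are dominated by those blocks and the resulting standard error is itself noisy, which is the kind of instability the abstract describes.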
Schraiber, J. G.; Edge, M. D.
Without the ability to control or randomize environments (or genotypes), it is difficult to determine the degree to which observed phenotypic differences between two groups of individuals are due to genetic vs. environmental differences. However, some have suggested that these concerns may be limited to pathological cases, and methods have appeared that seem to give, directly or indirectly, some support to claims that aggregate heritable variation within groups can be related to heritable variation among groups. We consider three families of approaches: the "between-group heritability" sometimes invoked in behavior genetics, the statistic PST used in empirical work in evolutionary quantitative genetics, and methods based on variation in ancestry in an admixed population, used in anthropological and statistical genetics. We take up these examples to show mathematically that aggregate within-group genetic and phenotypic information cannot separate among-group differences into genetic and environmental components, and we provide simulation results that support our claims. We discuss these results in terms of the long-running debate on this topic.
Ellis, T. J.
Pleiotropy is when a single locus affects two or more traits. The magnitude and direction of pleiotropy can constrain or facilitate phenotypic evolution. Investigations of pleiotropy have typically relied on null-hypothesis tests to classify cases into discrete categories based on the direction of effects. This discrete approach ignores the quantitative nature of pleiotropy, and systematically underestimates pleiotropic interactions. I describe a simple method to quantify the direction and magnitude of pleiotropic effects to alleviate these issues for pairs of traits. I illustrate how genotype-by-environment interactions can be viewed as a special case of pleiotropy and described in the same way. I provide an R package, psiotropy, to apply these methods.
Saitou, M.; Dahl, A.; Wang, Q.; Liu, X.
Genome-wide association studies (GWAS) are overwhelmingly biased toward European ancestries. Nearly all existing studies agree that transferring genetic predictions from European ancestries to other populations results in a substantial loss of accuracy. This is commonly referred to as low portability of polygenic risk scores (PRS) and is one of the most important barriers to the ethical clinical deployment of PRS. Yet, it remains unclear how much various genetic factors, such as linkage disequilibrium (LD) differences, allele frequency differences or causal effect differences, contribute to low PRS portability. In this study, we used gene expression levels in lymphoblastoid cell lines (LCLs) as a simplified model of complex traits with minimal environmental variation, in order to understand how much each genetic factor contributes to PRS portability from European to African populations. We found that cis-genetic effects on gene expression are highly similar between European and African individuals ([Formula]). This stands in stark contrast to the very low estimates of cis-genetic correlation between Europeans and Africans in previous studies, which we demonstrate are artifacts of statistical bias. We showed that portability decreases with increasing LD differences in the cis-regions. We also found that allele frequency differences of causal variants have a striking impact on PRS portability. For example, PRS portability is reduced by more than 32% when the causal cis-variant is common (minor allele frequency, MAF > 5%) in European samples (training population) but is rarer (MAF < 5%) in African samples (prediction population). While large allele frequency differences can decrease PRS portability through increasing LD differences, we also show that causal allele frequency can significantly impact portability independently of LD. 
This observation suggests that improving statistical fine-mapping alone does not overcome the loss of portability caused by causal allele frequency differences. Lastly, we also found that causal allele frequency is the main genetic factor underlying differential gene expression levels across ancestries. We conclude that causal genetic effects are highly similar in Europeans and Africans, and low PRS portability is primarily due to allele frequency differences.
Fletcher, J.; Wu, Y.; Li, T.; Lu, Q.
Researchers often claim that sibling analysis can be used to separate causal genetic effects from the assortment of biases that contaminate most downstream genetic studies. Indeed, typical results from sibling models show large (>50%) attenuations in the associations between polygenic scores and phenotypes compared to non-sibling models, consistent with researchers' expectations about bias reduction. This paper explores these expectations by using family (quad) data and simulations that include indirect genetic effect processes and evaluates the ability of sibling models to uncover direct genetic effects. We find that sibling models, in general, fail to uncover direct genetic effects; indeed, these models have both upward and downward biases that are difficult to sign in typical data. When genetic nurture effects exist, sibling models create "measurement error" that attenuates associations between polygenic scores and phenotypes. As the correlation between direct and indirect effects changes, this bias can increase or decrease. Our findings suggest that results from sibling analysis aimed at uncovering direct genetic effects should be interpreted with caution.
Zhang, W.; Reeves, R. G.; Tautz, D.
It has been proposed that many loci with no significant association in GWA studies can nonetheless contribute to the phenotype through modifier interactions with the core genes, implying a polygenic determination of quantitative traits. We have tested this hypothesis by using Drosophila pupal phenotypes. We identified candidate genes for pupal length determination in a GWA and show for disrupted versions of the genes that most are indeed involved in the phenotype, presumably forming a core pathway. We then randomly chose genes below the GWA threshold and found that three quarters of them also had an effect on the trait. We further tested the effects of these knockout lines on an independent behavioral pupal trait (pupation site choice) and found that a similar, but non-correlated fraction of them had a significant effect as well. Our data thus confirm the prediction that a large number of genes can influence independent quantitative traits. Impact statement: Quantitative traits are similarly likely influenced by randomly picked loci as by loci identified in a genome-wide association study.
Chikhi, L.; Rodriguez, W.; Paris, C.; Ha-Shan, M.; Jouniaux, A.; Arredondo, A.; Nous, C.; Grusea, S.; Corujo, J.; Lourenco, I.; Boitard, S.; Mazet, O.
Reconstructing the demographic history of populations and species is one of the greatest challenges facing population geneticists. [50] introduced, for a sample of size k = 2 haploid genomes, a time- and sample-dependent parameter which they called the IICR (inverse instantaneous coalescence rate). Here we extend their work to larger sample sizes and focus on Tk, the time to the first coalescence event in a haploid sample of size k where k ≥ 2. We define the IICRk as the Inverse Instantaneous Coalescence Rate among k lineages. We show that (i) under a panmictic population [Formula] is equivalent to Ne, (ii) the IICRk can be obtained by either simulating Tk values or by using the Q-matrix approach of [61] and we provide the corresponding Python and R scripts. We then study the properties of the [Formula] under a limited set of n-island and stepping-stone models. We show that (iii) in structured models the [Formula] is dependent on the sample size and on the sampling scheme, even when the genomes are sampled in the same deme. For instance, we find that [Formula] plots for individuals sampled in the same deme will be shifted towards recent times with a lower plateau as k increases. We thus show that (iv) the [Formula] cannot be used to represent "the demographic history" in a general sense, (v) the [Formula] can be estimated from real or simulated genomic data using the PSMC/MSMC methods [44, 65], and (vi) the MSMC2 method produces smoother curves that infer something that is not the [Formula], but are close to the [Formula] in the recent past when all samples are obtained from the same deme. Altogether we argue that the PSMC, MSMC and MSMC2 plots are not expected to be identical even when the genomes are sampled from the same deme, that none can be said to represent the "demographic history of populations" and that they should be interpreted with care.
We suggest that the PSMC, MSMC and MSMC2 could be used together with the [Formula] to identify the signature of population structure, and to develop new strategies for model choice.
Platt, A.; Harris, D. N.
The observation that even a tiny sample of genome sequences from a natural population contains a plethora of information about the history of the population has enticed researchers to use these data to fit complex demographic histories and make detailed inference about the changes a population has experienced through time. Unfortunately, the standard assumptions required to make these inferences are often violated by natural populations in such ways as to produce specious results. This paper examines two phenomena of particular concern: when a sample is drawn from a single sub-population of a larger meta-population these models infer a spurious recent population decline, and when a genome contains loci under weak or recessive purifying selection these models infer a spurious recent population expansion.
Ochoa, A.; Storey, J. D.
FST is a fundamental measure of genetic differentiation and population structure, currently defined for subdivided populations. FST in practice typically assumes independent, non-overlapping subpopulations, which all split simultaneously from their last common ancestral population so that genetic drift in each subpopulation is probabilistically independent of the other subpopulations. We introduce a generalized FST definition for arbitrary population structures, where individuals may be related in arbitrary ways, allowing for arbitrary probabilistic dependence among individuals. Our definitions are built on identity-by-descent (IBD) probabilities that relate individuals through inbreeding and kinship coefficients. We generalize FST as the mean inbreeding coefficient of the individuals' local populations relative to their last common ancestral population. We show that the generalized definition agrees with Wright's original and the independent subpopulation definitions as special cases. We define a novel coancestry model based on "individual-specific allele frequencies" and prove that its parameters correspond to probabilistic kinship coefficients. Lastly, we extend the Pritchard-Stephens-Donnelly admixture model in the context of our coancestry model and calculate its FST. To motivate this work, we include a summary of analyses we have carried out in follow-up papers, where our new approach has been applied to simulations and global human data, showcasing the complexity of human population structure, demonstrating our success in estimating kinship and FST, and the shortcomings of existing approaches. The probabilistic framework we introduce here provides a theoretical foundation that extends FST in terms of inbreeding and kinship coefficients to arbitrary population structures, paving the way for new estimators and novel analyses.
Note: This article is Part I of a two-part manuscript.
We refer to these in the text as Part I and Part II, respectively.
Part I: Alejandro Ochoa and John D. Storey. "FST and kinship for arbitrary population structures I: Generalized definitions". bioRxiv (10.1101/083915) (2019). https://doi.org/10.1101/083915. First published 2016-10-27.
Part II: Alejandro Ochoa and John D. Storey. "FST and kinship for arbitrary population structures II: Method of moments estimators". bioRxiv (10.1101/083923) (2019). https://doi.org/10.1101/083923. First published 2016-10-27.
Byrnes, J. F.; Sherwin, W.; Goldys, B.; Murray, J.; Tanaka, M.; Bellanto, A.; Cayetano, L.
Many of the effects on fitness in population genetics are due not to single locations in the genome, but to the interaction of genetic variants at multiple locations in the genome. Of particular interest are completely epistatic interactions, where a combination of genetic variants is required to produce an effect, and the effect cannot occur with any other combination. In diploids, epistasis is strongly connected to meiotic recombination, a process which can both assemble and destroy beneficial combinations of genetic variants. Additionally, epistatic interactions can be hard to detect in empirical studies, and mathematical models of epistasis and recombination are challenging to analyse, so despite their ubiquity epistatic interactions are regularly not considered. As a result, there is little consensus on when high levels of recombination might be expected, or how strongly recombination affects beneficial or deleterious fitness effects controlled by epistatic interactions. We address this question by conducting a meta-analysis and simulations. The meta-analysis used data drawn and curated from Drosophila melanogaster studies in Flybase. We extracted studies relating genetic combinations and phenotypically detectable effects on fitness, then analysed the relationship between the rate of recombination and effect on fitness with a statistical model. We also ran simulations under a two-locus Wright-Fisher model with recombination and epistatic selection. The results of both approaches indicated a tendency for genetic combinations with an epistatic effect on fitness to occur in an environment of reduced meiotic recombination. Two possible explanations for this are that the variants controlling such interactions are selected for in regions where there is little recombination, or that such interactions lead to selection for lower rates of recombination in the regions where those variants appear.
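A two-locus Wright-Fisher model with recombination and epistatic selection, as named in this abstract, can be sketched in a few lines. This is a simplified illustration and not the authors' simulation: it assumes a haploid population, complete epistasis in which only the AB haplotype carries a fitness effect `s`, and hypothetical function names (`select_and_recombine`, `wright_fisher_step`) and parameter values chosen for the example. Each generation applies selection, then recombination (which shrinks linkage disequilibrium D by a factor of 1 - r), then random sampling of `pop_size` offspring.

```python
import random


def select_and_recombine(freqs, s, r):
    """One deterministic generation for haplotype frequencies [AB, Ab, aB, ab].

    Assumption for illustration: only the AB combination has a fitness
    effect (1 + s); all other haplotypes have fitness 1.
    """
    fitness = [1.0 + s, 1.0, 1.0, 1.0]
    mean_w = sum(f * w for f, w in zip(freqs, fitness))
    # Selection: reweight each haplotype by its relative fitness.
    x_AB, x_Ab, x_aB, x_ab = (f * w / mean_w for f, w in zip(freqs, fitness))
    # Recombination at rate r moves frequency between coupling and repulsion
    # haplotypes in proportion to the linkage disequilibrium D.
    d = x_AB * x_ab - x_Ab * x_aB
    return [x_AB - r * d, x_Ab + r * d, x_aB + r * d, x_ab - r * d]


def wright_fisher_step(freqs, s, r, pop_size, rng):
    """Selection and recombination followed by sampling of pop_size offspring."""
    expected = select_and_recombine(freqs, s, r)
    draws = rng.choices(range(4), weights=expected, k=pop_size)
    return [draws.count(i) / pop_size for i in range(4)]


rng = random.Random(1)  # fixed seed so the run is repeatable
freqs = [0.25, 0.25, 0.25, 0.25]
for _ in range(50):
    freqs = wright_fisher_step(freqs, s=0.05, r=0.1, pop_size=1000, rng=rng)
print(freqs)
```

Running the loop with different values of `r` gives a rough feel for how recombination breaks up the favoured AB combination; a diploid version would additionally need genotype fitnesses and gamete production, which are omitted here.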
Crombie, T. A.; Rajaei, M.; Saxena, A. S.; Johnson, L. M.; Saber, S.; Tanny, R. E.; Ponciano, J. M.; Andersen, E. C.; Zhou, J.; Baer, C. F.
The distribution of fitness effects (DFE) of new mutations plays a central role in evolutionary biology. Estimates of the DFE from experimental Mutation Accumulation (MA) lines are compromised by the complete linkage disequilibrium (LD) between mutations in different lines. To reduce LD, we constructed two sets of recombinant inbred lines from a cross of two C. elegans MA lines. One set of lines ("RIAILs") was intercrossed for ten generations prior to ten generations of selfing; the second set of lines ("RILs") omitted the intercrossing. Residual LD in the RIAILs is much less than in the RILs, which affects the inferred DFE when the sets of lines are analyzed separately. The best-fit model estimated from all lines (RIAILs + RILs) infers a large fraction of mutations with positive effects (~40%); models that constrain mutations to have negative effects fit much worse. The conclusion is the same using only the RILs. For the RIAILs, however, models that constrain mutations to have negative effects fit nearly as well as models that allow positive effects. When mutations in high LD are pooled into haplotypes, the inferred DFE becomes increasingly negative-skewed and leptokurtic. We conclude that the conventional wisdom - most mutations have effects near zero, a handful of mutations have effects that are substantially negative and mutations with positive effects are very rare - is likely correct, and that unless it can be shown otherwise, estimates of the DFE that infer a substantial fraction of mutations with positive effects are likely confounded by LD.
Houle, D.; Bolstad, G. H.; Hansen, T. F.
If there is abundant mutational and standing genetic variation, most expect that the rate of evolution would be driven primarily by natural selection, and potentially be independent of current variability or variation. Contrary to this expectation, we (H17: Houle et al. 2017 Nature 548:447) found surprisingly strong scaling relationships with slopes near one between mutational variance, standing genetic variance and macro-evolutionary rate in Drosophilid wing traits. Jiang and Zhang (J&Z20: 2020 Evolution https://doi.org/10.1111/evo.14076) have challenged these results and our interpretation of them. J&Z20 showed that the method used in H17 to estimate the scaling relationship between variation at different biological levels is uninformative. Using an alternative method, they estimated that the scaling relationship has a slope substantially less than one, and propose a variant of our neutral subset hypothesis to explain this. Here we use simulations to confirm J&Z20's finding that the H17 method for estimating scaling of variances is uninformative. The simulations also show their alternative method for estimating scaling is likely to be seriously biased towards lower scaling relationships. We propose and verify an alternative approach to calculating scaling relationships based on independently estimated variance matrices, which we call the Q method. Simulations and reanalyses of the Drosophilid data set using the Q method suggest that our original estimates of the scaling relationship were close to the true value. We propose an analytical version of the neutral subset model, and show that it can indeed explain any scaling slope by varying assumptions about the pattern of pleiotropy. We continue to regard neutral subset models as implausible for wing shape in Drosophilids due to the likelihood that wing shape is subject to direct selection.
Hybrid models in which pleiotropy reduces the available genetic and mutational variation, and a combination of selection and drift controls the change in species means seem more biologically promising.
Li, X. C.; Fuqua, T.; van Breugel, M. E.; Crocker, J.
Rapid enhancer and slow promoter evolution have been demonstrated through comparative genomics. However, it is not clear how this information is encoded genetically and if this can be used to place evolution in a predictive context. Part of the challenge is that our understanding of the potential for regulatory evolution is biased primarily toward natural variation or limited experimental perturbations. Here, to explore the evolutionary capacity of promoter variation, we surveyed an unbiased mutation library for three promoters in Drosophila melanogaster. We found that mutations in promoters had limited to no effect on spatial patterns of gene expression. Compared to developmental enhancers, promoters are more robust to mutations and have more access to mutations that can increase gene expression, suggesting that their low activity might be a result of selection. Consistent with these observations, increasing the promoter activity at the endogenous locus of shavenbaby led to increased transcription yet limited phenotypic changes. Taken together, developmental promoters may encode robust transcriptional outputs allowing evolvability through the integration of diverse developmental enhancers. Quote: "Regulators, mount up [at transcriptional promoters]." - Warren G & Nate Dogg, 1994
Xue, A. T.; Huang, Y.-f.; Siepel, A.
There has been rising interest in exploiting data from genome-wide association studies (GWAS) to detect a genetic signature of natural selection acting on a given phenotype. However, current approaches are unable to directly estimate the distribution of fitness effects (DFE), an established property in population genetics that can elucidate genomic architecture pertaining to a particular focal trait. To this end, we introduce ASSESS, an inferential method that exploits the Poisson Random Field (PRF) to model selection coefficients from genome-wide allele count data, while jointly conditioning GWAS summary statistics on a latent distribution of phenotypic effect sizes. This probabilistic model is unified under the assumption of an explicit relationship between fitness and trait effect to yield a DFE. To gauge the performance of ASSESS, we enlisted various simulation experiments that covered a range of usage cases and model misspecifications, which revealed accurate recovery of the underlying selection signal. As a further proof-of-concept, ASSESS was applied to an array of publicly available human trait data, whereby we replicated previously published empirical findings from an alternative methodology. These demonstrations illustrate the potential of ASSESS to satisfy an increasing need for powerful yet convenient population genomic inference from GWAS summary statistics. Author Summary: The growth of genome-wide association studies (GWAS) over the past decade has provided a wealth of resources for uncovering the genomic architecture underlying complex traits, including the footprint of selection. Currently, there are computational tools for inferring natural selection whereby GWAS results are leveraged to conduct a binary test for overall presence, estimate a correlated property, or summarize polygenic selection strength with a single statistic.
However, a methodology that exploits GWAS data to estimate the distribution of fitness effects (DFE), which is the most direct measurement for the genetic impact of natural selection acting on a complex trait, does not currently exist. To this end, we constructed an approach to directly infer the DFE, wherein per-site selection coefficients specifically associated with a focal trait are aggregated across the genome. This implementation is designed to explicitly model an entire genome-wide set of summary statistics output from a GWAS rather than the individual-level input data, which offers computational efficiency and convenience as well as alleviates privacy concerns. We expect this to be a promising development given the further accumulation of GWAS results and investigators seeking more sophisticated analyses into the relationship between genetics and traits.
Lister-Shimauchi, E.; Dinh, M.; Maddox, P.; Ahmed, S.
Transgenerational Epigenetic Inheritance occurs when gametes transmit forms of information without altering genomic DNA [1]. Although deficiency for telomerase in human families causes transgenerational shortening of telomeres [2], a role for telomeres in Transgenerational Epigenetic Inheritance is unknown. Here we show that Protection Of Telomeres 1 (Pot1) proteins, which interact with single-stranded telomeric DNA [3,4], function in gametes to regulate developmental expression of telomeric foci for multiple generations. C. elegans POT-1 and POT-2 [5,6] formed abundant telomeric foci in adult germ cells that vanished in 1-cell embryos and gradually accumulated during development. pot-2 mutants displayed abundant POT-1::mCherry foci throughout development. pot-2 mutant gametes created F1 cross-progeny with constitutively abundant POT-1::mCherry and mNeonGreen::POT-2 foci, which persisted for 6 generations but did not alter telomere length. pot-1 mutant and pot-2; pot-1 double mutant gametes gave rise to progeny with constitutively diminished Pot1 foci. Genomic silencing and small RNAs potentiate many transgenerational effects [7] but did not affect Pot1 foci. We conclude that C. elegans POT-1 functions at telomeres of pot-2 mutant gametes to create constitutively high levels of Pot1 foci in future generations. As regulation of telomeres and Pot1 have been tied to cancer [8,9], this novel and highly persistent form of Transgenerational Epigenetic Inheritance could be relevant to human health.
Schmoller, K. M.; Lanz, M. C.; Kim, J.; Koivomagi, M.; Qu, Y.; Tang, C.; Kukhtevich, I. V.; Schneider, R.; Rudolf, F.; Moreno, D. F.; Aldea, M.; Lucena, R.; Kellogg, D.; Skotheim, J. M.
In their manuscript, Litsios et al. [1] report a new model for how cell growth and biosynthetic activity control the G1/S transition in budding yeast. In essence, Litsios et al. claim that Start is driven by an increasing concentration of the G1 cyclin Cln3 due to a dramatic acceleration of protein synthesis in pre-Start G1 and not by the dilution of the cell cycle inhibitor Whi5. While we previously reported that Start was in part driven by cell growth during G1 diluting out the Start inhibitor Whi5 [2], Litsios et al. report that Whi5 remains at constant concentration during G1, and changes in Whi5 concentration therefore do not contribute to Start. Since Litsios et al. directly contradict several key points of our own model of how cell growth triggers Start, we decided to investigate their claims and data. More specifically, we decided to investigate Litsios et al.'s three major claims: (i) Whi5 concentration remains constant during G1; (ii) Cln3 concentration strongly increases prior to Start; (iii) global protein synthesis rates increase by 2-3 fold prior to Start. We investigated each of these three claims and found that the evidence presented by Litsios et al. does not support their claims due to inadequate analysis methods and flaws in their experiments.
Feldmann, M. J.; Piepho, H.-P.; Knapp, S. J.
Many important traits in plants, animals, and microbes are polygenic and are therefore difficult to improve through traditional marker-assisted selection. Genomic prediction addresses this by enabling the inclusion of all genetic data in a mixed model framework. The main method for predicting breeding values is genomic best linear unbiased prediction (GBLUP), which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. The use of relationship matrices allows information to be shared for estimating the genetic values for observed entries and predicting genetic values for unobserved entries. One of the key parameters of such models is genomic heritability [Formula], or the variance of a trait associated with a genome-wide sample of DNA polymorphisms. Here we discuss the relationship between several common methods for calculating the genomic relationship matrix and propose a new matrix based on the average semivariance that yields accurate estimates of genomic variance in the observed population regardless of the focal population quality as well as accurate breeding value predictions in unobserved samples. Notably, our proposed method is highly similar to the approach presented by Legarra (2016) despite different mathematical derivations and statistical perspectives and only deviates from the classic approach presented in VanRaden (2008) by a scaling factor. With current approaches, we found that the genomic heritability tends to be either over- or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population. Unlike its predecessors, our newly proposed kinship matrix K_ASV yields accurate estimates of [Formula] in the observed population, generalizes to larger populations, and produces BLUPs equivalent to common methods in plants and animals.
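The abstract does not give the formula for the proposed matrix, so the following is only an illustrative sketch, not the authors' implementation. It builds the classic VanRaden (2008) relationship matrix from 0/1/2 genotype codes and then rescales it so that its average semivariance equals 1. Reading "deviates by a scaling factor" as "divide K by its average semivariance" is an assumption made here, and the function names are hypothetical.

```python
def vanraden_kinship(genotypes):
    """Classic VanRaden (2008) matrix: K = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes: list of rows (one per individual) of 0/1/2 allele counts.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequencies estimated from the observed population.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    denom = 2.0 * sum(p * (1.0 - p) for p in freqs)
    # Center each marker column by twice its allele frequency.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / denom for k in range(n)]
        for i in range(n)
    ]


def average_semivariance(kinship):
    """Average of 0.5 * (K_ii + K_jj - 2 K_ij) over all pairs i != j.

    Algebraically this equals (trace(K) - sum(K) / n) / (n - 1).
    """
    n = len(kinship)
    trace = sum(kinship[i][i] for i in range(n))
    total = sum(sum(row) for row in kinship)
    return (trace - total / n) / (n - 1)


def rescale_to_unit_asv(kinship):
    """ASSUMPTION: rescale K so that its average semivariance is exactly 1."""
    asv = average_semivariance(kinship)
    return [[value / asv for value in row] for row in kinship]


# Made-up genotypes for four individuals at three markers.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
k_scaled = rescale_to_unit_asv(vanraden_kinship(toy_genotypes))
```

Because the rescaling is a single scalar division, it changes the estimated variance component but not the ranking of individuals implied by the matrix, which is in keeping with the abstract's statement that the resulting BLUPs match those of common methods.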
Franco, M.; Fleischmann, Z.; Annis, S.; Cote-L'Heureux, A.; Aidlen, D.; Khrapko, M.; Vyshedskiy, B.; Mirzoyan, D.; Bandell, J.; Popadin, K.; Woods, D. C.; Tilly, J. L.; Khrapko, K.
Purifying selection of mtDNA mutations is a vital process that cleanses the mitochondrial genome of detrimental variants that may endanger individuals and populations. A common measure of purifying selection is an increase in average synonymity through a reduction in the proportion of mostly detrimental non-synonymous mutations. The mechanisms underlying purifying selection are still debated. The Makova group has recently published a high-fidelity analysis of mtDNA mutations in individual human oocytes (Arbeithuber et al., 2025). The authors observed a decrease in the proportion of potentially detrimental coding and conservative mutations at higher mutant fractions (MFs) and interpreted this as purifying selection removing detrimental mutations at higher MFs. We noted, however, that, in contrast to what would be expected under purifying selection, the synonymity of oocyte mutations was very low and decreased, rather than increased, at higher MFs. We hypothesized that this inconsistency resulted from non-coding mutations being prone to strong positive selection, which erroneously made coding mutations appear negatively selected in comparison. In support of our hypothesis, we show that non-coding oocyte mutations are indeed under strong positive selection. To address this problem, we reanalyzed the data using a new metric of intracellular clonal selection and neutral synonymous mutations as the reference. We demonstrated that coding mutations are in fact under prevailing positive selection. This is in line with previous estimates of positive selection in primordial germ cells (PGCs) and in mother-child pairs. Importantly, "prevailing positive selection" does not imply the absence of negative selection. We show that specific types of mutations may be under prevailing purifying selection (e.g., the Co1 gene). Of note, this prevailing positive selection pertains only to the most recent germline mtDNA mutations, which have not yet been inherited into the next generation. 
Purifying selection steps in as germline mutations proceed to subsequent generations. The implications of these findings and the potential benefits of positive selection of detrimental mtDNA mutations are discussed. Graphical summary: Germline mtDNA mutations fuel evolution, shape population genomics, and cause mitochondrial disease. Yet it remains unresolved whether mtDNA selection in the germline is predominantly purifying or positive. A recent high-fidelity single-oocyte study (Arbeithuber et al., 2025) reported purifying (negative) selection on germline mutations at elevated mutant fractions (MFs). However, the low synonymity of oocyte mutations and concerns about using non-coding mutations as a reference for estimating selection prompted us to reanalyze the data. In single cells, mtDNA mutations are subject to genetic drift, which randomly expands mtDNA clones. In this context, selection means that some mutant clones expand systematically faster (positive selection), or expand more slowly or are lost (purifying selection), than expected by random drift.
[Figure A. Cumulative proportion curves of relevant types of mutations (color coded). The corresponding selection metrics, $S_{0.01}$, and Monte-Carlo p-values are shown; analyses for other thresholds are given in Tables 2 and S1.]
[Table S1.]
In Figure 1, datapoints represent clones of mutations ranked by their size and plotted vs. their cumulative contribution to the mutational pool (in reverse order). 
The resulting cumulative curves represent the collective expansion of clones of mutations of each type. As previously shown by direct simulations (Franco et al., 2025), the slope of the curve qualitatively depicts the relative intensity of clonal expansion. The green curve consists of synonymous (neutral) mutations and thus defines the trajectory of expansion driven by neutral genetic drift. Curves that diverge upward from the neutral green curve imply faster-than-neutral expansion, i.e., positive selection in corresponding mutation types, and those that diverge downward (grey Co1 curve) imply negative selection.
[Figure 1. Cumulative proportion curves of relevant types of mutations (color coded).]
To estimate selection, we first defined the extent of clonal expansion, $E_t$(mutant class), a measure of overall clonal expansion, i.e., the proportion of the aggregate mutant fraction of a particular class that resides in clones exceeding a specific size (i.e., MF) $t$. For example, $E_{0.01}$(coding) is the aggregate mutant fraction of all large clones (MF > 0.01) of coding mutations, divided by the aggregate mutant fraction of all coding mutations. $E_t$ changes with $t$, but at each $t$, it permits us to compare the extent of expansion between different classes of mutations (e.g., coding vs. non-coding). Selection at the intracellular level manifests as an acceleration or deceleration of the expansion (or loss) of mutant clones relative to the expansion expected under random drift, as represented by synonymous mutations. 
Accordingly, a measure of clonal selection for a tested mutation class, denoted $S_t$(tested), is defined as the excess clonal expansion in the tested mutations over the expansion of synonymous mutations, normalized by the expansion of synonymous mutations:
$$S_t(\text{tested}) = \frac{E_t(\text{tested}) - E_t(\text{synonymous})}{E_t(\text{synonymous})} \tag{1}$$
In Fig. 1, $S_{0.01}$ and p-values demonstrate:
- Strong positive selection of non-coding mutations (blue).
- Positive selection of coding mutations (orange).
- A higher positive selection in conservative (i.e., more detrimental) coding mutations (red). So, selection may be driven by the detrimental effects of mtDNA mutations.
- Negative selection in the Co1 gene (grey). Thus, negative/purifying selection does exist, but dominates only in specific small regions (like Co1).
- Selection in non-coding mutations starts at low mutant fractions, whereas selection in coding mutations starts at higher mutant fractions.
This confirms the prevalence of positive selection and clarifies why Arbeithuber et al. perceived selection as purifying. The authors compared coding to non-coding mutations. The latter are under stronger positive selection than coding mutations. Thus, coding mutations appear to be under relative negative selection, but only in comparison to non-coding mutations, not in absolute, real terms. Note that positive selection does not necessarily take place in oocytes. Some of the mutations present in oocytes originate in primordial germ cells (PGCs), where they may also have been under selection before being passed on to oocytes. Indeed, positive selection in PGCs has been demonstrated previously (Fleischmann et al., 2024). Finally, positive selection of detrimental mutations in oocytes may seem surprising from an evolutionary perspective. A possible explanation is that the detrimental effects of mtDNA mutations usually do not show up until MF surpasses a physiological threshold. 
Thus, positive expansion may expose the detrimental phenotype of mutations and assist in removing carrier cells, embryos, or individuals, thereby reducing the burden on the mother. In line with this, purifying selection becomes prevalent among inherited mtDNA mutations.
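The expansion and selection metrics defined in words above can be sketched in a few lines of Python. This is a minimal illustration that follows the verbal definitions only: the function names and the toy mutant fractions are hypothetical, and the Monte-Carlo p-values are not reproduced because the abstract does not describe how they were computed.

```python
def clonal_expansion(mutant_fractions, threshold):
    """E_t: share of a class's aggregate mutant fraction held by clones with MF > t."""
    total = sum(mutant_fractions)
    if total == 0:
        raise ValueError("the mutation class has no mutant fraction to partition")
    large = sum(mf for mf in mutant_fractions if mf > threshold)
    return large / total


def clonal_selection(tested, synonymous, threshold):
    """S_t: excess expansion of the tested class over synonymous, normalized by synonymous."""
    e_syn = clonal_expansion(synonymous, threshold)
    if e_syn == 0:
        raise ValueError("no synonymous clone exceeds the threshold; S_t is undefined")
    e_tested = clonal_expansion(tested, threshold)
    return (e_tested - e_syn) / e_syn


# Hypothetical per-clone mutant fractions, invented purely for illustration.
synonymous_mfs = [0.002, 0.004, 0.006, 0.008, 0.02]
tested_mfs = [0.002, 0.003, 0.005, 0.03, 0.06]

s_value = clonal_selection(tested_mfs, synonymous_mfs, threshold=0.01)
print(round(s_value, 3))
```

With these invented values, half of the synonymous mutant fraction and about nine tenths of the tested mutant fraction sit in clones above the 0.01 threshold, so the metric is positive (about 0.8), which corresponds to faster-than-neutral expansion. A negative value would correspond to slower-than-neutral expansion.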