
G3

Oxford University Press (OUP)

Preprints posted in the last 30 days, ranked by how well they match G3's content profile, based on 33 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
The impact of long-read sequencing on fungal genome assemblies: progress and disparity

Kroll, E.; Zoclanclounon, Y. A. B.; Urban, M.; Hill, R.; Hammond-Kosack, K. E.

2026-05-14 genomics 10.64898/2026.05.12.724544 medRxiv
Top 0.1%
14.2%

Fungal genomics has expanded rapidly over the past 30 years, and recently its pace and breadth have increased further for many taxa, although many taxonomic gaps persist. With three decades of rapid growth, fungal genomics now merits a re-examination of its history, progress, and unresolved taxonomic gaps. Here, we review the development of fungal genomics from early efforts such as the Fungal Genome Initiative to current progress driven by third-generation long-read sequencing. We have compiled and summarised publicly available fungal genomes to highlight trends in assembly quality, adoption of long-read technologies, and taxonomic representation. Notably, substantial phylogenetic gaps remain, particularly outside Dikarya, and significant challenges persist for unculturable taxa. This review identifies priorities for the fungal community, including: (1) coordinated efforts to close major taxonomic gaps across the fungal tree of life; (2) improved repository metrics to facilitate identification of high-quality assemblies; and (3) improved and standardised genome annotation, which is lacking for most assemblies. Together, these steps will support the development of reliable genomic resources that capture the full breadth of diversity across the fungal kingdom, generating foundational data for comparative genomics, evolutionary biology, functional studies, genetic studies and applied research.

2
Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv
Top 0.1%
4.9%

Genomic prediction models that account for genotype-by-environment (GxE) interactions have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and in CV1 (predicting new lines) for yield and test weight. The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10 to 0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts.

Core Ideas: (1) Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments. (2) Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments. (3) Prediction of new lines in new environments remains challenging.

Plain Language Summary: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.

3
Mapping of Stripe Rust and Leaf Rust Resistance Genes in the Hard Red Winter Wheat Population Green Hammer/Lonerider

Sharma, R.; Wang, M.; Chen, X.; Carver, B. F.; Guttieri, M.; St. Amand, P.; Bernardo, A.; Bai, G.; Liu, S.; Ara, A. M.; Aoun, M.

2026-05-15 genetics 10.64898/2026.05.13.724876 medRxiv
Top 0.1%
2.7%

Stripe rust and leaf rust, caused by Puccinia striiformis f. sp. tritici and P. triticina, respectively, are the most destructive wheat diseases in the southern Great Plains. Green Hammer is a hard red winter wheat (HRWW) cultivar released by Oklahoma State University in 2018 and has demonstrated a stable adult plant resistance to stripe rust and race-specific seedling resistance to leaf rust. To identify and map rust resistance loci, 109 doubled haploid (DH) lines derived from the cross between Green Hammer and another HRWW cultivar, Lonerider, were developed. Lonerider showed adult plant resistance to stripe rust but was susceptible to multiple P. triticina races. The DH lines were evaluated for stripe rust at the adult plant stage in greenhouse and field environments across Oklahoma, Kansas, and Washington, and for leaf rust at the seedling stage against seven U.S. P. triticina races and at the adult plant stage in Oklahoma and Texas. Genotyping-by-sequencing generated 6,078 polymorphic single-nucleotide polymorphisms used for genetic mapping. Quantitative trait loci (QTL) analysis identified 14 stripe rust and 8 leaf rust resistance QTL. For stripe rust, a major QTL in Green Hammer, QYr.osughln-2AS, was identified in the proximity of the 2NvS translocation. Three other major stripe rust resistance QTL were identified in Lonerider on chromosomes 2AL (two QTL) and 2BS (one QTL). For leaf rust, QLr.osughln-1DS and QLr.osughln-2DS.1 were the two major QTL identified in Green Hammer and most likely correspond to the all-stage resistance genes Lr21 and Lr39, respectively. In this study, we identified previously characterized genes as well as unknown genes that can be utilized in wheat breeding programs to enhance resistance to leaf rust and stripe rust.

4
Temporal changes in allele frequency facilitate detection of adaptive variants in winter wheat (Triticum aestivum L.) breeding programs

Johansen, N. H.; Sarup, P.; Hansen, P.; Orabi, J.; Jahoor, A.; Ramstein, G. P.

2026-05-04 genetics 10.64898/2026.04.30.721918 medRxiv
Top 0.1%
2.1%

In quantitative genetics, candidate SNPs are identified through genotype-phenotype associations inferred with genome-wide association studies (GWAS). In this study, we explore an alternative approach to detect genetic variants with non-neutral effects by tracking temporal trends in allele frequency in a winter wheat (Triticum aestivum L.) breeding population over an eight-year period, from which signals of selection may be inferred. Selection signatures were inferred with a generalized linear model, where we modeled trends in allele frequency as a function of time (crossing year). These signatures of selection were used to prioritize variants. Associations between phenotypic performance and individual load of prioritized variants were then investigated. Furthermore, we assessed whether incorporating selection information into a genomic best linear unbiased prediction (GBLUP) model improves model performance in terms of quality of fit and prediction ability. Our findings indicate that the inferred signals of selection are effective in identifying non-neutral variants. Variants under strong negative selection were associated with a decrease in protein content adjusted for grain yield (p-value < 0.01), while genetic variants that had been under moderate to high levels of positive selection were associated with increased grain yield (p-value < 0.01). However, incorporating selection information did not improve prediction accuracy. In conclusion, temporal trends in allele frequency can be used to detect non-neutral variants. The proposed approach may hence complement traditional quantitative genetic methods for detecting non-neutral genetic variation. This approach may allow breeders to detect non-neutral variants earlier in the breeding cycle, without resorting to phenotypic data.

5
A weighted multi-trait approach for heterotic grouping of maize inbred lines under Striga infestation and optimum environments

Abubakar, A. M.; Adejumobi, I. I.; Mengesha, W. A.; Meseka, S.; Oyekunle, M.; Ado, S. G.; Bonkoungou, T. O.; Badu-Apraku, B. A.; Derera, J.

2026-05-16 genetics 10.64898/2026.05.15.725596 medRxiv
Top 0.1%
2.1%

Maximum utilization of existing genetic variability in a breeding program depends on the efficient classification of the inbred lines into heterotic groups, particularly under stress conditions. This study applied practical breeding approaches to determine the mode of genetic inheritance for Striga resistance, proposes a weighted heterotic grouping method based on the general combining ability of multiple traits (WHGCAMT), and compares its effectiveness with existing methods in classifying inbred lines into heterotic groups in Striga-infested and optimum environments. Using Diallel design IV, 300 crosses were generated from 21 inbred lines and 4 standard testers. The crosses, along with six checks, were evaluated in an 18 x 17 alpha lattice design with two replications at two locations, in both artificial Striga-infested and Striga-free environments. The inbred lines were genotyped using DArTtag SNP markers. Phenotypic and genotypic data were analyzed using R. Analysis of variance revealed significant mean squares for hybrid, general combining ability (GCA), specific combining ability (SCA) and their interactions with environment. Significant positive and negative GCA and SCA effects were detected for grain yield and other measured traits. However, a larger proportion of additive gene action than non-additive gene action was observed for grain yield and most measured traits. The analysis of molecular variance also showed substantial genetic differences within and between clusters. Except for HSCA, the difference in mean grain yield between the inter-group and intra-group hybrids was significant for each method. Pairwise comparison of the inter- and intra-group hybrids of all the methods showed significant differences between the WHGCAMT and all other methods in most cases. WHGCAMT consistently produced higher-yielding inter-group hybrids and lower-yielding intra-group hybrids, achieving breeding efficiency improvements of 55.8%, 4.3%, 15.7%, and 11.4% over the HSCA, HSGCA, HGCAMT and molecular marker methods, respectively, under Striga infestation. Thus, WHGCAMT offers more precise, reliable and biologically meaningful heterotic groups among early-maturing maize inbred lines.
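
The cross and plot counts in this abstract can be checked with a few lines of arithmetic. The sketch below assumes (the abstract does not spell this out) that the 21 inbred lines and 4 testers were pooled as 25 parents in a half diallel without selfs or reciprocals, and that the 18 x 17 alpha lattice holds one plot per entry in each replicate. The function name is hypothetical.

```python
def half_diallel_crosses(n_parents: int) -> int:
    """Number of single crosses in a half diallel without selfs or reciprocals."""
    return n_parents * (n_parents - 1) // 2


# Assumption: lines and testers are pooled into a single set of parents.
parents = 21 + 4
crosses = half_diallel_crosses(parents)

# Entries per replicate: the crosses plus the six checks named in the abstract.
entries = crosses + 6

# Plots per replicate in an 18 x 17 alpha lattice.
plots_per_rep = 18 * 17

print(crosses, entries, plots_per_rep)
```

Under these assumptions the 300 crosses plus six checks exactly fill the 306 plots of one replicate.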

6
Increasing Phenomic Prediction Efficiency Using A Principal Component Analysis Based Pre-Processing Of Near Infrared Spectra

Bienvenu, C.; Roger, J.-M.; Sene, M.; Castro Pacheco, S. A.; Singer, M.; Felaniaina, B. L.; Terrier, N.; De Bellis, F.; Pot, D.; DE VERDAL, H.; Segura, V.

2026-05-13 genetics 10.64898/2026.05.10.724118 medRxiv
Top 0.2%
1.8%

Phenomic prediction (PP) is a breeding value prediction method using near infrared spectroscopy (NIRS). Spectra pre-processing is a key step in the analysis pipeline of PP and generally involves chemometrics methods. However, there is still little understanding in the genetics community of what pre-processing does and why it improves performance. Consequently, the choice of pre-processing is made either arbitrarily or through a search for the optimal set of methods and associated parameters. In this study, we propose a PCA-based pre-processing method where genetic values of spectra are estimated on a set of principal components instead of individual wavelengths. This way, estimations are based on a few informative and orthogonal features of spectra instead of many correlated, uninformative wavelengths. We tested this new pre-processing method on five data sets representing four plant species (maize, rice, sorghum and grapevine). Results show that it performs as well as, or better than, the best classical chemometric pre-processing methods in almost all cases. Combining PCA-based and classical chemometric pre-processing methods maximizes predictive ability. Moreover, this pre-processing method opens up possibilities of better understanding and selecting parts of the spectral information that are relevant for the prediction of breeding values. Indeed, components together representing about 1% of spectral variability were found to be responsible for most of PP predictive ability.

Plain Language Summary: Cultivated plants are the result of a breeding process during which their genetic values are used to select those to breed. Estimation of breeding values requires heavy experimental means and is time consuming. Phenomic prediction is a low-cost and high-throughput genetic value estimation method that is increasingly being used. It often uses near infrared spectroscopy measurements as predictors of genetic values; these measurements are easy to collect and thus routinely used in many species. However, near infrared spectra generally require pre-processing before being used in prediction. Currently used pre-processing methods arise from the chemometrics community, and geneticists would benefit from a deeper understanding of them. In this study, we propose a new pre-processing approach that performs as well as or better than the best chemometric pre-processing generally used, reduces computation time, and allows for a better understanding of what parts of spectral information are relevant for prediction.

Core Ideas: (1) Working on principal components of spectra instead of wavelengths increases predictive ability of phenomic prediction and performs as well as or better than classical chemometrics pre-processing. (2) Working on principal components of spectra requires less optimization of parameters than chemometrics pre-processing. (3) About 1% of spectral variance is responsible for most of the predictive power of phenomic prediction. (4) Working on principal components of spectra pre-processed with classical chemometrics pre-processing can increase predictive ability even more. (5) PCA-based methods are valuable to optimize predictive ability of phenomic prediction and could be used more widely in the quantitative genetics field.

7
A comparison of scalable approaches for the pairwise analysis of large pathogen genomic and spatial datasets: an application to studying Mycobacterium tuberculosis transmission

Lan, Y.; Wu, C.-Y.; Lin, H.-H.; Cohen, T.; Warren, J. L.

2026-05-21 microbiology 10.64898/2026.05.21.726848 medRxiv
Top 0.2%
1.7%

Pairwise analysis of genomic and spatial data offers opportunities to identify and estimate the associations between covariates and the transmission of pathogens between individuals. However, such pairwise analyses are computationally intensive, and may not be feasible to conduct given the high dyad count in even moderately sized datasets. Here we compare two approaches to increase the efficiency of pairwise analysis for large datasets. We quantify and compare the performance of divide-and-conquer Bayesian model fitting and pairwise case-control approaches for estimating associations between individual- and pair-level covariates and shared membership in a transmission cluster. We utilize a large dataset (n=4,154) of spatially-referenced, genomically-sequenced Mycobacterium tuberculosis isolates collected from a single city for this analysis. We find that the case-control approach produces unbiased estimates of effect sizes with expected credible interval coverage and is more robust than the divide-and-conquer method when effect sizes are large. Thus, we recommend using the case-control approach with at least three controls per case to downscale datasets for pairwise analysis when analysis of the entire dataset is not possible. This approach mitigates the computational challenges of pairwise Bayesian modeling on datasets that require significant computational resources while maintaining desired inferential properties.

Author Summary: Pairwise analyses of large datasets to study pathogen transmission are computationally demanding because they typically require simultaneous analysis of each possible pair of individuals in a dataset; as datasets become larger these analyses often are not feasible to conduct even with access to high-performance computing resources. In this work, we compare a case-control approach and divide-and-conquer approaches for more efficient pairwise analysis of large datasets. Using a large dataset of Mycobacterium tuberculosis isolates including genetic and spatial data, we investigate the performance of each method for estimating the associations between host covariates and genetic clustering of isolates. We find that the case-control approach is generally preferred over methods which first divide the data into subsets and then combine results. While additional extensions of these analyses are needed to test the generality of these findings to other data settings, this work provides a practical way forward for the pairwise analysis of large datasets to study pathogen transmission.
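
To make the "high dyad count" concrete, the sketch below counts the unordered pairs among n isolates and draws a pairwise case-control sample with three controls per case. It is an illustration only, not the authors' implementation: the function names, the `cluster_of` input (a mapping from isolate ID to transmission-cluster label, with every unclustered isolate given its own unique label) and the simple random sampling of control pairs are all assumptions.

```python
import itertools
import random


def dyad_count(n: int) -> int:
    """Number of unordered pairs (dyads) among n isolates."""
    return n * (n - 1) // 2


def sample_case_control_pairs(cluster_of, controls_per_case=3, seed=0):
    """Split all pairs into cases and non-cases, then subsample controls.

    cluster_of: dict mapping isolate ID -> cluster label (hypothetical input).
    A "case" is a pair sharing a cluster; controls are drawn at random,
    without replacement, from the pairs that do not share a cluster.
    """
    rng = random.Random(seed)
    cases, non_cases = [], []
    for a, b in itertools.combinations(sorted(cluster_of), 2):
        if cluster_of[a] == cluster_of[b]:
            cases.append((a, b))
        else:
            non_cases.append((a, b))
    # Never request more controls than there are non-case pairs.
    k = min(len(non_cases), controls_per_case * len(cases))
    controls = rng.sample(non_cases, k)
    return cases, controls


# Dyads implied by the dataset size quoted in the abstract (n = 4,154).
print(dyad_count(4154))

# Toy example with made-up isolate IDs and cluster labels.
toy = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5}
cases, controls = sample_case_control_pairs(toy)
print(len(cases), len(controls))
```

Enumerating the pairs is still quadratic in n, but it is cheap next to fitting a Bayesian model to every dyad; the saving comes from modelling only the cases and the sampled controls.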

8
Identification of septoria nodorum blotch susceptibility genes in hard winter wheat

Ara, A. M.; Holmes, D. J.; Friesen, T. L.; Carver, B. F.; Bai, G.; St. Amand, P.; Bernardo, A.; Sharma, R.; Aoun, M.

2026-05-15 genetics 10.64898/2026.05.13.724689 medRxiv
Top 0.2%
1.7%

Key message: Characterized and unknown septoria nodorum blotch susceptibility/resistance genes were identified in contemporary U.S. hard winter wheat.

The necrotrophic fungus Parastagonospora nodorum is the causal agent of septoria nodorum blotch (SNB) of wheat. To determine the prevalence of SNB sensitivity genes in contemporary U.S. hard winter wheat (HWW), we evaluated a panel of 619 breeding lines and cultivars against five P. nodorum isolates and five necrotrophic effectors (NEs), SnToxA, SnTox1, SnTox3, SnTox267 and SnTox5, and genotyped the panel using genotyping-by-sequencing (GBS) markers and diagnostic Kompetitive allele-specific PCR (KASP) markers for the sensitivity genes Tsn1-B1, Snn1-B1, and Snn3-B1/B2. GBS analysis identified 34,357 GBS-single nucleotide polymorphism (SNP) markers. Evaluations against P. nodorum isolates showed that 40-67% of the genotypes in the panel were susceptible. Toxin infiltration assays showed that 54%, 2%, 37%, 13%, and 15% of the genotypes were sensitive to SnToxA, SnTox1, SnTox3, SnTox267, and SnTox5, respectively. Diagnostic KASP markers for Tsn1-B1, Snn1-B1, and Snn3-B1/B2 showed prediction accuracies of 98%, 75%, and 92% for the corresponding effectors SnToxA, SnTox1, and SnTox3, respectively. Genome-wide association studies (GWAS) not only confirmed the presence of the previously characterized sensitivity genes Tsn1-B1, Snn1-B1, Snn2, Snn3-B1/B2, and Snn5-B1, but also identified new loci associated with responses to P. nodorum isolates and NEs. Of these, Qsnb.osu-2AS on chromosome 2AS was associated with responses to all five isolates. We developed KASP markers KASP_S4B_643615365, KASP_S2D_16184991, and KASP_S2A_9833162 linked to Snn5-B1, Snn2, and Qsnb.osu-2AS, respectively. These findings should guide breeding for SNB resistance in hard winter wheat.

9
Environmental impacts on gene expression noise and its relationship with fitness

Haque, T.; Siddiq, M. A.; Duveau, F. M.; Wittkopp, P.

2026-05-18 evolutionary biology 10.64898/2026.05.18.725919 medRxiv
Top 0.2%
1.7%

Genetically identical cells grown in the same environment show variation in gene expression known as expression noise. Expression noise can be heritable and impact fitness, making it subject to natural selection. Increasing expression noise for the Saccharomyces cerevisiae TDH3 gene was shown to be beneficial in glucose-based media when mean TDH3 expression was far from the fitness optimum but deleterious when it was close to this optimum. Here, we show that growth on different carbon sources alters the effects of new mutations on TDH3 expression noise and examine the fitness effects of changing expression noise. In galactose-based media, we observed the same relationship between expression noise and fitness seen in glucose-based media, but in glycerol- and ethanol-based media, we observed the opposite relationship or no significant relationship, respectively. Using simulations of single-cell organisms, we found that these differences were most likely explained by environment-specific relationships between gene expression and fitness. We also found that, far from the optimum, the fitness effects of noise were greatest when expression was highly heritable between mother and daughter cells. The empirical observations and simulations reported in this study show how environments influence both the production of expression noise and its impacts on fitness.

10
Characterization of genetically effective cells and EMS mutagenesis on the novel winter oil seed Pennycress (Thlaspi arvense)

Brusa, A.; Branch, C.; Sulivan, L.; Chopra, R.; Rai, K.; Rockstad, G.; Gjesvold, E. S.; Ott, M.; Jain, S.; Biel, C. C.; Marks, M. D.

2026-05-05 genomics 10.64898/2026.04.30.722012 medRxiv
Top 0.2%
1.6%

Pennycress (Thlaspi arvense L.) is an intermediate winter oilseed crop that has only recently been domesticated for agronomic use. Improving agronomic traits requires sources of genetic variation, and mutagenesis is frequently used to help overcome the limitations of natural populations. We investigate the impact of ethyl methanesulfonate (EMS) on genetically effective cells (GECs) to characterize the intra-individual genetic variation of EMS mutagenesis in pennycress. We identified that pennycress contains at least 4 GECs which, when treated with EMS, create unique mutations across different branches within the same individual plant. We then propagated the M2 plants for whole genome sequencing, providing extensive characterization of the EMS mutation profile and developing a gene index as a resource for future reverse genetic screenings.

Article Summary: Pennycress is an emerging winter oilseed crop in the American Midwest. Domestication efforts have advanced rapidly through a combination of genetic techniques. One of the most successful methods has been the use of a mutant gene index, a large collection of pennycress seed where new genetic variation has been created through ethyl methanesulfonate (EMS). EMS mutations are not uniform, however, and a single treated seed can have wide genetic variation within the resulting plant. We investigate the role of genetically effective cells on EMS variation, and present the full EMS population as a resource for further pennycress domestication efforts.

11
C. albicans ergosterol modulates the antifungal response of human neutrophils by masking β-glucan

Jiang, H.; Nobbs, A.; Leaves, I.; Gow, N. A. R.; Diezmann, S.; Amulic, B.

2026-05-18 microbiology 10.64898/2026.05.18.721578 medRxiv
Top 0.2%
1.6%

Introduction: Ergosterol-targeting azoles are widely used in the treatment of Candida albicans infection. In addition to direct antifungal activity, azoles are known to enhance neutrophil-mediated killing of C. albicans, but the underlying mechanisms remain unclear, particularly whether ergosterol depletion directly modulates host immune responses. Gap Statement: It remains unknown whether reduced ergosterol levels alone, independent of broader disruption to sterol biosynthesis and fungal morphogenesis, influence neutrophil antifungal activity. Aim: This study aimed to determine how genetic disruption of late-stage ergosterol biosynthesis affects neutrophil-mediated responses to C. albicans. Methodology: Doxycycline-repressible GRACE mutants targeting late-stage ergosterol biosynthesis genes (ERG4, ERG5, ERG3 and ERG28) were co-incubated with primary human neutrophils. Fungal survival, oxidative burst, phagocytosis, neutrophil extracellular trap (NET) formation and cell wall composition were assessed. Results: All ergosterol-deficient strains induced elevated neutrophil reactive oxygen species (ROS) production; however, only ERG4 depletion was associated with enhanced fungal clearance. This phenotype correlated with increased phagocytosis and reduced NET formation. Cell wall analysis revealed no changes in total chitin or mannan content but demonstrated significantly increased surface exposure of β-1,3-glucan in ERG4-depleted cells. Conclusion: These findings indicate that disruption of late-stage ergosterol biosynthesis, particularly via ERG4, enhances neutrophil antifungal responses and is associated with increased β-glucan exposure. This study highlights a potential role for ergosterol in immune evasion and suggests that targeting terminal steps of the pathway may improve host-mediated clearance of C. albicans.

12
Organelle scaling over a 100-fold cell size range

Wirshing, A. C. E.; Lew, D. J.

2026-05-13 cell biology 10.64898/2026.05.13.724986 medRxiv
Top 0.2%
1.5%

Cell size in a proliferating cell population generally varies over a limited range (~2-4-fold). Within such populations, organelle content increases with cell size, maintaining a relatively constant organelle density (amount per cell volume). However, cells of different types can differ greatly in cell size as well as in organelle composition. In such cases, it is often unclear to what degree, if any, the differences in organelle composition are due to the difference in cell size. In principle, this issue could be resolved by examining situations where a proliferating population of cells of the same cell type exhibits much greater size variation. Here we characterize how organelle content scales with cell volume in the polymorphic fungus A. pullulans, whose proliferating cells span a ~100-fold size range. We find that mitochondria and ER content increase in proportion to cell volume, while this is not the case for vacuoles and peroxisomes. Thus, organelle composition is affected by cell size in this system.

13
Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion

Marone, M. P.; Chen, E.; Himmelbach, A.; Haberer, G.; Spannagl, M.; Stein, N.; Mascher, M.

2026-05-18 genomics 10.64898/2026.05.13.724976 medRxiv
Top 0.3%
1.3%

Background: As pangenomes approach saturation, identifying additional genomes that contribute novel sequence information becomes increasingly difficult. Current sample-selection strategies often rely on global diversity metrics or variant counts and do not explicitly account for the composition of an existing pangenome, a limitation that becomes increasingly relevant as pangenomes mature. Here, we present SelHap, a haplotype-based pipeline that uses whole-genome sequencing (WGS) data to prioritize accessions based on their contribution of novel haplotypes relative to a defined background, enabling targeted and iterative pangenome expansion. Results: We applied SelHap to the barley pangenome, using 76 assembled genomes as a background to select new accessions from a large WGS panel. Using this approach, we generated chromosome-scale genome assemblies from 19 accessions selected with SelHap and from 17 elite lines selected based on their relevance in historical barley breeding. Across multiple benchmarking scenarios, SelHap-based selection consistently resulted in a greater increase in non-redundant (single-copy) pangenome sequence, demonstrating that prioritizing haplotype novelty relative to an existing background maximizes unrepresented sequence content. Conclusions: By transforming complex haplotype-clustering outputs into interpretable summaries and ranked candidate lists, SelHap provides a practical framework for targeted pangenome expansion. Beyond sample selection, SelHap can facilitate ancestry and germplasm comparisons across diverse panels. As WGS data become more accessible, SelHap offers a scalable and interpretable solution for extending mature pangenomes by explicitly targeting previously unrepresented sequence space.

14
The impact of Cronartium ribicola inoculum density on quantitative disease resistance in whitebark pine

Johnson, J. S.; Wilhite, B.; Kegley, A.; Danchok, R.; Sniezko, R. A.

2026-05-06 genetics 10.64898/2026.05.02.722345 medRxiv
Top 0.3%
1.0%

Whitebark pine (Pinus albicaulis), a wide-ranging high-elevation conifer in western North America, is listed as threatened in the U.S. and as endangered in Canada. A major threat to whitebark pine is the non-native, invasive white pine blister rust disease, caused by the fungal pathogen Cronartium ribicola. In many pathosystems (including white pine blister rust), seedling inoculation trials are used to identify parent trees with genetic resistance. However, many of these trials use only one spore density for inoculation, and little information exists on the effectiveness of quantitative disease resistance (QDR) under varying spore densities and the corresponding implications for field performance. In this study, we examine the levels of infection and survival present within six whitebark pine seedling families previously rated for QDR (three susceptible and three resistant families) under six widely varying inoculum densities. The susceptible families showed very high infection and mortality at all inoculum densities, while performance of the resistant families varied with spore density treatment. The information gathered from the study will be useful in updating the projections of the future of whitebark pine populations under field conditions in areas of different rust hazard. The results also serve as a caution to those working in other pathosystems where seedling inoculation trials based on one spore density level are used to rate the resistance level of parent trees and their associated progeny.

15
On the evolution, function and cellular fate of Neurospora crassa ACW-1 and NCW-3, proteins with different cell wall interaction mechanism

Ramirez-Pelayo, A. S.; Callejas-Negrete, O. A.; Amaya-Delgado, L.; Verdin, J.

2026-05-10 microbiology 10.64898/2026.05.09.718313 medRxiv
Top 0.4%
0.9%

The fungal cell wall is populated by proteins (CWPs), mostly uncharacterized, that show atypical evolutionary behavior. Most CWPs are glycosylphosphatidylinositol (GPI) proteins, followed by proteins with internal repeats (PIR), and non-covalently attached proteins that harbor carbohydrate binding domains (CBM). Several structural CWPs are initially bound to the same wall carbohydrates, but either covalently or non-covalently. However, it is not clear whether they work in the same way and whether they are subject to the same evolutionary constraints. In Neurospora crassa, CWPs ACW-1 (NCU08936) and NCW-3 (NCU07817) bind to β-1,3-glucans through a GPI anchor or a predicted CBM-52 domain, respectively. Here, the evolutionary trajectories and functional roles of both CWPs were analyzed. Both proteins localized primarily to distal septa and hyphal wall surfaces. Morphological characterization and cell wall stress assays suggested that both proteins contribute to cell wall integrity, but NCW-3 likely plays a more prominent role. ACW-1 and NCW-3 homologues were predominantly identified in Ascomycota. ACW-1 displayed a broader distribution than NCW-3, whose homologues were largely restricted to Sordariales. Despite these differences, both protein families exhibited similar moderate global conservation and signatures of purifying selection within shared taxa. Nevertheless, a divergence gradient was identified within ACW-1, related to its tandem leucine-rich repeat (LRR) regions. A similar local accumulation of evolutionary change was not observed within NCW-3. These findings suggest that distinct CWP architectures can accommodate different patterns of sequence diversification despite sharing similar global evolutionary change.

16
Novel linkage disequilibrium-based genotype-by-environmental interaction method for genomic prediction of cotton yield and fibre quality traits

Li, Z.; Li, X.; Liu, S.; Wilson, I.; Zhu, Q.-H.; Stiller, W.; Conaty, W.

2026-05-06 plant biology 10.64898/2026.05.03.722538 medRxiv
Top 0.4%
0.9%

Genomic prediction (GP) across diverse environments has the potential to accelerate genetic gain in cotton breeding programs. A major challenge in GP is modelling genotype-by-environment interactions (GEI), which is essential for selecting stable and high-performing genotypes under variable production conditions. However, incorporating GEI into GP models increases dimensionality and computational complexity, risking models that are impractical to use on commercial breeding-scale data sets because of run times and computational demands. This study addresses two primary aims. First, we evaluate the practical benefits of GEI-informed GP for predicting economically important cotton traits. Second, we develop and assess advanced statistical modelling strategies for integrating genomic and environmental data at scale. We propose a dimensionality reduction approach that combines linkage disequilibrium network analysis with principal component techniques to reduce redundancy while preserving informative variation. Using this reduced dataset, we implement Bayesian linear regression models and, for comparison, deep residual neural networks for genomic prediction. Analyses were conducted on a large multi-environment dataset from the CSIRO cotton breeding program, comprising 3,236 breeding lines, 54 environmental covariates, and 8,049 yield and fibre quality phenotype records collected over 10 years and 9 locations representing 41 year-location combinations. Results demonstrate that Bayesian linear regression approaches generally outperform BG-BLUP models, with all three linear/linear mixed methods providing clearly more reliable performance than the deep learning models. These findings highlight the value of using interpretable statistical models for integrating genomic and environmental information to support selection decisions under diverse environmental conditions.

17
The stability of fatty acid composition in sunflower oil is dependent on environment and affected by structural variation

Ingold, M.; Gao, Q.; Mandel, J. R.; McNellie, J. P.; Keepers, K. G.; Barb, J. G.; Burke, J. M.; Rieseberg, L. H.; Hulke, B. S.

2026-05-07 plant biology 10.64898/2026.05.04.722759 medRxiv
Top 0.4%
0.8%

In sunflower (Helianthus annuus L.), the composition of fatty acids in the seeds, primarily oleic, linoleic, stearic and palmitic acid, is of utmost importance for oil quality. Despite this, the genetic basis of this trait and its interaction with the environment are poorly understood. Understanding this interaction is critical to the improvement of sunflower within the context of climate change. In this work, we incorporated fatty acid composition measurements from the sunflower SAM population and eight environments across an extensive geographic cline into GWAS. The SAM panel consists of 287 varieties representing approximately 90% of sunflower diversity, for which 2.2 million high-quality SNPs with a MAF > 5% are available. For increased power, multivariate GWAS was performed with four different inputs: (i) mean fatty acid composition within each environment, (ii) mean fatty acid composition within each environment omitting high oleic varieties, (iii) trait stability within environments quantified by standard errors among replicate samples ( stability) and (iv) Eberhart and Russell's β, which quantifies trait stability across environments (β stability). All four analyses yielded highly significantly associated SNPs. We found that high oleic varieties exhibited high β trait stability, resulting in substantial overlap in markers between analyses (i) and (iv), with signals being fairly consistent between environments in analysis (i). For analyses (ii) and (iii), significant markers tended to vary between trials. For significant SNPs across all analyses, 147 candidate genes were identified, including promising candidates such as 15 fatty acid metabolism genes, 6 heat shock proteins and 22 transcription factors. Lastly, a large introgression consisting of two flanking inverted sequences on Chromosome 5 was found to coincide with stability in the Georgia trial, suggesting a role in FA composition stability under high heat conditions.
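For readers unfamiliar with the Eberhart and Russell β mentioned in input (iv), the following is a minimal sketch of the textbook definition: each genotype's trait values are regressed on an environmental index (environment mean minus grand mean). The function name, the toy data and the genotype-by-environment matrix layout are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def eberhart_russell_beta(y):
    """Regression coefficient of each genotype on the environmental index.

    y: 2-D array, rows = genotypes, columns = environments (trait means).
    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    # Environmental index I_j: environment mean minus the grand mean.
    env_index = y.mean(axis=0) - y.mean()
    # b_i = sum_j(Y_ij * I_j) / sum_j(I_j^2); the index sums to zero,
    # so the genotype values do not need to be centred first.
    return y @ env_index / np.sum(env_index ** 2)

# Toy data (made up): 3 genotypes measured in 3 environments.
y = np.array([
    [10.0, 12.0, 14.0],
    [11.0, 12.0, 13.0],
    [ 9.0, 12.0, 15.0],
])
betas = eberhart_russell_beta(y)
print(betas)
```

Under this definition a β near 1 indicates an average response to the environment, while a β below 1 indicates a genotype whose trait changes less than average across environments.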

18
A novel matrix multiplication framework for modeling genotype-by-environment interaction in genomic prediction

Montesinos-Lopez, O. A.; Montesinos-Lopez, A.; Montesinos-Lopez, J. C.; Crossa, J.; Dreisigacker, S.; Hernandez-Suarez, C. M.; Ortiz, R.

2026-05-15 genetics 10.64898/2026.05.11.724414 medRxiv
Top 0.5%
0.8%

Accurate modeling of genotype-by-environment (GxE) interaction is critical for genomic prediction in plant breeding but remains challenging due to complex interaction structures. Conventional models often use the Hadamard product of genotype and environment covariance matrices to capture joint similarity, which may not fully represent GxE complexity. Here we propose a novel framework that derives covariance structures from the matrix multiplication of genotype and environment kernels, decomposing these into symmetric components incorporated as random effects in mixed models. Evaluated on 11 wheat and rice multi-environment datasets, this approach consistently outperformed the traditional Hadamard-based model, improving prediction accuracy by up to 13.2% in Pearson's correlation and enhancing top-selection accuracy. Combining both methods yielded the highest performance, indicating complementary information capture. This framework offers a flexible, interpretable, and computationally feasible extension for modeling GxE interaction, potentially enhancing genomic selection effectiveness under diverse environmental conditions.
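The contrast between the element-wise (Hadamard) product and the ordinary matrix product of two kernels can be sketched as follows. The abstract does not say how the product is decomposed into symmetric components, so the symmetric/skew-symmetric split below is an assumption, as are the simulated marker and covariate matrices and all variable names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_geno, n_env = 4, 3

# Hypothetical inputs: random markers and environmental covariates.
M = rng.standard_normal((n_geno, 20))
W = rng.standard_normal((n_env, 5))
G = M @ M.T / M.shape[1]   # genotype kernel (n_geno x n_geno)
E = W @ W.T / W.shape[1]   # environment kernel (n_env x n_env)

# Expand both kernels to one row per genotype-environment record.
geno_idx = np.repeat(np.arange(n_geno), n_env)
env_idx = np.tile(np.arange(n_env), n_geno)
K_g = G[np.ix_(geno_idx, geno_idx)]
K_e = E[np.ix_(env_idx, env_idx)]

# Conventional interaction kernel: element-wise product.
hadamard = K_g * K_e

# Matrix product of the two kernels; generally not symmetric.
product = K_g @ K_e

# Assumed decomposition: symmetric and skew-symmetric parts.
sym_part = (product + product.T) / 2
skew_part = (product - product.T) / 2
```

Note that the symmetric part of a product of two positive semidefinite matrices is not guaranteed to be positive semidefinite itself, so further processing may be needed before it can serve as a covariance matrix in a mixed model.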

19
Resolving the oak tree of life: comparing RADseq and whole genome resequencing methods for oak phylogenetics

Hipp, A. L.; Althaus, K. N.; Fuller, E. L.; Hahn, M.; Larson, D. A.; Mohn, R. A.; Wang, B.; Manos, P. S.

2026-05-17 evolutionary biology 10.64898/2026.05.14.725274 medRxiv
Top 0.5%
0.7%

Forest trees pose numerous potential challenges to phylogenomic inference. Their large effective population sizes and relatively long generation times lead to deep allele coalescence and consequently incomplete lineage sorting (ILS), which biases inferences of divergence times toward older ages and introduces gene tree discordance. Deep phylogenetic divergences, reaching back into the Paleocene, introduce reference-mapping biases. Introgression--the movement of genes between lineages--may result in different phylogenies being inferred depending on which individuals are included in analysis, even if the plurality of the genome favors the divergence history unaffected by introgression. These factors influence phylogenetic inference across the Tree of Life but are particularly prevalent in forest trees. Oaks (Quercus) are notable for all three influences. In addition, our knowledge of the oak phylogeny is currently based strongly on restriction site associated DNA sequencing (RADseq) datasets published over the past decade, which may introduce additional sources of uncertainty. In this chapter, we analyze a 322-species RADseq dataset and genome resequencing data from across the genus to address sources of uncertainty in our understanding of the global oak phylogeny, which we hope will serve as a model for other research groups working on comparable woody plant groups.

20
Efficient Optimization of Genotype Pairs for Intercropping using Genomic Prediction and Bayesian Optimization

Kinoshita, S.; Iwata, H.

2026-05-18 genomics 10.64898/2026.05.15.725387 medRxiv
Top 0.6%
0.7%

Intercropping is a promising strategy to improve productivity and sustainability in agricultural systems, but designing effective genotype combinations remains a major challenge owing to the rapid increase in possible pairings as the number of candidate genotypes increases. This creates a practical bottleneck because field evaluation of all combinations is infeasible under realistic resource constraints. Here, we propose a framework that integrates genomic prediction and Bayesian optimization to support efficient decision-making for intercropping system design. Using genome-wide marker data from sorghum and soybean, we simulated intercropping performance across 5,214 genotype pairs under certain genetic architectures, including variation in heritability, correlations between direct and indirect genetic effects, and the contribution of pair-specific interactions. Genomic prediction models incorporating direct and indirect genetic effects substantially improved prediction accuracy compared with models based on direct genetic effects alone, and inclusion of specific mixing ability further enhanced the performance under high-heritability conditions. When coupled with Bayesian optimization, the models rapidly identified superior genotype pairs, requiring fewer evaluation cycles than random or prediction-only search strategies. Acquisition functions that account for predicted uncertainty were most effective in complex scenarios involving interaction effects or negative correlations between direct and indirect effects. These results demonstrate that combining genomic prediction with Bayesian optimization can substantially reduce the experimental burden associated with intercropping design, while improving the efficiency of identifying high-performing genotype pairs. 
The proposed framework provides a practical approach for prioritizing candidate mixtures in breeding and field evaluation, and contributes to the development of data-driven strategies for sustainable agricultural systems.

Highlights
A data-driven framework was developed to optimize genotype pairs in intercropping.
Modeling indirect effects improved prediction accuracy across genotype pairs.
Pair-specific interactions enhanced prediction under high-heritability conditions.
Bayesian optimization identified superior pairs under limited evaluation capacity.
The framework reduces field-testing requirements for intercropping system design.