GENETICS — Latest Matching Preprints

1

Domain-specific mutations in unc-6/Netrin differentially affect dorsal-ventral axon pathfinding in Caenorhabditis elegans

Hooper, K. M.; Clark, S. G.; Lundquist, E. A.

2026-07-15 developmental biology 10.64898/2026.07.14.738297 medRxiv

Top 0.4%

12.6%

UNC-6/Netrin is a conserved regulator of dorsal-ventral axon and cell migrations. UNC-6 is composed of a Laminin N-terminal domain (LN), three epidermal growth factor repeats (EGF), and a Netrin C-terminal domain (NC). Here, we identified missense mutations in distinct UNC-6 domains and assessed their roles in dorsal VD/DD motor axon guidance and ventral AVM axon guidance. A missense mutation in a conserved residue of the LN domain (G289D) resulted in dorsal and ventral axon guidance defects similar to unc-6 null. A distinct missense mutation in the LN domain (S120F) was hypomorphic and strongly perturbed ventral AVM axon guidance with minimal effects on dorsal VD/DD axon guidance, showing that S120 is predominantly required for ventral guidance. Missense mutations altering conserved cysteine residues involved in disulfide bonding in the EGF domains were analyzed. EGF1(C321G) caused both ventral and dorsal axon guidance defects, albeit weaker than unc-6 null, indicating that EGF1 is required for both. EGF2(C347Y) strongly affected dorsal VD/DD axon guidance, similar to unc-6 null, with weaker perturbation of ventral AVM axon guidance. Previous results revealed that EGF3(C410Y) specifically disrupted dorsal axon guidance, a result that we confirmed. Our studies using missense mutations in the endogenous unc-6 locus complement previous structure-function studies using transgenic expression, and identify domains specifically required for ventral AVM guidance (S120F in the LN domain) and dorsal VD/DD axon guidance (C410Y in EGF3). The crystal structure of UNC-6 indicates conserved N-linked glycosylation at N114 and N128. Mutation of these sites in UNC-6 had no effect on dorsal-ventral axon guidance, showing that they do not play a major role. However, the N114 and N128 mutations interacted genetically with unc-40 and unc-5 mutations, indicating that these glycosylation sites indeed have a role in UNC-6 signaling.
Our results will inform studies on how these distinct UNC-6 domains interact with guidance receptors (e.g., UNC-40/DCC and UNC-5) and other extracellular molecules to mediate dorsal-ventral axon guidance.

2

Multi-tissue analyses of allele-specific chromatin accessibility nominate likely functional variants for type 2 diabetes

Narisu, N.; Li, H. X.; Rathbun, C. J. M.; Varshney, A.; Swift, A. J.; Yan, T.; Sinha, N.; Currin, K. W.; Xue, D.; Robertson, C. C.; Taylor, D. L.; Taylor, H. J.; Beck, A.; Lee, B. N.; Wang, L.; Broadaway, K. A.; Wilson, E. P.; Stringham, H.; Saramies, J.; Lakka, T. A.; Spracklen, C. N.; Scott, L. J.; Stitzel, M. L.; Tuomilehto, J.; Laakso, M.; Koistinen, H. A.; Boehnke, M.; Arda, H. E.; Chen, S.; Biesecker, L. G.; Bonnycastle, L. L.; Erdos, M. R.; Mohlke, K. L.; Parker, S. C. J.; Collins, F. S.

2026-07-15 health informatics 10.64898/2026.07.14.26358094 medRxiv

Top 0.5%

10.1%

Genome-wide association studies (GWAS) have identified >1,200 signals associated with type 2 diabetes (T2D), yet identifying functional variants remains challenging because the majority of them lie in noncoding regions of the genome and are in areas of high linkage disequilibrium (LD). While chromatin accessibility QTL (caQTL) and expression QTL (eQTL) analyses are useful for nominating regulatory mechanisms underlying GWAS signals, limitations still exist in pinpointing functional variants within regions of high LD. A complementary approach that has been less frequently applied is to focus on the allele-specific effect on chromatin accessibility at heterozygous single-nucleotide polymorphisms (SNPs), hereafter referred to as allelic imbalance. We analyzed the allelic imbalance of reads generated from an assay for transposase-accessible chromatin with sequencing (ATAC-seq) across genotyped samples from 490 donors in T2D-relevant tissues: skeletal muscle, liver, pancreatic islets, adipose tissue, and relevant cell types. We identified 119,949 allelically imbalanced SNPs (FDR<0.05) across the genome. The allelic imbalance was often most prominent in one tissue and showed an enrichment overlapping with tissue-specific transcription factor (TF) binding footprints. Focusing on the 8,581 SNPs in previously published 99% credible sets from 338 T2D GWAS signals, we identified 256 imbalanced SNPs across 123 (36.4% of) signals, each showing allelic imbalance in at least one tissue or cell type. Of these, 71 signals contained only a single imbalanced SNP, representing excellent candidate causative variants. As a proof-of-concept, we showed that 23 of the 256 imbalanced SNPs were supported by allelic assays from previous studies. Further, we experimentally validated two imbalanced SNPs as likely functional variants: rs34584161 among a seven-SNP T2D credible set at the RNF6 signal in islets and rs849134 among a 13-SNP credible set at the JAZF1 signal in liver. 
This study demonstrates the power of integrating ATAC-seq allelic imbalance (ASAI) with GWAS statistical fine-mapping to identify candidate functional regulatory variants from among tightly linked GWAS variants in disease-relevant tissues. While applied here in T2D, this approach represents a widely applicable high-throughput framework for refining the genetic architecture of complex traits.
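
The allelic-imbalance idea in the abstract above can be sketched as a test of whether reference and alternate reads at one heterozygous SNP depart from an even split. This is a minimal illustration only: the exact two-sided binomial test, the 0.5 null proportion, and the function name `allelic_imbalance_pvalue` are assumptions made here, not the authors' stated pipeline, which also involves multi-sample analysis and FDR control.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value for ref/alt read counts at one heterozygous SNP.

    Assumption: under no imbalance, each read carries the reference allele
    with probability p_null. Intended for modest read depths only.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("at least one read is required")
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical read counts, not data from the study.
    print(allelic_imbalance_pvalue(30, 10))
    print(allelic_imbalance_pvalue(20, 20))
```

In practice a p-value like this would be computed per SNP and then adjusted for multiple testing, which is where the FDR<0.05 threshold quoted in the abstract would apply.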

3

Empirical estimation of multiple-testing burden for population-based HLA association studies using sequencing-derived HLA alleles across genetic ancestries

Taliun, D.; Gagliano Taliun, S. A.

2026-07-15 genetics 10.64898/2026.07.12.738059 medRxiv

Top 0.7%

7.8%

As population-scale whole-genome sequencing datasets continue to expand, they enable genetic association studies beyond single-nucleotide variants to more complex forms of genetic variation, including classical human leukocyte antigen (HLA) alleles. The HLA region comprises nine highly polymorphic classical HLA genes in extensive linkage disequilibrium that are associated with numerous autoimmune and infectious diseases. However, unlike genome-wide association studies of single-nucleotide variants, there is no general guidance for controlling the multiple-testing burden in HLA allele association analyses. Here, we systematically evaluated the effective number of independent HLA allele tests using sequencing data from diverse genetic ancestries, analytical derivation and simulations. We show that the multiple-testing burden depends on genetic ancestry, allele frequency, and the phenotype model, but remains remarkably stable across minor allele count thresholds, corresponding to approximately 60-70% of the total number of tested HLA alleles. Simulations further demonstrate that the effective number of tests can exceed 90% under realistic disease models. Analyses of 4-field HLA alleles from long-read sequencing showed that higher typing resolution increases the number of alleles but preserves the underlying correlation structure and scales the effective number of independent tests proportionally. Our results provide practical guidance for HLA association studies and support Bonferroni correction based on the total number of tested HLA alleles as a simple and robust approximation when permutation-based approaches are impractical.
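
To make the practical recommendation concrete, the following sketch compares a Bonferroni threshold based on the total number of tested HLA alleles with thresholds based on an effective number of tests at 60% and 70% of that total, the range quoted in the abstract. The allele count of 200 and the helper names are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def compare_thresholds(n_alleles: int, alpha: float = 0.05,
                       effective_fractions=(0.6, 0.7)) -> dict:
    """Map each correction choice to its per-test threshold."""
    result = {"total": bonferroni_threshold(alpha, n_alleles)}
    for frac in effective_fractions:
        # Effective number of tests taken as a fraction of all tested alleles.
        result[f"effective_{frac:.0%}"] = bonferroni_threshold(alpha, frac * n_alleles)
    return result


if __name__ == "__main__":
    # n_alleles = 200 is a hypothetical count, not a figure from the abstract.
    for label, threshold in compare_thresholds(200).items():
        print(f"{label}: {threshold:.2e}")
```

Because the effective-test thresholds differ from the total-count threshold by a factor of less than 1.7 in this range, correcting for the total number of alleles is only mildly conservative, which is consistent with the abstract's description of it as a simple approximation.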

4

Human inherited RORgammaT deficiency encompasses genetic heterogeneity, T cell deficiency, and clinical homogeneity

Fagniez, I.; Tsumura, M.; Guerin, A.; Abolhassani, H.; Sharafian, S.; Mesdaghi, M.; Nishimura, T.; Prasada, H.; Rao, S.; Richards, S.; Han, J. E.; Delmonte, O. M.; Kergaravat, C.; Markle, J. G.; Ogishi, M.; Han, J.; Peel, J.; Vellutini, J.; Feng, Y.; Soudee, C.; Migaud, M.; Palterer, B.; Jackson, K. J. L.; Nishimura, S.; Sakata, S.; Kinoshita, K.; Yamamoto, A.; Moritake, H.; Alzahrani, M.; Vallejos, F.; Cole, T.; Smart, J.; Choo, S.; Chavoshzadeh, Z.; Arman, S.; Toubert, A.; Zhang, P.; Rosain, J.; Notarangelo, L. D.; Pan-Hammarstrom, Q.; Tangye, S. G.; Casanova, J.-L.; Ma, C. S.; Puel, A.; Bus

2026-07-20 allergy and immunology 10.64898/2026.07.18.26358075 medRxiv

Top 0.8%

7.0%

We previously reported inherited RORgammaT deficiency in seven patients from three ancestries (Chilean, Palestinian, Saudi Arabian) with mycobacterial disease and chronic mucocutaneous candidiasis (CMC). We report here five additional patients from different ancestries (Afghan, Indian, Iranian, Japanese, Sri Lankan), each homozygous for a new loss-of-function RORC variant. All but one patient, the exception receiving early prophylaxis, developed mycobacterial disease due to a near-complete depletion of innate-like adaptive T cells, including MAIT and iNKT cells, low counts of adaptive TH1* and CD8+ T cells, and impaired Mycobacterium-induced IFN-gamma production by the remaining cells of these subsets, NK cells, conventional CD4+ T, Vdelta1, and Vdelta2 gamma-delta T cells. Most patients also displayed CMC due to their low counts of TH17 and TH1* cells. One patient died from disseminated Bacille Calmette-Guerin (BCG) vaccine infection, but, unexpectedly, all the other patients are still alive and clinically stable. RORgammaT is essential for protective immunity against mycobacteria and Candida in humans.

5

The effect of genome organisation on selection efficiency in two contrasted plant species

James, J.; Lascoux, M.

2026-07-15 evolutionary biology 10.64898/2025.12.19.695387 medRxiv

Top 0.8%

7.0%

Does the distribution of fitness effects (DFE) of new mutations vary across the genome? Under the classical Fisher Geometric Model (FGM) we might not expect it to. In FGM, phenotypic traits are envisioned as dimensions of a landscape, with fitness determined by position in the landscape, i.e., the particular combination of traits of an individual. New mutations are represented by vectors that move from an ancestral to a new phenotype. In classical FGM these vectors affect all trait dimensions simultaneously (universal pleiotropy). However, introducing partial and modular pleiotropy into an FGM framework leads to an expectation that parameters of the DFE will vary with mutational pleiotropy, i.e., the number of traits affected by individual mutations. Here we address this prediction by investigating whether traits related to mutational pleiotropy, expression level and network connectivity, affect the parameters of the DFE using whole genome data from A. thaliana and C. grandiflora, two closely related Brassicaceae species that vary significantly in their demography and mating system, and therefore, in effective population size and the effects of linked selection. Results were similar across both species. We found that expression level and network connectivity were predictive of the parameters of the deleterious DFE, even once co-correlations among genome biology traits were accounted for. Our results suggest that, across the genome, molecular evolutionary patterns agree with the predictions of FGM, albeit relaxing the assumption of universal pleiotropy, and that variation in mutational pleiotropy among genes is sufficient to have detectable effects on the DFE. Significance statement: How do the effects of new mutations vary across the genome? If mutations in some genes affect many traits (high mutational pleiotropy), we hypothesise they will be more strongly deleterious, with lower variance in their selective effects.
We test this by investigating the distribution of effects of new mutations across genes that vary in features that are related to mutational pleiotropy: expression level, gene network connectivity, and number of associated GO terms. The mean strength and coefficient of variation of selection of new mutations varied across genes with different features in the manner expected by our hypothesis. This demonstrates that important parameters of molecular evolution can vary across the genome with genome architecture.

6

The Variance-Stabilizing Transformation for the Poisson Rate Ratio: Closed-Form Confidence Intervals

Ng, S.-P.

2026-07-18 epidemiology 10.64898/2026.07.16.26358255 medRxiv

Top 2%

3.2%

The incidence rate ratio R is the standard measure for comparing event rates in clinical trials and epidemiology. In vaccine trials, the vaccine efficacy is VE = 1 - R. When events are rare, the two arm counts are Poisson. The estimator of R is heteroskedastic: its sampling variance changes with the data. So no fixed-width interval covers correctly everywhere. The usual log-Wald interval is undefined at zero events and covers poorly at small counts. Early vaccine and drug-safety readouts fall in exactly this regime. We show that a single reparameterization collapses this bivariate problem to an effective one-parameter family with a quadratic variance function, whose variance-stabilizing transformation is 2 arcsinh(sqrt(R)). The reduction yields a closed-form confidence interval for R. Its two leading errors, a curvature bias and the variability of the estimated scale, each admit a closed-form correction with no tuning constants. In a Monte Carlo study of our seven arcsinh variants and five competitors, the +Curve+Stu variant covers within 0.002 of the nominal 0.95 for about 50 control and 5 treatment events. Its width is on par with the best competitor. It avoids the conservatism and zero-count breakdown of log-Wald and MOVER. For moderate counts, we recommend this interval; for sparser data, our Bar-Lev and Enis count-shift variant is more robust. The result is a ready-to-use, closed-form interval for the low-count regime. We illustrate it on early Covid-19 vaccine-efficacy readouts and provide reference implementations in R and Python.
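
A minimal sketch of how an interval on the 2 arcsinh(sqrt(R)) scale can be built and back-transformed is given below. It rests on assumptions not stated in the abstract: equal exposure in both arms, a delta-method variance of roughly R(1 + R) divided by the expected control count, and the observed control count standing in for that expectation. It is the plain, uncorrected form only; it does not reproduce the curvature or Student-type corrections or the count-shift variant the abstract mentions, and the function name is hypothetical.

```python
import math


def arcsinh_rate_ratio_ci(x_treat: int, x_ctrl: int, z: float = 1.959964):
    """Uncorrected interval for a Poisson rate ratio via g(R) = 2*asinh(sqrt(R)).

    Assumptions: equal exposure in both arms, and Var[g(R_hat)] ~ 1 / x_ctrl
    (the control count estimates the scale). Returns (estimate, lower, upper).
    """
    if x_treat < 0 or x_ctrl <= 0:
        raise ValueError("need x_treat >= 0 and x_ctrl > 0")
    r_hat = x_treat / x_ctrl
    g_hat = 2.0 * math.asinh(math.sqrt(r_hat))
    se = 1.0 / math.sqrt(x_ctrl)
    # Work on the stabilized scale; clip at 0 because g(R) >= 0 for R >= 0.
    g_low = max(g_hat - z * se, 0.0)
    g_high = g_hat + z * se
    # Invert g: R = sinh(g / 2) ** 2.
    return r_hat, math.sinh(g_low / 2.0) ** 2, math.sinh(g_high / 2.0) ** 2


if __name__ == "__main__":
    # Counts chosen to mirror the "about 50 control and 5 treatment events" setting.
    estimate, lower, upper = arcsinh_rate_ratio_ci(5, 50)
    print(f"R = {estimate:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
    print(f"VE = {1 - estimate:.3f}, 95% CI = ({1 - upper:.3f}, {1 - lower:.3f})")
```

Unlike a log-Wald interval, this form stays defined when the treatment arm has zero events, since the transform of zero is zero and the upper limit remains finite.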

7

NFIX missense variants that disrupt the β-hairpin loop result in a severe form of Malan syndrome in adolescence with rapidly evolving scoliosis and muscle wasting

Delagrammatikas, C. G.; Gourlay, L. J.; Priolo, M.; Russo, R.; Ahmadi, A.; Barbiroli, A. G.; Capelli, R.; Stowers, K.; D'Annibale, O.; Ravalin, M.; Tartaglia, M.; Nardini, M.; Cocanougher, B. T.

2026-07-19 genetic and genomic medicine 10.64898/2026.07.16.26357549 medRxiv

Top 3%

1.3%

Purpose: Pathogenic variants in NFIX cause Marshall-Smith syndrome and Malan syndrome (MALNS). We identified a severe subtype of MALNS characterized by adolescent-onset musculoskeletal deterioration and investigated functional consequences of underlying variants. Methods: Clinical data were collected from seven individuals with pathogenic NFIX variants. Wild-type and mutated recombinant NFIX DNA-binding domains (DBDs) were evaluated using biochemical, structural, and DNA-binding assays. Results: Six individuals carrying R116W, R116P, K125E, or G147E NFIX substitutions developed progressive muscle wasting, markedly reduced body mass index, and rapidly progressive scoliosis after the typical childhood features of MALNS; two died from disease-related complications. A seventh individual with R116G did not develop this severe phenotype. Functional studies on recombinant NFIX DBDs showed complete or near-complete loss of DNA-binding activity for R116W, R116P, K125E, and G147E despite preserved protein folding, consistent with disrupted DNA recognition and a potential dominant-negative mechanism. In contrast, R116G exhibited a 7.7 °C decrease in thermal stability, which may support haploinsufficiency mediated by protein degradation. Conclusion: Specific NFIX missense variants define a severe subtype of MALNS associated with progressive musculoskeletal deterioration. In vitro functional studies support variant-specific disruption of DNA binding, providing a mechanistic basis of genotype-phenotype correlations and informing prognosis, clinical surveillance, and therapy development.

8

European-derived coronary artery disease polygenic scores over-flag genetic risk in Vietnamese and Southeast Asian populations: a multi-score analysis in 1000 Genomes

Hoang, Q. P.; Le, T. X.; Doan, D. D.

2026-07-15 genetic and genomic medicine 10.64898/2026.07.10.26357796 medRxiv

Top 3%

1.1%

Background. Polygenic scores (PRS) for coronary artery disease (CAD) are derived almost entirely from European-ancestry data. Their portability to Southeast Asian populations, including the Vietnamese, is largely uncharacterised and clinically consequential when scores are used with risk thresholds. Methods. We evaluated four independent European-derived CAD scores from the PGS Catalog (PGS000058, PGS000349, PGS002809, PGS004198; 70 to 5,723 variants) in 2,504 individuals from the 1000 Genomes Project, focusing on the Vietnamese Kinh (KHV) and Dai (CDX) samples. Per-individual scores were computed with PLINK2 and standardised. We assessed (i) the cross-ancestry distribution (calibration) and (ii) a clinically-relevant consequence: the proportion of each population flagged high genetic risk when the European top-20% threshold is applied (20% if perfectly calibrated). Results. For the primary score (PGS000058) the standardised PRS differed across super-populations (ANOVA F(4, 2499) = 121.1, p < 0.001); the Vietnamese Kinh mean was +0.47 SD above the European mean (Welch t = 7.77, p = 2.0 × 10^-14). Applying the European top-20% high-risk threshold, the fraction of Vietnamese Kinh flagged ranged from 22.2% to 57.6% across the four scores, and of Dai from 21.5% to 43.0%, versus the intended 20%. Three of the four scores over-flagged Vietnamese (25-58%); the largest score (PGS004198) was approximately calibrated for East/Southeast Asians (~22%) but markedly over-flagged Africans (69.3%). Conclusions. European-derived CAD polygenic scores are inconsistently calibrated in Vietnamese and other Southeast Asian samples, and most substantially over-flag high genetic risk when a European threshold is applied. The magnitude and even the direction of miscalibration depend on the specific score, so no such score can be assumed transferable without local validation and recalibration.
Distribution shift bounds, but does not by itself quantify, loss of predictive accuracy, which requires phenotyped data.
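
The link between a mean shift and over-flagging can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that standardised scores are normally distributed in both populations with equal spread, so that only the mean differs; the function name `fraction_flagged` is hypothetical and this is not the empirical procedure described in the abstract, which applies the threshold to observed scores.

```python
from statistics import NormalDist


def fraction_flagged(mean_shift_sd: float, top_fraction: float = 0.20,
                     sd_ratio: float = 1.0) -> float:
    """Share of a target population above the reference top-`top_fraction` cutoff.

    Assumptions: reference scores ~ Normal(0, 1); target scores
    ~ Normal(mean_shift_sd, sd_ratio), both in reference SD units.
    """
    reference = NormalDist(0.0, 1.0)
    # Cutoff that flags exactly `top_fraction` of the reference population.
    threshold = reference.inv_cdf(1.0 - top_fraction)
    target = NormalDist(mean_shift_sd, sd_ratio)
    return 1.0 - target.cdf(threshold)


if __name__ == "__main__":
    print(f"no shift:       {fraction_flagged(0.0):.3f}")
    print(f"+0.47 SD shift: {fraction_flagged(0.47):.3f}")
```

Under these assumptions a +0.47 SD shift flags roughly 35% of the target population rather than the intended 20%, which shows how a modest difference in means translates into a large difference in the share labelled high risk.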

9

PORTRAIT: a calibrated patient Passport with built-in refusal - describing individuals against a reference population

Oehring, D.

2026-07-15 health informatics 10.64898/2026.07.13.26357968 medRxiv

Top 4%

1.1%

Background: Average-based summaries serve individual patients poorly. PORTRAIT is a calibrated, abstention-aware tool that describes where one patient sits relative to a reference population across 12 cardiometabolic markers, how confident that placement is, and which features drive it. PORTRAIT describes; it does not diagnose or predict. Abstention is a designed feature, given the known limits of conditional coverage. Methods: Conformal calibration was combined with distribution-free coverage bounds, quantile-regression coordinates, and copula-based joint structure. A frozen reference cohort (n = 9421) supplied fixed calibration; a held-out cohort (n = 2247) tested transportability across six strata. A release gate required the minimum per-slice coverage to hold across 4 of 5 seeds. Coverage was retested under survey weighting to the US adult population. Coherence was reported as a descriptive joint coordinate. Discrimination was summarised with Harrell's C, and multiplicity was controlled by BH-FDR. Interface conformance was assessed against defined requirements, Nielsen heuristics, and WCAG 2.2 AA, with attention to automation bias and risk-graph design. Results: The frozen reference held all six strata within band (0.864-0.903) at abstention 0.113, whereas a re-split under-covered to 0.71 at abstention 0.227; coverage survived survey weighting. The release gate passed on 4 of 5 seeds at abstention 0.101 against a nominal 0.90, and in the frozen-reference configuration that ships, all six strata held inside the calibrated band (0.864-0.903). Coherence showed orthogonality 0.444 to raw extremity and correlated 0.892 with a copula-Mahalanobis distance while remaining deliberately non-identical, so it adds per-feature information. Two transfer tests returned negatives; the ocular transfer did not hold coverage at thin n. Adding coherence changed mortality discrimination by delta-C 0.0047. Interface requirements moved from 14/27/18 to 38/14/7 (MET/PARTIAL/UNMET). Nielsen severity resolved 7 of 10 issues. WCAG 2.2 AA text criteria passed. Conclusions: PORTRAIT situates a patient against a frozen reference, holds coverage under survey weighting to the US adult population, and abstains when calibration cannot be supported. The headline result is that the frozen reference held coverage where a re-split did not.

10

LocusBlend: Flexible multi-index regional visualization of genomic association signals

Yang, C.; Cook, N.; Zeng, Y.; Fu, T.; Budde, J.; Cruchaga, C.; Belloy, M. E.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.15.26358129 medRxiv

Top 4%

1.1%

Summary: It has become standard practice to visualize regional signals from genome-wide association studies (GWAS) using LocusZoom plots. Similarly, GWAS signals are compared to regionally matched quantitative trait loci (QTLs), i.e., variant-to-gene regulation data, using LocusCompare plots to aid assessment of candidate trait-related genes. Despite broad usage, these tools annotate variants by linkage disequilibrium (LD) to a single lead or index variant. This single-index representation has limitations for visualizing complex loci that contain multiple independent signals. We present LocusBlend, an interactive web application for multi-index, LD-blended visualization of genomic loci. LocusBlend supports one or two genomic association summary-statistic datasets and one to three index variants, multi-index LocusZoom color-blended plots, and matching LocusCompare visualizations. Applications to Alzheimer's disease GWAS and QTL signals illustrate that LocusBlend enables visualization and separation of independent signals despite shared LD and high genomic complexity. Overall, LocusBlend is aimed at helping researchers handle the continuously expanding complexity of human genomics findings. Availability and Implementation: LocusBlend is freely available at https://locusblend.wustl.edu. Publication-ready plots are generated in 1 min. Source code, documentation, example datasets, input templates, and reproducibility instructions are available at https://github.com/BelloyLab/LocusBlend. LocusBlend is implemented in Python using Streamlit, Plotly, and PLINK. Supplementary Information: Supplementary data are available online.

11

Predictors of Pregnancy-Related Anemia: A Logistic Regression Study at a Maternity Facility in Ghana.

Kusi, R. Y.; Anyan, F. Y.; Agyekum, G. O.

2026-07-19 obstetrics and gynecology 10.64898/2026.07.16.26358280 medRxiv

Top 4%

1.0%

Background: Anemia during pregnancy remains a major public health concern, particularly in low- and middle-income countries, where it contributes substantially to maternal and neonatal morbidity and mortality. Identifying women at increased risk is essential for timely intervention and improved pregnancy outcomes. Objective: This study aimed to identify the significant predictors of anemia among pregnant women using logistic regression and to evaluate the association between selected clinical and sociodemographic characteristics and anemia. Methods: A cross-sectional study was conducted using secondary data obtained from a community maternity care facility in the Suame Municipality of the Ashanti Region, Ghana. Pregnant women who attended antenatal care during the study period and had complete information on hemoglobin concentration and relevant predictor variables were included. Women with missing hemoglobin measurements at registration or delivery were excluded. Logistic regression analysis was performed to identify independent predictors of anemia. Additional analyses examined the effects of age and weight, as well as the relationship between sickle cell status and blood group. Results: Logistic regression identified diastolic blood pressure, height, hemoglobin concentration at registration, maternal weight and gestational age as significant predictors of anemia during pregnancy (p < 0.05). Although employment status was statistically significant in the model, its direct association with anemia was relatively weak. Maternal age was not significantly associated with anemia. Pregnant women with sickle cell disease had a significantly higher likelihood of anemia. Blood group did not demonstrate a significant relationship with anemia. Effect sizes and confidence intervals were not available in the dataset. 
Conclusion: Diastolic blood pressure, height, hemoglobin concentration at registration, maternal weight, gestational age, sickle cell status and employment status were identified as important predictors of anemia during pregnancy. These findings highlight the importance of incorporating both clinical and sociodemographic characteristics into antenatal risk assessment and screening programs. Further prospective studies with larger sample sizes and more comprehensive clinical measurements are recommended to validate these findings and strengthen predictive models for anemia during pregnancy.

12

Risk Screening in a Medicaid-Managed Pregnancy Medical Home: The Need to Center Maternal Health Outcomes in Public Health Programming

Dissanayake, M. V.; Mallampati, D. P.; Vladutiu, C. J.; Menard, M. K.

2026-07-19 obstetrics and gynecology 10.64898/2026.07.16.26358262 medRxiv

Top 4%

0.9%

Background: North Carolina Medicaid implemented the Pregnancy Medical Home program to improve access to high-quality maternity care and reduce the risk of adverse perinatal outcomes. Program recipients receive a prenatal risk screening form, originally intended to identify those at high risk of preterm birth and low birth weight, that includes an assessment of social and clinical factors. While prior studies have evaluated whether risk screening can identify pregnancies with higher risk of adverse neonatal outcomes, less is known about the relationship between programmatic risk-stratification and adverse maternal outcomes. Objective: To assess the use of a prenatal risk screen among pregnant Medicaid beneficiaries to identify those at risk of an adverse maternal event. Study design: Linked Medicaid hospital claims, live birth records, and risk screen data from the Pregnancy Medical Home program were used to identify risk factors for adverse maternal events among individuals who gave birth to a liveborn infant in North Carolina between 2014 and 2019. Only those with completed risk screens (75%) were included in the analysis. We used random forest classification to select variables for a multivariable prediction model. We used Poisson regression to model the association between adverse maternal events and selected demographic, psychosocial, clinical, and historical pregnancy characteristics. Adverse maternal events occurring at birth and up to six weeks postpartum included severe maternal morbidity, maternal intensive care unit admission, prolonged birth hospitalization, and postpartum readmissions. Results: A total of 205,916 births met inclusion criteria for this analysis. 
During the study period, 3.0% of Medicaid beneficiaries had an adverse maternal event occurring between birth and up to six weeks postpartum, including 0.6% with severe maternal morbidity, 0.9% with an intensive care unit admission at birth, and 1.5% with a prolonged birth hospitalization or postpartum readmission. Maternal age greater than 25 years, Black race, being overweight or obese, smoking, chronic diseases (diabetes, hypertension, mental illness), and pregnancy history characteristics (nulliparity, history of preterm birth, history of hypertensive disorders of pregnancy or gestational diabetes) were associated with an increased risk of adverse maternal events. Modeled together, however, risk factors from the risk form were poorly predictive of the composite outcome. The final model had an Area Under the Curve (AUC) of 0.63 with an optimal sensitivity of 56% and specificity of 63%. Conclusion: Care management during pregnancy is an increasingly relevant topic in public health and prenatal care in the United States. The North Carolina Pregnancy Medical Home is a long-standing and robust Medicaid program that can serve as a model for design and implementation. While this program has effectively designed risk-stratification to identify pregnant people at risk of poor neonatal outcomes who benefit from care management, the risk screen poorly identifies pregnant people at risk of adverse maternal outcomes. Care coordination programs are often designed to optimize neonatal outcomes, and this study highlights the need to center and balance maternal health along with neonatal outcomes to address the needs of a very vulnerable population.

13

Real World Fertility Evaluation & Care Prior to In Vitro Fertilization: Care Gaps That Could be Addressed by Restorative Reproductive Medicine

Parnell, T. A.; Minjeur, M.; Turczynski, C.; Pistilli, T.

2026-07-15 obstetrics and gynecology 10.64898/2026.07.13.26357941 medRxiv

Top 5%

0.6%

Objective: To evaluate adherence to published American Society for Reproductive Medicine (ASRM) infertility evaluation and treatment recommendations among commercially insured infertility patients who subsequently underwent in vitro fertilization (IVF) and to assess whether observed care gaps support the need for a restorative reproductive medical framework. Methods: A retrospective claims-based analysis was performed using MarketScan® Commercial Claims and Encounter Data between January 1, 2021, and December 31, 2024. Approximately five million commercially insured members were evaluated. Patients with infertility-related diagnoses who subsequently underwent IVF were identified. Claims were analyzed for evidence of diagnostic testing, medical treatment, or surgical intervention recommended by ASRM or AUA/ASRM guidance before IVF initiation. Cumulative adherence rates were assessed over nine months following initial infertility diagnosis. Results: IVF initiation rose early and consistently exceeded completion of nearly all guideline-recommended evaluations and treatments. Observed care gaps ranged from approximately 13% to 78% for most recommended evaluations and treatments, with several measures demonstrating gaps exceeding 50 percentage points, suggesting substantial divergence between guideline recommendations and observed clinical practice. By 3 months, IVF initiation ranged from 28% to 39% across cohorts, while adherence to many recommended interventions remained low. Overall, by 9 months, IVF utilization commonly exceeded 70-85%, while many guideline-supported evaluations and treatments remained below 40% adherence, with several interventions remaining below 15%. These findings suggest substantial divergence between published infertility-care recommendations and observed pre-IVF practice patterns.
From an RRM perspective, the gaps are clinically important because many recommended steps are directed toward identifying, correcting, restoring, or preserving reproductive function and anatomy before reproductive barriers are bypassed through IVF. Conclusions: Many commercially insured infertility patients appeared to progress to IVF without documented evidence of diagnostic evaluation or therapeutic intervention recommended in ASRM and AUA/ASRM guidance. These findings raise important questions regarding the implementation of infertility guidelines before IVF and the extent to which patients receive meaningful opportunities for diagnosis-directed treatment of potentially reversible causes of infertility. The findings further suggest an important role for restorative reproductive medicine as a quality-of-care framework focused on comprehensive evaluation, correction of underlying dysfunction, preservation of reproductive anatomy and physiology, and optimization of patient-centered fertility care prior to attempts with assisted reproduction.

14

Sex-different phenotypic correlations: Due to genes or environment?

Fritz, A.; Darrous, L.; Bonnelykke, K.; Pedersen, A. G.; Kutalik, Z.

2026-07-15 genetic and genomic medicine 10.64898/2026.07.13.26357694 medRxiv

Top 5%

0.5%

Differences in physical features and disease prevalence between men and women are examples of sexual dimorphisms. However, sex differences can manifest not only in trait means but also in how strongly risk factors are linked to diseases (e.g., BMI to cardiovascular disease), a question that remains heavily under-researched. To fill this gap, we set out to identify sex differences in phenotypic correlations (rP) and decompose them into genetic (rG) and environmental (rE) contributions. Our analysis revealed 250 trait pairs with significant sex-different phenotypic correlations in the UK Biobank. Overall, we observed a predominance of environmental contributions to sex-different effects: 182 trait pairs (73%) exhibited exclusively sex-different rE, while 68 (27%) showed sex differences in both rE and rG, and no trait pair was affected solely by sex-specific rG. For example, we detected a sex-different environmental correlation between C-reactive protein and BMI (rE(men) = 0.07 vs rE(women) = 0.25), but no sex difference in genetic correlation. In contrast, glycated haemoglobin and LDL cholesterol showed genetic correlation only in women (rG(women) = 0.17; 95% CI = [0.1, 0.23]), but environmental correlation only in men (rE(men) = -0.18; 95% CI = [-0.19, -0.16]). Some of the observed sex differences, including those involving testosterone, SHBG, urate, waist-hip ratio, and triglycerides, may reflect underlying sex-specific genetic architectures, as evidenced by low between-sex genetic correlations. In conclusion, environmental factors are the predominant contributors to sex differences in phenotypic correlations between complex traits, with modest detectable contributions from sex-specific genetic architectures. Recognising these patterns can inform the development of more effective, sex-informed interventions.
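
The split of rP into rG and rE described in the abstract above is often written with the standard quantitative-genetics identity rP = sqrt(h2_x * h2_y) * rG + sqrt((1 - h2_x) * (1 - h2_y)) * rE, where h2_x and h2_y are the heritabilities of the two traits. The sketch below assumes that identity; the abstract does not state which estimator the authors used, and the function names and the heritability values in the usage lines are hypothetical.

```python
import math


def phenotypic_correlation(h2_x, h2_y, r_g, r_e):
    """Combine genetic and environmental correlations into r_P.

    Assumes the textbook identity; h2_x and h2_y are heritabilities in [0, 1].
    """
    genetic_part = math.sqrt(h2_x * h2_y) * r_g
    environmental_part = math.sqrt((1 - h2_x) * (1 - h2_y)) * r_e
    return genetic_part + environmental_part


def environmental_correlation(h2_x, h2_y, r_p, r_g):
    """Solve the same identity for r_E given r_P and r_G."""
    denom = math.sqrt((1 - h2_x) * (1 - h2_y))
    if denom == 0:
        raise ValueError("r_E is undefined when a trait is fully heritable")
    return (r_p - math.sqrt(h2_x * h2_y) * r_g) / denom


# Hypothetical heritabilities; only the rE values are taken from the abstract.
r_p_men = phenotypic_correlation(0.25, 0.25, r_g=0.0, r_e=0.07)
r_p_women = phenotypic_correlation(0.25, 0.25, r_g=0.0, r_e=0.25)
```

Under this identity, two sexes with identical rG can still differ in rP purely through rE, which is the pattern the abstract reports for most trait pairs.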

15

Privacy-Preserving Matching for Federated Causal Inference in Multicentre Patient Cohorts

Gusinow, R.; Morgan, A. S.; Canziani, L. M.; Zeitlin, J.; Kim, M.; Gentilotti, E.; Ghosn, J.; Florence, A.-M.; Tami, A.; Toschi, A.; Palacios-Baena, Z. R.; Tacconelli, E.; Hasenauer, J.

2026-07-19 epidemiology 10.64898/2026.07.16.26358171 medRxiv

Top 5%

0.5%

Causal effect estimates are often biased in clinical and epidemiological studies because patient cohorts frequently exhibit substantial covariate imbalances between treated and control groups, which are often amplified in multicentre studies by heterogeneous recruitment, clinical practice, and case mix. Covariate balancing methods are therefore essential for valid causal inference. However, their application becomes challenging when data are distributed across cohorts and cannot be pooled because of privacy, legal, or institutional constraints, leaving a gap in practical methods for causal effect estimation in federated and imbalanced clinical data settings. We develop a privacy-preserving framework for covariate balancing and causal effect estimation across distributed data providers, combining federated aggregation with differential privacy to enable propensity score subclassification and matching without sharing individual-level records. Matching relies on non-disclosive quantities and differentially private distance evaluation, and the resulting matched subsets remain local to each server. Balance can be assessed through federated diagnostics and privacy-preserving visualisations, and we provide secure estimators for average treatment effects with associated uncertainty quantification. We implement this framework in the DataSHIELD federated analysis platform via two R packages. In simulations, we demonstrate agreement between federated and centralised analyses in the absence of privacy noise and quantify the bias-variance trade-offs induced by differential privacy. We illustrate applicability in two multinational settings, a Long COVID cohort and very preterm birth cohorts, showing that the approach enables practical causal analyses under real-world data protection constraints. The DataSHIELD packages are available on GitHub. Additional methodological details are provided in the Supplementary Material.
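
As a rough illustration of propensity score subclassification under federated aggregation, the sketch below has each site release only per-stratum counts and outcome sums, optionally perturbed with Laplace noise, and a coordinator combine them into a stratum-weighted average treatment effect. It is not the DataSHIELD packages' interface: every function name is hypothetical, outcomes are assumed to lie in [0, 1], strata are assumed to be assigned beforehand, and the noise scale of 1/epsilon per released value is a simplification rather than a full privacy accounting.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def site_summary(records, epsilon=None, rng=None):
    """Summarise one site's data without releasing individual records.

    records: iterable of (stratum, treated, outcome) with outcome in [0, 1].
    Returns {(stratum, treated): [count, outcome_sum]}; noise is added to
    every released value when epsilon is given.
    """
    rng = rng or random.Random()
    summary = defaultdict(lambda: [0.0, 0.0])
    for stratum, treated, outcome in records:
        cell = summary[(stratum, bool(treated))]
        cell[0] += 1
        cell[1] += outcome
    if epsilon is not None:
        scale = 1.0 / epsilon  # simplified calibration (assumption)
        for cell in summary.values():
            cell[0] += laplace_noise(scale, rng)
            cell[1] += laplace_noise(scale, rng)
    return dict(summary)


def federated_ate(site_summaries):
    """Pool site summaries and average stratum-level mean differences."""
    pooled = defaultdict(lambda: [0.0, 0.0])
    for summary in site_summaries:
        for key, (count, total) in summary.items():
            pooled[key][0] += count
            pooled[key][1] += total
    weighted, weight_sum = 0.0, 0.0
    for stratum in {s for s, _ in pooled}:
        n1, y1 = pooled.get((stratum, True), (0.0, 0.0))
        n0, y0 = pooled.get((stratum, False), (0.0, 0.0))
        if n1 <= 0 or n0 <= 0:
            continue  # stratum lacks one arm (or noise drove a count below 0)
        weight = n1 + n0
        weighted += weight * (y1 / n1 - y0 / n0)
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted / weight_sum


site_a = [(0, 1, 1.0), (0, 0, 0.0), (1, 1, 1.0)]
site_b = [(1, 0, 0.5), (0, 1, 0.0), (0, 0, 0.0)]
ate_exact = federated_ate([site_summary(site_a), site_summary(site_b)])
```

With `epsilon=None` the pooled estimate equals what a centralised subclassification analysis of the same strata would give, mirroring the agreement the abstract reports in the absence of privacy noise; smaller `epsilon` values add more noise and widen the gap.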

16

Reliability-weighted target prioritization in CD4+ T-cell Perturb-seq: a generalizability-theory decomposition

Cheng, C.

2026-07-15 bioinformatics 10.64898/2026.07.13.738312 medRxiv

Top 6%

0.5%

Genome-scale Perturb-seq screens prioritize candidate targets by the strength of a perturbation's transcriptional effect. Effect strength does not answer a prior measurement question: is the readout dependable? A large effect estimated from a single guide, a single donor, or a pseudobulk of few cells need not survive replication, and for target prioritization each false lead costs a validation experiment. We treat each perturbation effect as a measurement in a crossed Target × Guide × Donor × Condition design and apply generalizability theory (Brennan, 2001; Cronbach et al., 1972) to separate the dependable part of an effect from facet-specific idiosyncrasy. Guides and donors enter as random facets; condition enters as a fixed facet and is analyzed within its levels. For each target we report a dependability profile over the facets and a joint generalizability coefficient over the two random facets, and we re-rank targets by effect magnitude weighted by that coefficient. On the released screen (Zhu et al., 2025), removing the measurement-error floor estimated from the non-targeting controls raises the number of genes with a dependable target-signal share above 0.10 from 40 to 7,674. Analyzed within activation states, dependability recovers the T-cell-receptor signaling module as reliably measurable only in activated cells, without recourse to gene annotation. A design study indicates that reliability is limited by the number of guides rather than the number of donors, so a future screen should add guides. Every methodological decision was recorded and adversarially reviewed, and all results regenerate from the released summary statistics.
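
For a crossed design with guides and donors as random facets, generalizability theory gives the textbook coefficient var_target / (var_target + var_tg / n_guides + var_td / n_donors + var_resid / (n_guides * n_donors)). The sketch below assumes that form and weights an effect by it, as the abstract describes; the abstract does not give the authors' exact estimator, and the function names and variance components here are hypothetical.

```python
def generalizability_coefficient(var_target, var_tg, var_td, var_resid,
                                 n_guides, n_donors):
    """Generalizability coefficient for a Target x Guide x Donor design.

    var_tg and var_td are the target-by-guide and target-by-donor interaction
    variances; var_resid is the three-way interaction plus residual error.
    """
    if n_guides < 1 or n_donors < 1:
        raise ValueError("need at least one guide and one donor")
    relative_error = (var_tg / n_guides
                      + var_td / n_donors
                      + var_resid / (n_guides * n_donors))
    total = var_target + relative_error
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return var_target / total


def reliability_weighted_score(effect, coefficient):
    """Effect magnitude weighted by how dependably it is measured."""
    return abs(effect) * coefficient


# Hypothetical components in which guide-related variance dominates.
components = dict(var_target=1.0, var_tg=1.0, var_td=0.1, var_resid=0.5)
baseline = generalizability_coefficient(**components, n_guides=2, n_donors=4)
more_guides = generalizability_coefficient(**components, n_guides=4, n_donors=4)
more_donors = generalizability_coefficient(**components, n_guides=2, n_donors=8)
```

With guide-dominated variance components like these, doubling the guides raises the coefficient more than doubling the donors, which is the kind of comparison a design study of this sort would make.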

17

PRANA: A Deep Learning Method for Adapting Polygenic Risk Scores to Diverse Ethnic Groups

Levi, H.; The Breast Cancer Association Consortium; Michailidou, K.; Elkon, R.; Shamir, R.

2026-07-15 genetic and genomic medicine 10.64898/2026.07.12.26357860 medRxiv

Top 6%

0.5%

Polygenic risk scores (PRSs), which quantify inherited susceptibility to complex traits and diseases, have emerged as valuable tools for risk stratification and precision medicine. Despite their promise, PRSs developed on European cohorts often show substantially reduced predictive accuracy in non-European populations, due to differences in genetic architecture. The disproportionate representation of European ancestry cohorts in genome-wide association studies (GWAS) leads to inequitable deployment of PRS technologies across diverse populations. Here, we introduce PRANA (Polygenic Risk Adaptation via Neural-network Architecture), a deep learning framework that adapts an existing PRS developed on one population to other ancestries. Unlike methods that require large-scale GWAS in the target population, PRANA leverages pre-trained PRS models derived from European cohorts and adapts them using modestly sized cohorts from the target population. We evaluated PRANA on seven complex traits in South Asian, East Asian and Ashkenazi Jewish populations, as well as in selected smaller East Asian subpopulations where the scarcity of training data poses a particular challenge. In most cases, PRANA improved the predictive performance of the baseline PRS models by 5%-20% in terms of effect size and Nagelkerke's R^2 and outperformed existing cross-ancestry multi-PRS approaches. These results highlight PRANA as a scalable and practical strategy to reduce disparities in genomic risk prediction and advance the equitable application of PRS in diverse populations.
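
Nagelkerke's R^2, one of the metrics named in the abstract above, rescales the Cox-Snell R^2 so that a perfect binary-outcome model scores 1. The sketch below computes it from predicted probabilities using the standard definition; it is not PRANA code, and the function names and toy data are hypothetical.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def nagelkerke_r2(y, p):
    """Nagelkerke's R^2 against an intercept-only (prevalence) null model.

    Assumes every p lies strictly between 0 and 1 and y contains both classes.
    """
    n = len(y)
    prevalence = sum(y) / n
    ll_null = bernoulli_log_likelihood(y, [prevalence] * n)
    ll_model = bernoulli_log_likelihood(y, p)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Toy case/control labels with two hypothetical sets of predicted risks.
labels = [0, 1, 0, 1]
r2_uninformative = nagelkerke_r2(labels, [0.5, 0.5, 0.5, 0.5])
r2_informative = nagelkerke_r2(labels, [0.1, 0.9, 0.1, 0.9])
```

A relative improvement like the 5%-20% quoted in the abstract would then be the ratio of the adapted score's R^2 to the baseline score's R^2 on the same target cohort, minus one.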

18

Multi-trait evaluation of a tomato MAGIC population identifies promising lines with improved nitrogen use efficiency (NUE)

Baraja-Fonseca, V.; Gil-Villar, D.; Bancic, J.; Renau-Morata, B.; Salud Justamante, M.; Plazas, M.; Gramazio, P.; Vilanova, S.; Perez-Perez, J. M.; Granell, A.; Molina, R. V.; Nebauer, S. G.; Prohens, J.; Arrones, A.

2026-07-15 plant biology 10.64898/2026.07.14.738388 medRxiv

Top 6%

0.5%

Nitrogen-use efficiency (NUE) is a pivotal breeding target in tomato (Solanum lycopersicum L.) to sustain production under reduced N inputs. Here, we leveraged a recently developed tomato multi-parent advanced generation inter-cross (ToMAGIC) population to identify lines with superior performance under reduced N availability. The eight founders and a core subset of 118 ToMAGIC lines were characterized with 10,684 SNP markers and evaluated under optimal (opN, 15 mM) and suboptimal (subN, 8 mM) N supply in an experiment totalling 1,576 plants, generating 48,068 data points across 61 phenotypic variables. Under both N treatments, ToMAGIC lines exhibited transgressive segregation for most traits, confirming the value of this population as a reservoir of untapped variation. Notably, under subN conditions, harvest index (Hi) increased by 29-44%, suggesting adaptive resource redistribution toward reproductive sinks. Variance partitioning revealed that agronomic and NUE-related traits were largely under genetic control, with heritability estimates frequently above 0.80 and broadly conserved across N treatments. Multivariate trait analysis identified fruit yield N concentration (NUE component, CN,y), shoot biomass N content (NAb), and shoot growth-related traits as the main drivers of treatment differentiation. Finally, proxy traits were prioritized by integrating response magnitude, heritability, trait correlations, and treatment-discriminatory power into multi-trait selection indices. This strategy generated favorable predicted genetic gains, reaching 158% for high-performance lines and 170% for subN-adapted lines, and consistently identified lines 402, 428, 518, 800, and 816 as promising pre-breeding materials. Overall, this study supports ToMAGIC as a powerful resource for developing N-efficient cultivars suited for sustainable agriculture.

19

CuGen: A GPU-accelerated framework for large-scale genomics

Kiiskinen, T.; Richland, J.; Wang, W.; Lu, W. S.; Balasubramanian, N.; Hastie, T.; Tibshirani, R.; Rivas, M. A.

2026-07-17 genetic and genomic medicine 10.64898/2026.07.15.26358178 medRxiv

Top 6%

0.4%

Biobank-scale genomic analyses remain computationally expensive, CPU-bound workflows, particularly when adjusting for confounding. Here, we present CuGen, a GPU-accelerated framework for large-scale genomics. CuGen uses UltraLasso, a novel hierarchical application of univariate-guided sparse regression (uniLasso), to select a compact, phenotype-informed active set of fewer than 30,000 variants. This achieves robust leave-one-chromosome-out (LOCO) confounding control, enabling both downstream GWAS and in-sample fine-mapping. Additionally, we introduce the .cugen file format, a genotype representation designed for memory-optimized, high-throughput streaming and random access on GPU hardware. Building on this substrate, we provide a general GPU-accelerated genomics toolkit handling polygenic prediction, data manipulation, quality control, analysis, and visualization. We demonstrate CuGen's efficacy in the UK Biobank with up to 408,624 individuals, where the full GWAS pipeline and fine-mapping against 6.8 million imputed variants complete in approximately 10 minutes on a single high-throughput GPU with 80 GB of memory. The pipeline scales efficiently to massive phenome-wide analyses with sublinear resource consumption.

20

In Silico Trial Simulation with Artificial Intelligence-Generated Synthetic Control Cohorts Reproduces Results of a Randomized Controlled Trial in Acute Myeloid Leukemia

Kumar Reddy, K.; Hahn, W.; Winter, S.; Roellig, C.; Mueller-Tidow, C.; Serve, H.; Baldus, C. D.; Fransecky, L.; Schliemann, C.; Burchert, A.; Schaefer-Eckart, K.; Kaufmann, M.; Schetelig, J.; Bornhaeuser, M.; Middeke, J. M.; Eckardt, J.-N.

2026-07-16 health informatics 10.64898/2026.07.15.26358123 medRxiv

Top 6%

0.4%

Rising costs, slow accrual and molecular substratification of cancers necessitate novel clinical trial designs. We demonstrate that artificial intelligence-generated synthetic patients can replace real controls to reproduce the results of the SORAML trial. Using external multimodal data from 1,377 acute myeloid leukemia (AML) patients from previous trials and a real-world registry, we fine-tuned a tabular foundation model to generate synthetic patients, reproducing clinical and genetic features and outcome associations. Synthetic patients were then matched to the original SORAML intervention group using Cox risk scores, replacing the original control group and reproducing the original trial result with near-identical median event-free survival (EFS) and treatment effect (original hazard ratio [HR] 0.64, 95% confidence interval [CI] 0.47-0.87, p=0.004; with synthetic control HR 0.66, 95% CI 0.48-0.90, p=0.009). Our findings demonstrate that AI-generated synthetic patients can serve as statistically rigorous controls supporting novel trial designs.
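
The abstract above says synthetic patients were matched to the intervention group on Cox risk scores but does not name the matching algorithm. One common choice, shown here purely as an assumption, is greedy 1:1 nearest-neighbour matching without replacement and with an optional caliper; the function name and the scores are hypothetical.

```python
def match_on_risk_score(treated_scores, synthetic_scores, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns (treated_index, synthetic_index) pairs. Treated patients whose
    closest remaining synthetic patient lies beyond the caliper stay unmatched.
    """
    available = set(range(len(synthetic_scores)))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        if not available:
            break
        # Ties on distance are broken by the lower synthetic index.
        best = min(available,
                   key=lambda j: (abs(synthetic_scores[j] - t_score), j))
        if caliper is not None and abs(synthetic_scores[best] - t_score) > caliper:
            continue
        pairs.append((t_idx, best))
        available.remove(best)
    return pairs


# Hypothetical risk scores for two treated and three synthetic patients.
pairs = match_on_risk_score([0.1, 0.9], [0.85, 0.2, 0.5])
```

Greedy matching depends on the order in which treated patients are processed, so an optimal-matching routine may be preferable when the synthetic pool is small relative to the intervention group.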