
GENETICS

Oxford University Press (OUP)

Preprints posted in the last 7 days, ranked by how well they match GENETICS's content profile, based on 189 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Beyond Identifier Matching: An Empirical Characterization of Failure Modes in Biomedical Knowledge Graph Integration

Hu, S.; Cheng, H.; Gillenwater, L.; Manpearl, K.; Mandava, A.; Wang, Y.; Pividori, M.; Stranger, B.; Krishnan, A.; Greene, C.; Gao, Y.

2026-05-28 health informatics 10.64898/2026.05.26.26354182 medRxiv
Top 0.3%
2.8%

Objective. Biomedical knowledge graphs (KGs) such as PrimeKG, Hetionet, UMLS, and PharmGKB are increasingly used as the substrate for downstream machine-learning, retrieval-augmented generation, drug-repurposing, and electronic health record (EHR) augmentation pipelines. The dominant assumption in published work is that integrating two or more such KGs is a tractable engineering step solved by identifier (ID) matching. This paper interrogates that assumption empirically. We quantify how much concept overlap survives realistic alignment, and we characterize the new failure modes introduced by the methods that practitioners reach for when ID matching is insufficient. Materials and Methods. We compared four widely used biomedical KGs (PrimeKG, Hetionet v1.0, the full UMLS Metathesaurus, and PharmGKB) across eleven node types using a tiered alignment pipeline: (1) direct ID matching for nodes sharing a primary vocabulary; (2) cross-ontology bridging using standard mappings (e.g., MONDO-DOID, HPO-UMLS, HPO-UMLS-MeSH for side effects, NCBI Gene-HGNC-UMLS, UBERON-FMA/SNOMEDCT_US/NCI/MeSH for anatomy); (3) ClinicalBERT cosine-similarity grouping at threshold >= 0.98 for over-segmented disease nodes, with a deterministic suffix-stripping canonicalizer; (4) exact name matching for ontology-poor types (anatomy, REACTOME pathways); and (5) embedding-based fuzzy matching with UMLS lookup (SapBERT and ClinicalBERT) for free-text microbiome concepts. We applied the pipeline to a 698-concept gut-microbiome benchmark spanning taxa, pathways, and disease labels, validated grouping decisions against the curated SSSOM mappings released by the MONDO project, and audited the ClinicalBERT consolidation against five clinical-genetics case studies drawn from the literature. Results. Per-type pairwise coverage was strikingly asymmetric. 
Genes/proteins and the three Gene Ontology categories aligned cleanly across PrimeKG and Hetionet (mutual coverage 94-99%), but disease overlap was sparse: only 0.7% of PrimeKG individual disease nodes mapped to Hetionet, rising to 2.0% after MONDO grouping (versus 78.7% and 18.4% from the Hetionet side). PrimeKG-to-UMLS coverage spanned 100% (effect/phenotype via HPO) down to 20.8% (REACTOME pathways), with drugs at 73.7% and anatomy at 58.8%. PrimeKG-to-PharmGKB drug coverage required up to two bridging hops (DrugBank -> UMLS -> RxNorm/ATC/MeSH). Bigger was not uniformly more complete: on a 698-concept microbiome drug benchmark, Hetionet missed 0 concepts while PrimeKG missed 16. ClinicalBERT-based grouping consolidated 22,205 raw MONDO disease nodes into 17,080 groups but introduced three reproducible failure modes documented in case studies: (i) peer over-merging: for example, all 22 osteogenesis imperfecta subtypes collapsed into a single node despite distinct severity classes; (ii) parent-child collapse: e.g. acute myeloid leukemia merged with myeloid leukemia, erasing the acute/chronic distinction that drives clinical management; and (iii) lexical false positives: neurofibromatosis and schwannomatosis grouped together despite cellular-pathology differences. Discussion. Identifier matching alone is a weak baseline for biomedical KG integration. Cross-ontology bridges and embedding-based consolidation expand coverage but do so at the cost of clinically meaningful resolution, and the resulting failures are systematic rather than random. Reporting only aggregate coverage statistics obscures these losses, which propagate silently into downstream tasks. Conclusion. We provide reusable per-type coverage tables, a taxonomy of three integration failure modes, and concrete recommendations for downstream studies that depend on a unified biomedical KG. 
We argue that future KG integration work should report per-type coverage and per-cluster confidence rather than aggregate match rates.
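
The threshold-based grouping in step (3) of the pipeline can be illustrated with a small sketch. It is not the authors' code. The toy two-dimensional vectors stand in for ClinicalBERT embeddings, and forming groups as connected components (union-find) is an assumption, since the abstract does not say how pairs above the 0.98 threshold are combined into groups. The function and variable names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(embeddings, threshold=0.98):
    """Group names whose embeddings are linked by cosine similarity >= threshold.

    Assumption: groups are connected components, so A~B and B~C puts A and C
    in one group even if A and C are not themselves above the threshold.
    """
    names = list(embeddings)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy stand-ins for embeddings; real ClinicalBERT vectors are much longer.
toy = {
    "disease A type I": [1.0, 0.00],
    "disease A type II": [1.0, 0.05],
    "disease B": [0.0, 1.0],
}
groups = group_by_similarity(toy)
print(groups)
```

In this toy case the two near-identical "subtype" vectors merge into one group, which is the mechanism behind the peer over-merging failure mode the abstract describes: lexically close labels embed close together regardless of clinical distinctness.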

2
Locally adaptive conformal prediction intervals for polygenic score-based phenotype prediction via residual normalization and data-driven stratification

Yun, Y.; Hao, X.; Zhang, Y. D.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.28.26354326 medRxiv
Top 0.9%
1.2%

Quantifying uncertainty in polygenic score (PGS)-based phenotype prediction is crucial for the integration of genomic data into precision medicine. While the PGS provides a fundamental pivot for point estimation, clinical decision-making necessitates the construction of well-calibrated prediction intervals that reliably encompass the true phenotypic values. However, phenotypic residuals are frequently characterized by complex heteroscedasticity and stratified variance structures across diverse demographic contexts. Existing approaches often rely on global calibration mechanisms, which fail to account for such localized variance structures and lead to systematic miscalibration within specific subpopulations. To bridge this gap, we propose Clustering-based Split Conformal Prediction with Normalized Residuals (C-SCNR), a versatile framework based on Split Conformal Prediction. By adopting residual normalization and incorporating a repetitive `split-and-cluster` mechanism, C-SCNR dynamically identifies latent error strata and applies fine-grained adjustments to the resulting intervals. Our framework requires no distributional assumptions regarding the phenotype, is compatible with any PGS method, and flexibly accommodates biologically-informed grouping. Simulation studies demonstrate that our framework consistently outperforms existing methods across diverse error distributions. In real-data applications analyzing Body mass index (BMI), Low-density lipoprotein (LDL) cholesterol, and High-density lipoprotein (HDL) cholesterol in the UK Biobank, C-SCNR effectively resolves the coverage deficiencies of existing methods in specific subgroups and consistently yields superior localized calibration. Overall, C-SCNR represents a flexible and powerful framework for constructing high-resolution context-specific prediction intervals, thereby facilitating more reliable clinical interpretations of polygenic risk.
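
For orientation, the split conformal step with normalized residuals that C-SCNR builds on can be sketched as follows. This is the standard textbook construction, not the authors' implementation. The repeated split-and-cluster mechanism is not shown; one would presumably compute the quantile separately within each stratum. The point predictions `mu` and scale estimates `sigma` are assumed to come from any PGS model and any residual-scale model, and all names here are hypothetical.

```python
import math


def normalized_scores(y, mu, sigma):
    """Nonconformity scores |y - mu| / sigma on the calibration set."""
    return [abs(yi - mi) / si for yi, mi, si in zip(y, mu, sigma)]


def conformal_quantile(scores, alpha):
    """Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Returns infinity when the calibration set is too small for the requested
    coverage level, which yields an unbounded interval.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(mu_new, sigma_new, q):
    """Interval mu +/- q * sigma: wider where the estimated residual scale is larger."""
    return (mu_new - q * sigma_new, mu_new + q * sigma_new)


# Toy calibration data (illustrative values only).
y_cal = [1, 2, 3, 8, 10, 12, 7, 8, 9]
mu_cal = [0] * 9
sigma_cal = [1, 1, 1, 2, 2, 2, 1, 1, 1]

scores = normalized_scores(y_cal, mu_cal, sigma_cal)
q = conformal_quantile(scores, alpha=0.5)
lo, hi = prediction_interval(mu_new=10, sigma_new=2, q=q)
print(q, lo, hi)
```

Dividing by `sigma` is what makes the interval locally adaptive: a single global quantile is rescaled by the estimated residual scale at each new point.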

3
Phenome-Wide Association Study of Pre-Cancer Diagnosis Electronic Health Records Identifies Risk and Inverse Associations in the All of Us Research Program

Rich, C. C. D.; Bang, E. J.; Bair, A. B.; Richardson, B. E.; Millington, J. L.; Bates, B. A.; Davis, M. F.; Bailey, M. H.

2026-05-28 health informatics 10.64898/2026.05.26.26353823 medRxiv
Top 1%
0.7%

Background: The All of Us Research Program represents a rich resource for cancer epidemiology research, with over 400,000 participants with whole genome sequences linked to electronic health records (EHR). Large cancer datasets often focus exclusively on cases without controls and neglect pre-diagnosis healthcare occurrences. Here, we perform a phenome-wide association study (PheWAS) of EHR data at least 1 year pre-diagnosis between cancer cases and matched controls, revealing co-occurring and mutually exclusive phenotypes. Methods: We identified 55,000+ cancer cases across 21 cancer types in All of Us version 8. To eliminate age-related confounding, we implemented a two-stage matching and censoring strategy: loose matching on demographics to establish index dates and cohort comparability, followed by right-censoring of EHR data (excluding 1 year pre-diagnosis/index), then 1:2 matching to address residual demographic imbalance. We tested associations between 23,193 cancer cases, 46,386 matched controls and approximately 1,600 clinical phenotypes using logistic regression adjusted for sex at birth, self-reported race, age at diagnosis/index date, and two censored EHR metrics: observation window and unique condition count, with Bonferroni correction for multiple testing. Results: Our analysis identified 232 significantly associated phenotypes, confirming established cancer risk factors including elevated prostate specific antigen (OR = 2.92, 95% CI: 2.65-3.23; p-value = 1.8 × 10^-101) and multinodular goiter (OR = 1.73, 95% CI: 1.56-1.91; p-value = 6.7 × 10^-27). Further investigation into the relationship between several phenotypes with seeming inverse effects is warranted. Conclusions: This PheWAS of EHR data at least 1 year pre-diagnosis leveraged the diversity of All of Us to examine how clinical phenotypes prior to cancer diagnosis vary across cancer types and racial groups. 
Our findings validate All of Us as a robust platform for cancer epidemiology research, confirming established risk factors at scale across diverse populations. This work provides methodological insights for EHR-based susceptibility analyses and demonstrates the value of agnostic phenome-wide approaches for generating hypotheses in precision medicine.

4
Rationale and Design of an Artificial Intelligence Model for Diastolic Heart Failure (AID-HF): A Canadian Cardiomyopathy Collaborative (C3) Study

Papaz, T.; Patel, S.; Akilen, R.; Min, S.; Lesurf, R.; Rouleau, J.-L.; Ruiz, M.; Lam, C. Z.; Dragulescu, A.; Friedberg, M. K.; Mertens, L.; Tremblay-Gravel, M.; Krahn, A. D.; Tadros, R.; Mital, S.

2026-05-29 cardiovascular medicine 10.64898/2026.05.27.26354226 medRxiv
Top 2%
0.7%

Diastolic heart failure (HF) in primary cardiomyopathy is under-recognized and often diagnosed late, particularly in children. While recent studies have advanced understanding of HF with preserved ejection fraction in older adults, the prevalence, outcomes and molecular drivers of diastolic HF remain poorly defined in pediatric and young adult cardiomyopathy, where disease is typically driven by primary myocardial disease rather than acquired co-morbidities. The Canadian Cardiomyopathy Collaborative (C3) was assembled to leverage three of Canada's leading pediatric and adult cardiomyopathy biobank registries. Its flagship initiative, Artificial Intelligence to Model Diastolic Heart Failure (AID-HF), aims to integrate deep phenotyping - including comprehensive diastolic function assessment - with genomics, lipidomics and proteomics and apply machine learning to identify biological and clinical signatures that drive cardiac function and outcomes in cardiomyopathy. Harmonized phenotyping and multiomics protocols across registries will create a uniquely integrated national data resource and enable the goals of AID-HF, i.e., earlier diagnosis and new therapeutic targets for diastolic HF in cardiomyopathy.

5
Measuring the Meaning of Genomic Results: Harmonization of the Metric for Case-Level Results in the CSER2 Consortium

Powell, B. C.; Amendola, L. M.; Bonini, K. E.; Crosslin, D.; Desrosiers-Battu, L.; Hiatt, S. M.; Hindorff, L.; Kenny, E. E.; Mavura, Y.; Muenzen Ferar, K. D.; Risch, N.; Roman, T.; Slavotinek, A.; Van Ziffle, J.; Bowling, K. M.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354388 medRxiv
Top 2%
0.7%

Yield of reported results from genetic testing provides a proximal measure of clinical usefulness. While ACMG/AMP guidelines provide representations of uncertainty for individual genetic variant classification, additional factors are considered when determining whether results explain a patient's presentation. To standardize cross-consortium analysis, a working group of the Clinical Sequencing Evidence-Generating Research (CSER2) consortium iteratively identified factors used when contextualizing variant-level results to case-level interpretation (i.e., interpretation of an individual's genetic data with respect to the indication for testing). Sites independently categorized results; complex cases were discussed collaboratively, leading to revision of classification categories. Our metric incorporates factors beyond classification of reported variants. Analogous to variant-level results, "Definitive Positive" and "Probable Positive" represent certainty that results may be clinically explanatory. The category "Inconclusive" applies when results may or may not fully explain the patient presentation, with subdivision into multiple (non-exclusive) subcategories. Cases falling outside all of the other categories are considered "Negative". The overall diagnostic yield by this metric and use of categories for inconclusive results varied by CSER project, in part paralleling study design differences. This case-level categorization provides a meaningful assessment of diagnostic yield, and for inconclusive cases identifies potentially resolvable factors for case resolution.

6
Fisher information matrix computation for joint longitudinal and survival models to support clinical study design and covariate effect assessment

Fayette, L.; Brendel, K.; Mentre, F.

2026-06-01 pharmacology and therapeutics 10.64898/2026.05.28.26354340 medRxiv
Top 2%
0.5%

Joint modelling of longitudinal data using non-linear mixed effects models and time-to-event outcomes provides a suitable framework to account for informative censoring when estimating biomarker dynamics and quantifying event risk using covariates and longitudinal trajectories. Their usefulness in clinical research depends on data collection design, particularly to precisely estimate the association (link) parameter between longitudinal and survival processes. However, optimal design strategies have so far been addressed separately for longitudinal and survival endpoints and remain unexplored for joint models. We propose two Fisher Information Matrix (FIM) computation methods for joint models, relying on Monte-Carlo integration over observations combined with either Markov Chains Monte-Carlo or Adaptive Gaussian Quadrature to integrate random effects. Their accuracy is assessed against clinical trial simulations in an oncological example based on the HORIZON III study with a tumour-growth-survival model including discrete and continuous covariates. We apply these methods to quantify the impact of follow-up duration, sampling richness, sample size, and covariate distribution on parameter uncertainty and test power. In our example, longitudinal-parameter uncertainty is barely affected by follow-up duration or sampling richness, whereas survival-parameter uncertainty decreases substantially from 1-year to 2-year follow-up. The number of subjects needed (NSN) to achieve <15% uncertainty on the link parameter is comparable for a 2-year rich design and a 3-year sparse design. Optimal covariate distributions are stable across designs and systematically improve test power, outperforming longer and richer but non-optimised designs. These FIM-based methods accurately predict uncertainty and test powers, enabling design evaluation and NSN computation for joint-model-based clinical studies.

7
Closed-Loop Quality Assurance for Production Clinical AI Documentation

Napier, A.; Wiley, J.; Heslin, M.

2026-05-29 health informatics 10.64898/2026.05.27.26353977 medRxiv
Top 2%
0.5%

A closed-loop quality system deployed across thirteen US hospital sites resolved physician complaints with zero regressions on 42 tracked cases across 1,089 optimization iterations, while a deterministic assembly-agent replacement cut H+P trace latency from 19.6 s to 10.8 s (-8.8 s, 95% CI [-10.5, -7.1] s; n = 100 pre, n = 100 post). We report four observations and an architectural follow-through. First, the same binary-check instrument produces opposite outcomes depending on the question asked: "maximize this score" produces structurally-correct notes that physicians reject (Spearman rho = -0.077, 95% CI [-0.40, 0.26], n = 36); "did this specific fabrication stop?" produces rater-invariant deployment decisions. Second, in our pipeline, assembly-stage agents did not respond to prompt optimization the way reasoning agents did: four consecutive optimization attempts produced 18-28 point regressions. Third, physician preference is rater-fragile at typical clinical-AI calibration sample sizes (Cohen's kappa = 0.028 between two board-certified physicians, 95% CI [-0.30, 0.36] on n = 35 overlapping pairs). Fourth, the architectural punchline: six weeks after the prediction, the LLM call at the chart-assembly step was replaced with a deterministic renderer (sub-500-character template plus sandboxed scripting), lifting the defect-free rate on a 51-case holdout from 49% to 84%. We introduce a Pareto-with-absolute-floors acceptance rule (multi-axis commit with severity-class categorical vetoes) as a methodological contribution distinct from scalar-reward acceptance in standard prompt-optimization frameworks. Cross-iteration rejection memory prevents the loop from re-proposing edits already rejected three or more times. A reproducibility bundle (anonymized ablation per-case counts, bootstrap-CI data, analysis scripts) is released under CC BY 4.0 at github.com/sayvant/SQS-Auditor-paper-data.
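
The inter-rater statistic quoted above, Cohen's kappa, compares observed agreement with the agreement expected by chance from each rater's label frequencies. A minimal sketch of the standard formula follows. The labels are made up for illustration and are not the study's ratings.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected from the raters' marginal label frequencies.
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined, and
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Illustrative labels only: the raters agree on half the items, which is
# exactly what chance predicts for these marginals, so kappa is 0.
r1 = ["A", "A", "B", "B"]
r2 = ["A", "B", "A", "B"]
kappa = cohens_kappa(r1, r2)
print(kappa)
```

A kappa near zero, as in the abstract's 0.028, means the two raters agree about as often as chance would predict, however high the raw agreement rate looks.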

8
Frontier Large Language Models for Comprehensive Medication Review in CKD Patients with Polypharmacy: A Trap-Embedded Synthetic Benchmark

Chuang, K.-C.; Lin, H.-J.; Lin, H.-M.

2026-05-26 health informatics 10.64898/2026.05.23.26353939 medRxiv
Top 2%
0.4%

Background: Patients with CKD and polypharmacy face high rates of drug-related problems, yet comprehensive medication review remains time-intensive and inconsistently performed. Large language models (LLMs) may augment this process, but existing benchmarks use multiple-choice formats that do not reflect open-ended, nephrology-specific review. We developed a trap-embedded synthetic CKD benchmark and evaluated five current-generation LLMs (GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, Grok 4.1 Fast, DeepSeek R1; tested April-May 2026) for open-ended medication review. Methods: Fifty synthetic CKD cases across three complexity groups (G3a-G3b [n=20], G4 [n=15], G5/G5D/transplant [n=15]) with 8-12 medications and ≥2 embedded clinical traps each were scored against nephrologist-adjudicated gold standards. Each model produced three independent responses per case (temperature 0; 750 total outputs). Primary endpoint was per-case macro F1; secondary endpoints were safety-critical omission rate, PI-adjudicated hallucination rate, and intra-model consistency. Blinded inter-rater reliability for gold-standard item detection was assessed on a 30% sample. Results: Consensus-level macro F1 ranged from 0.41 (Claude Sonnet 4.6) to 0.49 (Grok 4.1 Fast) (Friedman P < 0.001). Phosphate binder timing (11%) and hyperkalemia combinations (33%) were poorly detected across all models. Safety-critical omission rate ranged from 22% to 48% (P < 0.001); PI-adjudicated hallucination ranged from 0% (GPT-5.4) to 54% (DeepSeek R1), including fabricated dose caps and non-existent guideline citations. Blinded reliability for gold-standard item detection was high (kappa = 0.934, n = 92). Conclusions: This nephrology-specific benchmark exposes clinically important LLM blind spots that generic multiple-choice evaluations would not detect. 
Heterogeneous hallucination and omission rates indicate that model selection and domain-specific guardrails should precede any clinical deployment of LLM-assisted CKD medication review. Prospective validation with real patient data and human comparators is required before deployment recommendations can be made.

9
Personalized clinical reference intervals for routine precision medical care

Zhang, C.; Chen, Y.-L.; Jamilov, A.; Liu, E.; Shree, S.; Lam, B. D.; Foy, B. H.

2026-05-30 health informatics 10.64898/2026.05.28.26354363 medRxiv
Top 2%
0.4%

Most routine clinical markers are interpreted using population-based reference intervals, despite being regulated around patient-specific homeostatic setpoints. This mismatch obscures physiologic shifts, inhibiting detection of early disease signatures. Here, we develop a novel Bayesian inference method that adaptively constructs personalized reference intervals using each patient's existing health records. In analysis of >100 million lab tests in >800,000 patients, these personalized intervals can be accurately constructed with only minimal prior data, meaning this method can be applied near universally. We show that across 43 common lab markers, patient setpoints are strongly associated with future morbidity, with signal strength increasing as more test data is collected. Deviation from personalized reference intervals provides strong and novel risk signatures across diverse disease states, including hypothyroidism, hematologic cancers, kidney disease, and pregnancy complications. Importantly, personalized reference intervals capture a different risk signature to existing population-based approaches, with the highest risk patients being those who deviate from both intervals simultaneously. In a targeted clinical use case study of iron infusion, use of personalized reference intervals greatly improved prediction of treatment efficacy and allowed precise tracking of treatment responses. Our results illustrate how existing health records can be used to construct personalized benchmarks for nearly all common clinical tests, driving a new paradigm for precision laboratory medicine.

10
Positive-control Mendelian randomization highlights power constraints in disease-mortality GWAS

Su, C.-Y.; Butler-Laporte, G.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.29.26354472 medRxiv
Top 3%
0.3%

Yang et al. recently published a systematic comparison of genetic effects on disease susceptibility and disease-specific mortality across nine common diseases and seven biobanks, concluding that susceptibility and survival architectures overlap only modestly. This is an important resource, but we argue that the current mortality genome-wide association studies (GWAS) require explicit power calibration before limited overlap can be interpreted biologically. Using two-sample Mendelian randomization (MR) with positive-control exposures, we show that even a well-powered positive control, body mass index (BMI), instrumented by 855 genome-wide-significant variants, produces a clearly detectable effect for heart failure (HF) mortality, with only weaker evidence for chronic kidney disease (CKD) mortality. However, when BMI instruments were stratified into quartiles by exposure-association strength, the heart failure association remained nominally significant only in the two strongest quartiles and was not significant in the two weakest quartiles. Further, household income, used as a weakly instrumented socio-economic contrast, has insufficient power to detect moderate effects on any disease mortality outcome. These analyses indicate that current disease mortality GWAS may be insufficiently powered to detect shared effects. In contrast, the same BMI instrument set produced large and directionally coherent effects when applied to case-control GWAS of the matched six diseases, with the HF and prostate cancer associations preserved under a within-family BMI sensitivity analysis, and nominal support for CKD. The HF mortality association was also preserved in a within-family BMI sensitivity analysis. Similarly, genetically proxied household income was associated with HF risk in the case-control GWAS despite null associations with disease-specific mortality, consistent with limited power in the mortality GWAS. 
These findings indicate that the limited BMI-mortality evidence across several outcomes is unlikely to reflect a weak BMI instrument or dynastic artefacts alone and instead supports limited effective power in current disease-mortality GWAS.
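
As a rough illustration of why instrument strength matters in two-sample MR, the fixed-effect inverse-variance weighted (IVW) estimator combines per-variant effects as shown below. The abstract does not state which MR estimator was used, so IVW is an assumption here, and the per-variant numbers are invented for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    estimate = sum(bx * by / se_y^2) / sum(bx^2 / se_y^2)
    se       = sqrt(1 / sum(bx^2 / se_y^2))
    """
    weights = [1.0 / (s * s) for s in se_outcome]
    num = sum(bx * by * w for bx, by, w in zip(beta_exposure, beta_outcome, weights))
    den = sum(bx * bx * w for bx, w in zip(beta_exposure, weights))
    return num / den, math.sqrt(1.0 / den)


# Invented summary statistics: two variants, both implying a causal ratio of 0.5.
beta_x = [0.1, 0.2]    # variant-exposure effects
beta_y = [0.05, 0.1]   # variant-outcome effects
se_y = [0.01, 0.01]    # standard errors of the variant-outcome effects

estimate, se = ivw_estimate(beta_x, beta_y, se_y)
print(estimate, se)
```

Because the standard error shrinks with the sum of squared exposure effects divided by the outcome variances, weaker instruments or noisier outcome GWAS both widen the interval, which is the power constraint the abstract is concerned with.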

11
In vitro splice-switching oligonucleotide rescues aberrant GFM2 pseudoexon inclusion and restores mitochondrial activity

Gross, S.; Birnbaum, R.; Shaul Lotan, N.; Mor-Shaked, H.; Manor, J.; Shaag, A.; Rosenbluh, C.; Levy-Memo, A.; Yanovsky-Dagan, S.; Saada, A.; Harel, T.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354078 medRxiv
Top 3%
0.3%

Background: Biallelic variants in GFM2, encoding mitochondrial elongation factor G2 (mtEFG2), a GTPase involved in the termination stage of mitochondrial translation, cause autosomal recessive combined oxidative phosphorylation deficiency. Noncoding structural variants may be missed by exome sequencing but can disrupt splicing and provide opportunities for variant-specific therapeutic rescue. We investigated the molecular mechanism underlying suspected Leigh syndrome in an infant with mitochondrial disease and evaluated whether splice-switching oligonucleotide (SSO) treatment could correct the pathogenic splicing defect. Methods: The proband underwent exome sequencing followed by short-read and long-read whole genome sequencing. RNA sequencing, reverse-transcription PCR, quantitative PCR, and cycloheximide treatment were used to characterize the effect of the identified intronic duplication on GFM2 splicing and transcript stability. Patient-derived fibroblasts were treated with SSOs targeting the aberrant splice junction. Rescue was assessed by RNA studies, western blotting, and spectrophotometric measurement of cytochrome c oxidase (COX). Results: Whole genome sequencing identified a paternally-inherited GFM2 missense variant, NM_032380.5:c.2195C>T p.(Pro732Leu), in trans to a maternally-inherited 221-nucleotide intronic duplication, NM_032380.5:c.2029-741_2029-521dup. RNA studies revealed an 87-nucleotide pseudoexon, generated by activation of a cryptic acceptor splice site within the duplicated sequence. The resulting transcript harbored a premature termination codon (PTC) and underwent nonsense-mediated decay, as confirmed by cycloheximide rescue. Together with reduced mtEFG2 protein levels on western blot, the findings supported a loss-of-function mechanism. 
Enzymatic analysis of affected fibroblasts showed reduced activity of the mtDNA-dependent complex IV subunit COX, with preservation of the nuclear-encoded complex II enzyme succinate dehydrogenase and the control enzyme citrate synthase, consistent with impaired mitochondrial translation. An SSO targeting the aberrant intron-pseudoexon junction nearly abolished pseudoexon inclusion, restored correctly spliced GFM2 transcript from the duplication-containing allele, increased mtEFG2 protein levels, and significantly improved COX activity. Conclusions: This study identifies a pathogenic intronic GFM2 duplication that causes mitochondrial disease through pseudoexon activation and nonsense-mediated decay. The findings demonstrate the value of integrated genome and transcriptome analysis for exome-negative mitochondrial disease and provide in-vitro proof of concept that SSOs can restore transcript processing, protein expression, and mitochondrial respiratory-chain function in patient-derived cells.

12
Twelve-Month Outcomes of Intrathecal Vesemnogene Lantuparvovec for Spinal Muscular Atrophy in Children Younger than 24 Months in Low- and Middle-Income Countries

Ngu, L. H.; Mo, Q.; Li, S.; Toh, T. H.; Lee, J. N.; Lim, K. C.; Tehuteru, E. S.; Lestari, R.; Sanguansermsri, C.; Abueita, H.; Gwer, S.; Li, L.; Wang, Z.; Kirmani, S.; Chen, J. X.; Cai, Y. Y.; Zheng, N. N.; Yang, S. Y.; Liang, P. J.; Li, Y.; Lu, M.; Tang, Y.; Li, Y.; Ye, J. Z.; Shi, S. J.; Hong, J. F.; Chen, A. Y.; Zheng, C. K.; Wang, S.; Lim, T.-O.; Lahn, B. T.; Gao, A. T.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.27.26354188 medRxiv
Top 4%
0.2%

Introduction Spinal muscular atrophy (SMA) is a monogenic neuromuscular disease caused by mutations in the survival motor neuron 1 (SMN1) gene. Onasemnogene abeparvovec is a U.S. FDA-approved single-dose gene therapy for SMA. Both its intravenous formulation (Zolgensma, approximately USD 2.13 million per patient) and intrathecal formulation (Itvisma, around USD 2.59 million per patient) are prohibitively expensive, substantially limiting accessibility in low- and middle-income countries (LMICs). We conducted a clinical study of vesemnogene lantuparvovec, an alternative to onasemnogene abeparvovec developed for use in LMIC settings. Methods Sixteen patients with SMA, including 8 with type 1 SMA and 8 with type 2 SMA, received a single intrathecal administration of vesemnogene lantuparvovec. Eleven patients were treated with a low dose (1.5 * 10^14 vg) and five with a high dose (3.0 * 10^14 vg). The primary endpoints were safety and efficacy, assessed by changes from baseline in developmental gross motor milestones according to the World Health Organization criteria. Overall survival was primarily evaluated in type 1 SMA patients. This trial was registered with ClinicalTrials.gov NCT06288230. Results As of the March 2026 cutoff date, 15 of 16 treated patients had completed at least 12 months of follow-up after treatment, while the remaining one type 1 SMA patient died of disease progression at month 6 post-treatment. At 12 months post-treatment, among the 7 surviving patients with type 1 SMA, the median age was 21.6 months (range, 16.1 to 32.3 months). Among the 16 treated patients, the median age at diagnosis was 4.4 months (range, 0.0 to 18.0 months), and the median age at dosing was 10.7 months (range, 2.8 to 22.5 months). All patients experienced at least one AE. Thirty-one AESIs were reported in 13 patients, including hepatotoxicity, thrombocytopenia-related events and cardiac events. No patient required prolonged prednisolone prophylaxis. 
SAEs, including pneumonia, lower respiratory tract infection, upper respiratory tract infection, and haemorrhagic diarrhoea, occurred in 5 of 8 (63%) patients with type 1 SMA and 2 of 8 (25%) patients with type 2 SMA. Two patients with type 1 SMA required invasive ventilation, one of whom subsequently died. At 12 months post-treatment, 11 of 16 treated patients (69%) gained at least one new WHO motor milestone versus baseline, including 3 type 1 and 8 type 2 SMA patients; one type 2 patient gained six WHO motor milestones and achieved independent walking. Conclusions In patients younger than 24 months of age with type 1 or type 2 SMA, a single intrathecal dose of vesemnogene lantuparvovec was safe and generally well tolerated and was associated with improvements in developmental gross motor milestones compared with outcomes observed among referred but untreated patients. Additional studies are required to further evaluate the long-term safety and efficacy of this gene therapy.

13
Is it time for a paradigm shift? Tailored online video education instead of pretest genetic counseling facilitates high genetic test uptake and informed choice for adults seeking cardiovascular genetic testing

Rivers, B.; Murray, B.; Applegate, C. D.; Tichnell, C.; Gordon, C.; McClellan, R.; Brown, E.; Nunez, K.; Barth, A. S.; Taylor, C. O.; Yanek, L. R.; Day, J.; James, C. A.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354394 medRxiv
Top 5%
0.2%

Background: Pretest genetic counseling (GC) is recommended in conjunction with genetic testing (GT) for cardiovascular (CV) indications, yet access to CVGC is limited, leading to delayed GT. Posttest GC could increase GC and GT access but requires efficient pretest education that supports both informed GT decision-making and robust GT uptake. Methods: We developed four indication-tailored online CV genetics education videos and deployed them in a 3-arm randomized trial comparing pretest vs. posttest outpatient CVGC (RESEQUENCE-GC, NCT05422573). Participants were 1:1:1 randomized to pretest video education plus an optional (efficiency arm) or required (flipped arm) phone call with a genetic counselor and planned posttest CVGC or to standard pretest CVGC (SOC arm). Questionnaires administered at baseline and post-education included the CV Multidimensional Model of Informed Choice [MMIC] to quantify GT knowledge and informed GT choice. Results: 389/767 (50.7%) adults aged 18-80 (mean 51.2 ± 14.9 years) scheduling a first CVGC appointment consented to RESEQUENCE-GC and completed the baseline questionnaire. Efficiency arm participants (video education + optional phone call) were most likely to complete pretest education (134, 97.4% efficiency; 107, 85.6% flipped; 111, 87.4% SOC, p=0.0012) and elect GT (131, 95.6% efficiency; 105, 84.0% flipped; 107, 84.2% SOC, p=0.0036). Few (4, 2.9%) efficiency arm participants requested an optional pretest phone call. Most flipped arm participants (90, 84.1%) had no post-video questions, consistent with the 97-second [IQR: 65s-145s] median call duration. CV genetics knowledge was high post-education (median 8 [IQR 7,8]/8 MMIC items correct). Only video-based pretest education was associated with a significant increase in knowledge (p<0.0001). Nearly all participants made an informed GT choice with no difference between intervention (95.6%) and SOC (90.4%) arms (p=0.074). 
Conclusions: Tailored, online video pretest education can enhance CV GT uptake, support informed GT decision-making, and be integrated into efficient pretest workflows, suggesting utility in scalable posttest CVGC.

14
Mid-Pregnancy Maternal Leukocyte Telomere Length and Preterm Birth in a Population-Based Hispanic/Latina California Cohort

Garay, O.; Oltman, S.; Bear, R. J.; Lin, J.; Wojcicki, J. M.; Ryckman, K. K.; Jelliffe-Pawlowski, L. L.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.27.26354189 medRxiv
Top 5%
0.2%

Background Preterm birth (PTB) rates among Hispanic/Latina individuals in the United States have risen over the past decade. Data suggests this rise may be driven in part by psychosocial stress. Leukocyte telomere length (LTL), a marker of cumulative cellular aging that shortens under chronic stress, may capture stress-related biological vulnerability, but has not been examined as a potential population-level contributor to PTB in Hispanic/Latina pregnancies. Objective To examine the association between mid-pregnancy maternal LTL and PTB in a population-based Hispanic/Latina cohort. Methods In a case-control study nested within a California singleton birth cohort (n = 436 Hispanic/Latina individuals; 215 PTB, 221 term births), LTL was measured by quantitative PCR from biobank specimens collected from 15 to 20 weeks of gestation. Covariates from linked birth certificate and hospital discharge records were included. Logistic regression estimated ORs and 95% CIs of PTB by LTL examined continuously and by percentile category (<=10th, 11th-89th, >=90th) with and without adjustment for covariates. Results Mean and median LTL did not differ between PTB and term births. LTL at or below the 10th percentile was associated with elevated odds of PTB relative to full-term birth (12.6% versus 4.3%; ORc = 3.2, 95% CI 1.3-7.9), persisting after partial (ORadj1 = 3.2, 95% CI 1.3-8.3) and full covariate adjustment (ORadj2 = 3.4, 95% CI 1.3-9.3). Subgroup analyses showed consistent directional patterns across PTB subgroups and for early term birth (ORadj2 = 5.1, 95% CI 1.5-17.0). Conclusions Mid-pregnancy maternal LTL <=10th percentile was associated with more than three times the odds of PTB, with risk concentrated at the extreme low tail of the distribution. Consistent with a cumulative allostatic load model, markedly short LTL at mid-gestation may reflect elevated stress-related biological risk for preterm delivery. 
These findings support upstream investment in stress reduction and prospective LTL research in high-burden populations.
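
As a rough arithmetic check on the crude odds ratio quoted in this abstract, the sketch below converts the two reported proportions (12.6% and 4.3%) into odds and takes their ratio. It assumes these are the shares of PTB and term births with LTL at or below the 10th percentile and that the comparison group is everyone else; the abstract does not state the reference category, so this is an illustration rather than the authors' model. The function name `odds_ratio` is hypothetical.

```python
def odds_ratio(p_exposed_cases: float, p_exposed_controls: float) -> float:
    """Crude odds ratio from the exposure proportion in cases and in controls."""
    odds_cases = p_exposed_cases / (1.0 - p_exposed_cases)
    odds_controls = p_exposed_controls / (1.0 - p_exposed_controls)
    return odds_cases / odds_controls


# Proportions with LTL <= 10th percentile, as reported in the abstract
# (assumed here to be PTB = 12.6%, term = 4.3%).
crude_or = odds_ratio(0.126, 0.043)
print(round(crude_or, 1))  # 3.2
```

The result agrees with the reported ORc of 3.2 after rounding; the adjusted estimates come from logistic regression with covariates and cannot be reproduced from the abstract alone.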

15
Operationalizing Eight-Dimensional Patient-Safety Risk Scoring at Scale: A Multi-Model Large Language Model Reliability Study

Lin, H.-M.; Lyu, J.; Wang, I.-L.

2026-06-01 health informatics 10.64898/2026.05.29.26354437 medRxiv
Top 5%
0.2%

Background: Hospital incident risk scoring has long relied on two- or three-dimensional frameworks (Severity Assessment Codes or Risk Priority Numbers), even though root cause analysis standards recognize that clinical risk is multi-factorial. The obstacle has been mainly cognitive: human reviewers cannot reliably score many dimensions across high incident volumes, so richer assessment has not been operationalized at scale. Objective: To extend the traditional three-dimensional FMEA to an eight-dimensional patient-safety risk feature framework, to establish a multi-model large language model (LLM) extraction pipeline that scores these dimensions automatically, and to demonstrate a variance-aware integer optimization (mean-variance integer programming, MV-IP) that provides a reproducible tie-breaking rule for incident prioritization under extraction uncertainty, rather than improved risk coverage. Methods: An 8-dimensional framework covering harm severity, potential harm, frequency, detectability, systemic impact, vulnerable populations, regulatory relevance, and economic impact was applied to 213 synthetic and 196 real curated incident narratives. Three independent LLMs (GPT-5.4, Gemini 3.1 Pro, Grok-4.1 Fast) from different provider families extracted structured risk scores. Inter-model consistency was assessed via ICC(A,1). Among coverage-equivalent selections, MV-IP minimized inter-model variance to give a reproducible prioritization rule. An English-language sensitivity analysis was conducted on 31 AHRQ PSNet WebM&M cases. Results: On real cases, seven of eight dimensions reached Fair or better inter-model reliability (ICC(A,1) 0.53 to 0.83); D5 (Systemic Impact) was the exception at Poor reliability (0.275), driven by little between-case variation rather than by wide model disagreement.
Reliability was not uniform: two dimensions were Excellent (D1 actual harm 0.834, D8 economic impact 0.782), two Good, and three only Fair, so some dimensions are more readily extractable than others. The same anchors gave broadly similar results on English-language narratives. When deterministic top-K selection returned several equal-coverage solutions (11 on real cases, total inter-model variance 0.205 to 1.274), MV-IP selected the minimum-disagreement set, replacing ad hoc tie-breaking with an explicit rule without improving coverage. Bootstrap resampling found 74% to 90% of per-case variance estimates stable despite the three-model panel. Conclusions: The eight-dimensional framework operationalizes patient-safety risk features that quality teams have considered only implicitly, and three independent LLM families produced reproducible scores on most dimensions of curated narratives. Inter-model agreement, however, measures reproducibility rather than clinical correctness, and high agreement does not by itself establish that a score is right; the dimensions that are reliably extractable today (notably D6 and D8) differ from those that are not yet (D5, and to a lesser degree D4 and D7), which has direct implications for incident-reporting form design. MV-IP contributes a reproducible, variance-aware tie-breaking rule rather than improved coverage. Validation against expert-prioritized RCA lists and deployment on raw institutional incident reports remain the next steps toward clinical use.
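
The tie-breaking idea described in this abstract (among equal-coverage top-K selections, prefer the set with the least inter-model disagreement) can be sketched as follows. This is an illustration under stated assumptions, not the authors' MV-IP formulation: it uses brute-force enumeration instead of integer programming, defines "coverage" as the summed model scores of the selected cases, and measures disagreement as per-case population variance across models. The function name `pick_min_variance_set` and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def pick_min_variance_set(scores, k):
    """Pick k cases with maximal coverage, breaking ties by minimal variance.

    scores: dict mapping case id -> list of integer scores, one per model.
    Coverage (an assumption of this sketch) is the sum of all model scores
    over the selected cases; disagreement is the summed per-case variance.
    """
    total = {case: sum(vals) for case, vals in scores.items()}
    var = {case: pvariance(vals) for case, vals in scores.items()}
    best_key, best_subset = None, None
    for subset in combinations(sorted(scores), k):
        coverage = sum(total[c] for c in subset)
        disagreement = sum(var[c] for c in subset)
        key = (-coverage, disagreement)  # maximize coverage, then minimize variance
        if best_key is None or key < best_key:
            best_key, best_subset = key, subset
    return best_subset


# Toy data: cases B and C have the same total score, but the models agree on B.
toy_scores = {
    "A": [5, 5, 5],
    "B": [4, 4, 4],
    "C": [3, 4, 5],
    "D": [1, 1, 1],
}
selected = pick_min_variance_set(toy_scores, 2)
print(selected)  # ('A', 'B')
```

Enumeration is only practical for small K and few cases; an integer-programming solver would replace the loop at realistic scale.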

16
Outcomes of planned caesarean birth compared with planned or actual vaginal birth: an update and expansion of the NICE Caesarean Birth Guideline systematic review NG192

Black, M.; Robertson, C.; Cruickshank, M.; Ekong, A.; Manson, P.; Kemakolam, O.; Steel, O.; Richards, C.; Harshani, P.; Merriel, A.; Devane, D.; Bhattacharya, S.; Williams, D.; Brazzelli, M.

2026-05-30 obstetrics and gynecology 10.64898/2026.05.28.26354321 medRxiv
Top 5%
0.2%

Background Planned caesarean birth (CB) is an increasingly utilised intervention, observed in almost 1 in 6 first-time mothers giving birth in the UK in 2023-24. Outcomes of planned (or actual) CB have been compared with planned (or actual) vaginal birth (VB) in a UK national guideline, but the scope of the comparison does not fully reflect the range of outcomes of interest to stakeholders. This review provides a comprehensive synthesis of outcomes of planned or actual CB with planned or actual VB to shape information resources which support informed birth planning. Methods The UK NICE Caesarean Birth Guideline NG192 evidence review of outcomes associated with planned CB (or actual CB where no planned CB data was available) was updated and expanded to incorporate additional outcomes prioritised by stakeholders. Results A total of 33 new study reports were combined with 32 reports previously included in NG192. All new reports were observational cohort studies or systematic reviews at low risk of bias. Only 3 studies reported outcomes of planned CB compared with planned VB (regardless of actual mode of birth), whereas all remaining studies reported actual VB outcomes. Planned CB was followed by more maternal infection (wound infection, mastitis, endometritis and urinary tract), venous thrombosis and lower neonatal unit admission rates than a planned VB. In the long-term, CB was linked to one or more sexual problems (insufficient lubrication and dyspareunia) being more common, future pregnancy being less common, and infertility being more frequent than after VB. For offspring, infant urinary tract infection after any CB, gastrointestinal tract infections and autism after planned CB were more common compared with VB. New findings highlight conflicting reports on childhood asthma and type 1 diabetes risk after planned CB, suggesting that prior positive associations may be explained by confounding. 
Existing evidence in NG192 suggests that cardiac arrest, maternal death and hysterectomy are more common after planned CB, but arise from studies at high risk of bias. NG192 also reports that placenta accreta and uterine rupture in a future pregnancy are more common after any CB. No new evidence was identified on these outcomes. Conclusion This review provides stakeholder-relevant information to populate decision-support materials on outcomes of planned (and actual) CB compared with planned (and actual) VB. The existing evidence base lacks data on long-term outcomes of planned (rather than actual) VB.

17
Comparing Pathway-Informed Polygenic Risk Score Strategies: A multi-cohort evaluation of Amyloid-β

Zhang, X.; Goudey, B.; Laws, S.; Masters, C.; Baldwin, T.; Faux, N.

2026-05-27 health informatics 10.64898/2026.05.25.26354071 medRxiv
Top 5%
0.1%

Objective: To systematically evaluate pathway-informed polygenic risk score (PRS) strategies and determine which approaches most effectively leverage biological annotations for risk prediction, using brain amyloid-beta positivity as a case study. Methods: We systematically benchmarked approaches for integrating pathway information into PRS construction to predict brain Aβ positivity. Using two cohorts, the Alzheimer's Disease Neuroimaging Initiative (ADNI, n = 969) and Australian Imaging, Biomarkers and Lifestyle (AIBL, n = 251), we compared Apolipoprotein E (APOE) genetic risk score (GRS), clumping and thresholding (C+T) PRS, pathway-guided single nucleotide polymorphism (SNP) selection PRS, and pathway-specific PRSs ensembled via machine learning. Pathways were derived from manually curated literature or from pathway databases via Functional Mapping and Annotation (FUMA). Results: In cross-validation on the ADNI cohort, pathway-informed PRS using a narrow set of pathways to guide SNP selection (PathPRS-SNPLit without APOE locus) significantly outperformed the standard PRS model (median AUC = 0.742, p = 0.006) and the APOE locus model (median AUC = 0.736, p = 5.1 × 10⁻⁵) based on the Mann-Whitney U test, achieving a median AUC of 0.763. This model showed enhanced ability to identify subgroups within the 10% lowest- and highest-risk groups compared to the current standard of APOE locus alone (odds ratio = 0.67, 95% CI: 0.56-0.81; and OR = 13.23, 95% CI: 10.23-17.11), highlighting its clinical potential. Using a focused set of literature-curated pathways outperformed using a broader set of database-derived pathways across configurations. When contrasting strategies for aggregating information across pathways, we observed that using pathways to guide selection of SNPs and then building a single PRS performed comparably to building PRS for each pathway and using machine learning (ML) to aggregate these, though the latter enabled pathway-level interpretability.
Similar trends were observed in the external AIBL validation dataset. Interpretation: Pathway-informed PRS can meaningfully improve genetic risk enrichment for Aβ positivity beyond APOE and standard C+T approaches, provided pathway definitions are carefully curated. The choice of pathway source has the strongest impact on predictive performance, with aggregation strategies or ML model choice having far less impact. Our findings highlight the utility of literature-curated, pathway-informed PRSs for Aβ prediction and offer practical guidance for pathway-informed PRS construction in other polygenic traits.

18
Multivariate determinants of wearable-measured sleep quality across a large observational cohort: roles of physical activity, gut microbiome, blood analytes, and lifestyle factors.

Cavon, J.; Perez, C.; Quinn-Bohmann, N.; Magis, A. T.; Gibbons, S. M.

2026-05-29 health informatics 10.64898/2026.05.27.26354250 medRxiv
Top 6%
0.1%

Emerging evidence links the gut microbiome to sleep quality, yet measuring sleep at scale remains challenging. Commercial wearables, such as Fitbit, capture objective sleep and activity data in naturalistic settings. We integrated Fitbit data from a large, deeply-phenotyped cohort with paired lifestyle and health questionnaires. Wearable-derived measures aligned well with self-reported sleep, activity, and happiness. We identified dozens of covariate-adjusted associations between Fitbit-derived sleep features, lifestyle factors, and multi-omic data. Among molecular feature sets, the gut microbiome showed the greatest number of associations with sleep quality: butyrate-producing genera were positively associated with sleep and amplified the benefits of physical activity. Oscillospira, in particular, was consistently associated with better sleep. In blood, insulin, omega-3, and cortisol correlated with poorer sleep, whereas lower alcohol intake and mineral supplements correlated with better sleep. These robust, covariate-adjusted findings advance mechanistic understanding of the gut-sleep axis and broader molecular and lifestyle determinants of sleep quality.

19
An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv
Top 6%
0.1%

Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF ≤ 40%.
After fine-tuning on less than 10% of external data, LVEF ≤ 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

20
Estimating Lifetime Periodontal Burden Under Informative Tooth Loss

McCormick, K. M.; Amarasena, N.; Guzzo, G.

2026-05-30 dentistry and oral medicine 10.64898/2026.05.27.26354300 medRxiv
Top 6%
0.1%

Background: Periodontitis is defined by cumulative, irreversible tissue destruction, yet population-based measurement typically relies on cross-sectional indicators derived from retained teeth. Destruction that occurred earlier in life, particularly disease severe enough to result in tooth loss, is structurally excluded from these measures, potentially leading to systematic underestimation of lifetime periodontal burden. Objective: To develop and evaluate a measurement framework that estimates lifetime periodontal burden from cross-sectional data by explicitly incorporating informative tooth loss under etiological uncertainty. Methods: Data were drawn from 10,324 adults aged ≥30 years participating in the 2009-2016 National Health and Nutrition Examination Survey (NHANES) who completed full-mouth periodontal examination and glycated hemoglobin (HbA1c) testing. Lifetime periodontal burden was estimated by combining observed clinical attachment loss in retained teeth with probabilistic contributions from missing teeth, using three alternative age-stratified attribution schedules derived from epidemiological studies of periodontal extraction. Performance was compared with conventional measures of periodontal severity and extent using distributional analyses, correlations with HbA1c, discrimination of diabetes status, and relative importance analysis. Age-adjusted models were treated as sensitivity analyses. Results: Estimated lifetime periodontal burden exhibited strong, monotonic age gradients across glycemic categories, in contrast to more attenuated patterns observed for severity and extent. Across attribution schedules, lifetime burden showed stronger correlations with HbA1c (ρ = 0.30-0.32) than conventional measures. In multivariable models including all indices, lifetime burden retained an independent association with HbA1c, whereas severity and extent contributed little unique information.
Discriminative performance for diabetes status was consistently higher for lifetime burden than for conventional measures and remained stable across attribution schedules. Conclusions: Lifetime periodontal burden can be estimated from cross-sectional data by explicitly modelling informative tooth loss rather than restricting measurement to retained teeth. Incorporating historical tissue loss under uncertainty yields a more coherent representation of cumulative periodontal destruction than snapshot-based measures and provides a methodological basis for life-course-oriented periodontal epidemiology.