GENETICS

Oxford University Press (OUP)

Preprints posted in the last 7 days, ranked by how well they match GENETICS's content profile, based on 189 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Deriving LD-adjusted GWAS summary statistics through linkage disequilibrium deconvolution

Nouira, A.; Favre Moiron, M.; Tournaire, M.; Verbanck, M.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350574 medRxiv
Top 0.6%
1.7%

Genome-wide association studies (GWAS) have identified numerous genetic variants associated with complex traits. However, linkage disequilibrium (LD) confounds these associations, leading to false positives where non-causal variants appear associated because they are correlated with nearby causal variants. This is particularly the case in highly polygenic traits, where the genome can be saturated with causal variants. To address this issue, we propose LDeconv, a method based on truncated singular value decomposition (SVD) that adjusts GWAS summary statistics without requiring individual-level genotype data. This approach accounts for LD structure, isolates causal variants in high-LD regions, and improves the reliability of effect size estimates. We assess its performance through simulations across various LD scenarios, conduct extensive sensitivity analyses, and apply it to real GWAS data from the UK Biobank. Our results demonstrate that LDeconv effectively reduces false discoveries while preserving true associations, offering a robust framework for post-GWAS analysis.
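
As a rough illustration of how a truncated SVD can regularise an LD adjustment, the generic formulation below may help. It is a sketch under stated assumptions, not necessarily LDeconv's exact method: it assumes marginal z-scores relate to LD-adjusted effects through the LD correlation matrix. The symbols z, b, R, U, the singular values, and the truncation rank k are notation introduced here for illustration only.

```latex
\begin{aligned}
\mathbf{z} &\approx \mathbf{R}\,\mathbf{b}, \qquad
\mathbf{R} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{U}^{\top}
           = \sum_{i=1}^{m} \sigma_i\,\mathbf{u}_i\mathbf{u}_i^{\top},
\quad \sigma_1 \ge \dots \ge \sigma_m \ge 0,\\
\hat{\mathbf{b}}_k &= \mathbf{R}_k^{+}\,\mathbf{z}
           = \sum_{i=1}^{k} \frac{1}{\sigma_i}\,\mathbf{u}_i\mathbf{u}_i^{\top}\mathbf{z},
\qquad k \le \operatorname{rank}(\mathbf{R}).
\end{aligned}
```

Keeping only the k largest singular values avoids dividing by near-zero values in high-LD blocks, which would otherwise amplify noise in the adjusted estimates.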

2
SPLIT: Safety Prioritization for Long COVID Drug Repurposing via a Causal Integrated Targeting Framework

Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.

2026-04-16 health informatics 10.64898/2026.04.12.26350701 medRxiv
Top 2%
0.7%

Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.

3
LLM-Driven Target Trial Emulation with Human-in-the-Loop Validation for Randomized Trial: Automated Protocol Extraction and Real-World Outcome Evaluation

Dey, S. K.; Qureshi, A. I.; Shyu, C.-R.

2026-04-13 health informatics 10.64898/2026.04.09.26350523 medRxiv
Top 3%
0.3%

Target trial emulation (TTE) enables causal inference from observational data but remains bottlenecked by manual, expert-dependent protocol operationalization. While large language models (LLMs) have advanced clinical knowledge extraction and code generation, their ability to automate end-to-end TTE workflows remains largely unexplored. We present an LLM-driven framework using retrieval-augmented generation to extract the five core TTE design parameters from the Carotid Revascularization and Medical Management for Asymptomatic Carotid Stenosis Trial (CREST-2) protocol and generate executable phenotyping pipelines for real-world EHR data. The performance of the framework was evaluated along two dimensions. First, protocol extraction accuracy was assessed against a gold-standard checklist of trial design components using precision, recall, and F1-score metrics. Second, outcome validity was evaluated through population-level concordance analyses comparing EHR-derived outcomes with published trial endpoints using standardized mean difference, observed-to-expected ratios, confidence interval overlap, and two-proportion z-tests. Further, human-in-the-loop validation assessed the correctness of extracted clinical logic and phenotype definitions. Together, these evaluations demonstrate a structured approach for assessing LLM-driven protocol-to-pipeline translation for scalable real-world evidence generation.
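
One of the concordance checks named above, the two-proportion z-test, can be sketched as follows. This is a minimal, standard pooled-variance version and may differ from the authors' implementation. The function name and the example counts are hypothetical and do not come from the preprint.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-sided two-proportion z-test; returns (z, p_value)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled event rate under the null hypothesis of equal proportions.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 50/100 events in an EHR cohort vs 30/100 in a trial arm.
z, p = two_proportion_z_test(50, 100, 30, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value here would indicate that the EHR-derived and trial event rates differ by more than sampling noise alone would explain.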

4
Inherited genetic risk factors in young-onset lung cancer

Esai Selvan, M.; Gould Rothberg, B. E.; Patel, A. A.; Sang, J.; Horowitz, A.; Christiani, D. C.; Klein, R. J.; Gumus, Z. H.

2026-04-15 genetic and genomic medicine 10.64898/2026.04.14.26350822 medRxiv
Top 4%
0.2%

Introduction: Lung cancer is rare before age 45, and its inherited genetic basis remains poorly defined. Methods: We performed whole-genome sequencing in 171 predominantly young-onset lung cancer patients and integrated these data with whole-exome sequencing from six major lung cancer consortia, yielding 9,065 patients. After quality control, analyses focused on 6,545 individuals of European ancestry, the largest ancestral group. We compared the prevalence of rare pathogenic and likely pathogenic (P/LP) germline variants between 186 young-onset (age <45 years) and 6,359 older patients at gene and gene-set levels using Fisher's exact test, stratified by histology, sex, and smoking status. Polygenic risk scores (PRS) derived from common variants were also evaluated. Results: Young-onset patients carried a higher burden of rare germline P/LP variants in DNA damage response (DDR) genes (including BRIP1, ERCC6, MSH5), and in cilia-related genes, notably GPR161. At the pathway level, DDR genes were significantly enriched (OR=1.66, p=0.007), with the strongest signal in the Fanconi Anemia pathway and among females (OR=1.96, p=0.01). Enrichment was also observed in inborn errors of immunity pathways, with the strongest signals in antibody deficiency and complement system genes. Young-onset patients additionally exhibited higher lung cancer PRS. Conclusion: Young-onset lung cancer exhibits a distinct germline genetic architecture, characterized by enrichment of rare P/LP variants in DDR, cilia-related, and immune pathways, and an elevated lung cancer PRS. These findings support a greater role for inherited susceptibility in early-onset disease and have implications for risk stratification, earlier screening, and precision prevention.
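
The carrier-prevalence comparison described above rests on Fisher's exact test for a 2x2 table. The standard-library sketch below shows the two-sided test and the sample odds ratio. The function name and the toy table are illustrative assumptions, not the study's data or code; in practice `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value).
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Toy table (not study data): rows = two patient groups,
# columns = carrier / non-carrier.
odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
print(odds_ratio, round(p_value, 4))  # 9.0 0.4857
```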

5
Independent Genetic Effects of Glucagon-like Peptide-1 Receptor Locus on Body Mass Index and Type 2 Diabetes

Liu, C.; Hui, Q.; Linchangco, G. V.; Dabbs-Brown, A.; Zhou, J. J.; Joseph, J.; Reaven, P. D.; Rhee, M. K.; Djousse, L.; Cho, K.; Gaziano, J. M.; Wilson, P. W.; Phillips, L. S.; The VA Million Veteran Program; Sun, Y. V.

2026-04-13 genetic and genomic medicine 10.64898/2026.04.10.26350615 medRxiv
Top 4%
0.2%

Background: The glucagon-like peptide-1 receptor (GLP1R) is a key regulator of glucose metabolism and appetite and a major therapeutic target for type 2 diabetes (T2D) and obesity. Genetic studies have implicated the GLP1R locus in both body mass index (BMI) and T2D, but it remains unclear whether their underlying genetic associations are the same. Methods: We analyzed 431,107 participants of genetically inferred European ancestry from the Million Veteran Program. Within 500 kb of GLP1R, we performed locus-wide linear regression models for BMI and logistic regression models for T2D, adjusted for age, sex, and 10 principal components. We identified primary and secondary BMI sentinel variants using conditional analyses and evaluated their associations with T2D. Bayesian fine-mapping was used to construct credible sets at the GLP1R locus for BMI and T2D. Results: Conditioning on the primary sentinel variant rs12213929 (upstream of GLP1R, β = 0.11; 95% CI 0.09-0.14; p = 1.94E-17), we identified a secondary variant (rs13216992, intron of GLP1R) independently associated with BMI (β = 0.10; 95% CI 0.07-0.13; p = 7.88E-14). The two sentinel variants showed low linkage disequilibrium (r² = 0.03). A two-variant allelic burden score (0-4; sum of the rs12213929 G-allele count and rs13216992 C-allele count) showed that participants with 4 risk alleles had 0.47 kg/m² higher BMI than those with 0 risk alleles (95% CI 0.39-0.55; p < 2E-16). Both variants were associated with higher T2D risk, but with distinct patterns after BMI adjustment: the rs12213929-T2D association persisted after adjustment for BMI (OR = 1.02; 95% CI 1.01-1.03; p = 0.0004), whereas the rs13216992-T2D association was fully attenuated (OR = 1.00; 95% CI 0.99-1.01; p = 0.68). Fine-mapping identified a compact 95% BMI credible set of 17 variants and a broader 95% T2D credible set of 42 variants, with all BMI credible variants contained within the T2D set.
Conclusions: The GLP1R locus harbors at least two independent BMI-associated variants that exhibit heterogeneous relationships with T2D: rs12213929 influences T2D risk partly through BMI-independent pathways, whereas rs13216992 appears to act predominantly via adiposity. These findings refine the genetic architecture at this key therapeutic target gene and provide a foundation for functional and pharmacogenomic studies to determine whether GLP1R variation can inform precision prevention and treatment of obesity and T2D.
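
The two-variant allelic burden score in this abstract (0-4; rs12213929 G-allele count plus rs13216992 C-allele count) can be illustrated with a short sketch. The two-letter genotype string encoding, the function names, and the placeholder "N" for any non-risk allele are assumptions made here; the abstract does not state the other alleles or the data format.

```python
def allele_count(genotype, allele):
    """Count copies (0, 1, or 2) of `allele` in a two-letter genotype string."""
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype such as 'GG'")
    return genotype.upper().count(allele.upper())


def burden_score(rs12213929_gt, rs13216992_gt):
    """Sum of the rs12213929 G-allele count and the rs13216992 C-allele count."""
    return allele_count(rs12213929_gt, "G") + allele_count(rs13216992_gt, "C")


# "N" is a placeholder for any non-risk allele (assumption, see above).
print(burden_score("GG", "CC"))  # 4: both risk alleles homozygous
print(burden_score("GN", "NC"))  # 2: heterozygous at both variants
print(burden_score("NN", "NN"))  # 0: no risk alleles
```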

6
Loss of MITF activity leads to emergent cell states from the melanocyte stem cell lineage

Brombin, A.; MacMaster, S.; Travnickova, J.; Wyatt, C.; Brunsdon, H.; Ramsey, E.; Vu, H. N.; Steingrimsson, E.; Kenny, C.; Chandra, T.; Patton, E. E.

2026-04-12 developmental biology 10.64898/2025.12.23.695681 medRxiv
Top 4%
0.2%

How embryonic cells generate large clones of cells in the adult represents a fundamental question in biology. Here, using melanocyte stem cells (McSCs) in the zebrafish as a model, we explore the function of the master melanocyte transcription factor (MITF) in safeguarding McSCs in embryonic development and their potential to pigment large clones in the adult. MITF is well known for its role in the specification of melanoblasts from the neural crest (NC) and their differentiation into melanocytes, yet little is known about how this activity shapes the stem cell lineages. Here, we use live imaging coupled with single-cell transcriptomics and lineage tracing to show that MITF (mitfa in zebrafish) protects the melanocyte stem cell (McSC) fate in zebrafish. Utilizing a temperature-sensitive mitfa^vc7 mutant, we show that loss of Mitfa leads to a surprising premature and aberrant expansion of McSC progeny at the niche during embryogenesis, coupled with novel emergent transcriptional cell states. Lineage tracing of McSCs from the embryonic to juvenile stages reveals that Mitfa activity is subsequently required in regeneration by Schwann cell-like and melanocyte stem cell progenitors that serve as a reservoir for fast-responding pigment progenitors. Thus, the impact of Mitfa loss on the melanocyte lineage is cell-state and stage-specific. The emergent cell states upon mitfa loss may have important implications for our understanding of the loss of MITF activity in human genetic disease and melanoma.

7
Cohort Profile: Investigation into Biomarkers to Predict Preterm Birth (INSIGHT) -- a Prospective Pregnancy Cohort Focused on Preterm Birth in the United Kingdom

Jackson, R.; Valensin, C.; Chin-Smith, E.; Suff, N.; Shennan, A. H.; Hezelgrave, N. L.; Tribe, R. M.

2026-04-11 obstetrics and gynecology 10.64898/2026.04.08.26350031 medRxiv
Top 4%
0.2%

1. Purpose: Spontaneous preterm birth (sPTB), particularly early preterm birth and mid-trimester loss, remains poorly understood and difficult to predict. The INSIGHT cohort was established to create a deeply phenotyped, longitudinal pregnancy dataset integrating clinical data and biological sampling to investigate the mechanisms of cervical shortening and sPTB, with a focus on linking innate immune responses, the vaginal microbiome, and host biology to identify early biomarkers of risk. 2. Participants: 2272 pregnant women (8+0 to 28+0 weeks' gestation) were enrolled as high or low risk of preterm birth based on obstetric history, cervical length, cervical procedures, multiple pregnancy, or Mullerian anomalies. Serial clinical data and biological samples, including cervicovaginal specimens and blood, were collected throughout pregnancy. 3. Findings to date: The cohort has generated comprehensive multi-omic data, including transcriptomic, microbiome, metabolomic, proteomic, and immune profiling. Key findings demonstrate that maternal plasma cfRNA can predict early sPTB months before clinical presentation, and that integration of cervicovaginal microbiota, metabolites, and host immune markers improves risk prediction and provides mechanistic insight into inflammatory pathways leading to sPTB. 4. Future plans: Recruitment concluded in 2023, with final visits occurring in 2024. Ongoing analyses focus on refining predictive models, defining biological subtypes of preterm birth, and translating integrated biomarker panels into clinically scalable risk stratification tools.
STRENGTHS AND LIMITATIONS OF THIS STUDY
- Large, prospective longitudinal cohort (Strength): Ten years of recruitment with repeat sampling enabled detailed study of biological pathways leading to sPTB.
- Broad risk spectrum with clear definitions (Strength): Inclusion of both high and low-risk women using pre-specified clinical criteria supported robust comparative analyses and biomarker discovery.
- Multicentre NHS recruitment (Strength): Inclusion of several sites, particularly the diverse Lambeth population at St Thomas, enhanced population diversity and external validity.
- Hospital-based, high-risk enrichment (Limitation/Strength): Recruitment from specialist preterm birth clinics and secondary/tertiary care may limit generalisability to lower-risk or primary care populations. However, it did ensure many preterm birth events were captured prospectively in this study.
- Incomplete follow-up and limited late sampling (Limitation): Attrition and sampling only up to a prespecified gestation (defined by standard clinical pathway) reduced full pregnancy coverage of longitudinal data.

8
Medicalbench: Evaluating Large Language Models Towards Improved Medical Concept Extraction

Yang, Z.; Lyng, G. D.; Batra, S. S.; Tillman, R. E.

2026-04-16 health informatics 10.64898/2026.04.12.26350704 medRxiv
Top 4%
0.2%

Medical concept extraction from electronic health records underpins many downstream applications, yet remains challenging because medically meaningful concepts, such as diagnoses, are frequently implied rather than explicitly stated in medical narratives. Existing benchmarks with human-annotated evidence spans underscore the importance of grounding extracted concepts in medical text. However, they predominantly focus on explicitly stated concepts and provide limited coverage of cases in which medically relevant concepts must be inferred. We present MedicalBench, a new benchmark for medical concept extraction with evidence grounding that evaluates implicit medical reasoning. MedicalBench formulates medical concept extraction as a verification task over medical note-concept pairs, coupled with sentence-level evidence identification. Built from MIMIC-IV discharge summaries and human-verified ICD-10 codes, the dataset is curated through a multi-stage large language model (LLM) triage pipeline followed by medical annotation and expert review. It deliberately includes implicit positives, semantically confusable negatives, and cases where LLM judgments disagree with medical expert assessments. Annotators provide sentence-level evidence spans and concise medical rationales. The final dataset contains 823 high-quality examples. We define two complementary evaluation tasks: (1) medical concept extraction and (2) sentence-level evidence retrieval, enabling assessment of both correctness and interpretability. Benchmarking state-of-the-art LLMs and a supervised baseline reveals that performance remains modest, highlighting the difficulty of extracting implicitly expressed concepts.
We further show that explicitly incorporating reasoning cues and prompting to extract implicit evidence substantially improves medical concept extractions, while performance is largely invariant to note length, indicating that MedicalBench isolates reasoning difficulty rather than superficial confounders. MedicalBench provides the first systematic benchmark for implicit, evidence-grounded medical concept extraction, offering a foundation for developing medical language models that can both identify medically relevant concepts and justify their predictions in a transparent and medically faithful manner.

9
Vector2Variant: Discovery of Genetic Associations from ML Derived Representations without Phenotype Engineering

Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.

2026-04-17 genetic and genomic medicine 10.64898/2026.04.10.26350624 medRxiv
Top 5%
0.2%

Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, timeseries, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.

10
Spatial Decomposition of Longitudinal RNFL Maps Reveals Distinct Modes of Glaucomatous Progression with Structure Function and Genetic Signatures

Chen, L.; Zhao, Y.; Moradi, M.; Eslami, M.; Wang, M.; Elze, T.; Zebardast, N.

2026-04-11 health informatics 10.64898/2026.04.09.26350387 medRxiv
Top 5%
0.2%

Purpose: To determine whether spatial decomposition of longitudinal retinal nerve fiber layer (RNFL) change maps reveals distinct modes of glaucomatous progression masked by conventional averaging, and to validate these modes through structure-function mapping and genetic association analysis. Methods: Pixel-wise RNFL rates of change were computed from longitudinal optic disc OCT scans of 15,242 eyes (8,419 adults with primary open-angle glaucoma [POAG]; Massachusetts Eye and Ear, 1998 to 2023). A loss-only constraint zeroed all thickening values, reflecting the biological prior that adult RNFL does not regenerate. Nonnegative matrix factorization decomposed these maps into spatial progression components (80% training set). Components were evaluated in a held-out set (20%) for retinotopic structure-function concordance, visual field (VF) progressor classification against global and quadrant RNFL rates, and enrichment of genetic association signals at established POAG loci. Results: Six anatomically distinct progression patterns emerged, including diffuse circumferential loss, focal peripapillary defects, and arcuate bundle degeneration. Pattern-based models significantly outperformed global RNFL rate for classifying VF progressors (area under the curve, 0.750 [95% CI, 0.709 to 0.790] vs. 0.702; P = .0096) and explained additional variance in functional decline (Nagelkerke pseudo-R², 0.301 vs. 0.198; P = .0011). Structure-function mapping confirmed retinotopic coherence. Spatial phenotypes recovered stronger genetic signals than global rates at 85.3% of established POAG loci, suggesting they capture more biologically homogeneous endophenotypes of progression. Conclusions: Glaucomatous structural progression occurs through spatially distinct modes with independent structure-function and genetic signatures that conventional RNFL averaging obscures.

11
Democratizing Scientific Publishing: A Local, Multi-Agent LLM Framework for Objective Manuscript Editing

Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.

2026-04-17 health informatics 10.64898/2026.04.13.26350761 medRxiv
Top 5%
0.1%

Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.

12
Leveraging State-of-the-Art LLMs for the De-identification of Sensitive Health Information in Clinical Speech

Dai, H.-J.; Mir, T. H.; Fang, L.-C.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.

2026-04-17 health informatics 10.64898/2026.04.13.26349911 medRxiv
Top 6%
0.1%

Accurate recognition and de-identification of sensitive health information (SHI) in spoken dialogues requires multimodal algorithms that can understand medical language and contextual nuance. However, errors in recognition and de-identification risk exposing SHI. Additionally, the variability and complexity of medical terminology, along with the inherent biases in medical datasets, further complicate this task. This study introduces the SREDH/AI-Cup 2025 Medical Speech Sensitive Information Recognition Challenge, which focuses on two tasks: Task 1, speech transcription, in which systems must accurately transcribe speech into text; and Task 2, medical speech de-identification, in which systems must detect and appropriately classify mentions of SHI. The competition attracted 246 teams; top-performing systems achieved a mixed error rate (MER) of 0.1147 and a macro F1-score of 0.7103, with average MER and macro F1-score of 0.3539 and 0.2696, respectively. Results were presented at the IW-DMRN workshop in 2025. Notably, the results reveal that LLMs were prevalent across both tasks: 97.5% of teams adopted LLMs for Task 1 and 100% for Task 2, highlighting their growing role in healthcare. Furthermore, we fine-tuned six models, demonstrating strong precision (~0.885-0.889) with slightly lower recall (~0.830-0.847), resulting in F1-scores of 0.857-0.867.
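
The reported F1 range follows from the precision and recall ranges if F1 is the harmonic mean of the two. The sketch below checks the arithmetic and shows how a macro F1 averages per-class scores. Pairing the lowest precision with the lowest recall (and highest with highest) is an assumption made here for illustration, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class):
    """Unweighted mean of per-class F1; `per_class` is a list of
    (precision, recall) pairs, one pair per SHI category."""
    return sum(f1_score(p, r) for p, r in per_class) / len(per_class)


# Endpoints of the reported ranges (pairing assumed, see above).
print(round(f1_score(0.885, 0.830), 3))  # 0.857
print(round(f1_score(0.889, 0.847), 3))  # 0.867
```

Because macro F1 weights every category equally, rare SHI categories with poor recall can pull the overall score well below the per-mention accuracy.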

13
Adherence to International Pharmacogenomic Recommendations in Paediatric Cancer Care: A Cohort Analysis Embedded Within the MARVEL-PIC Randomised Trial

Chawla, A.; Carter, S.; Dyas, R.; Williams, E.; Moore, C.; Conyers, R.

2026-04-16 genetic and genomic medicine 10.64898/2026.04.15.26348678 medRxiv
Top 6%
0.1%

Background: Pharmacogenomic testing (PGx) can optimise drug efficacy and minimise toxicity, but the extent of prescriber adherence to PGx recommendations remains unclear. We aimed to quantify clinician adherence to international genotype-guided prescribing recommendations in a cohort of paediatric oncology patients. Methods: We reviewed files of children enrolled in the MARVEL-PIC (NCT05667766) randomised controlled trial who had PGx recommendations available. Patients were included if 12 weeks had passed since their PGx report was released to clinicians. Prescribing events were identified for actionable PGx recommendations and classified as "explicitly followed", "inadvertently followed", or "not followed". Adherence was assessed by patient, drug, and recommendation. Results: 2,063 PGx recommendations were available for 216 patients. 64 (3.1%) recommendations were actionable for 44 patients and 10 drugs within the 12-week study period. Recommendations were explicitly followed in 57/288 (19.8%) of prescribing events, inadvertently followed in 145 (50.3%), and not followed in 86 (29.9%). Mercaptopurine demonstrated the highest rate of explicit adherence (87.5%). No significant associations were observed between adherence and age group, cancer type, drug type, or strength of recommendation. Conclusion: Adherence to pharmacogenomic recommendations was very low, highlighting the need to understand barriers to PGx implementation and to consider clinical decision supports to facilitate adherence.

14
Preterm delivery and placental pathology with clinical and pathogenic implications

Zhang, P.

2026-04-13 obstetrics and gynecology 10.64898/2026.04.09.26350526 medRxiv
Top 6%
0.1%

Background: Preterm birth is one of the most significant causes of neonatal morbidity and mortality. Preterm delivery is classified as iatrogenic or spontaneous. Here, we study the role of placental pathology. Materials and methods: We previously collected placental pathology data together with maternal pregnancy and neonatal birth data, and here investigated the role of placental pathology in preterm delivery. Preterm delivery was categorized as late preterm (34-36 weeks), moderate preterm (32 to 33 weeks), and extreme preterm (less than 32 weeks). Neonatal, maternal, placental gross and histologic features, and laboratory parameters were compared across groups using chi-square tests for categorical variables and Kruskal-Wallis tests for continuous variables, using R packages. Results: A total of 3723 singleton placentas, comprising 3307 term (88.8%) and 416 preterm placentas (11.2%), were examined with maternal pregnancy data and neonatal birth data. There were 614 placentas from patients with preeclampsia/pregnancy-induced hypertension (PRE/PIH) (16.5%). Preterm delivery showed significantly lower fetal birth weight, placental weight, and fetal-placental ratio (all p<0.01). Maternal Black race was more prevalent in preterm groups (up to 50.8% in extreme preterm vs. 33.2% in term, p<0.01). Preterm delivery was statistically associated with PRE/PIH and maternal vascular malperfusion (MVM), maternal and fetal inflammatory response (MIR and FIR), and increased pre-delivery white blood count (WBC). Extreme preterm deliveries were markedly associated with intrauterine fetal death (27.5%, p<0.01) and MIR/FIR (56.7%, p<0.01). After excluding PRE/PIH patients, preterm delivery was statistically associated with MIR/FIR and increased WBC.
Conclusions: Distinct clinicopathologic profiles exist across preterm subcategories, with MVM predominating in late/moderate preterm and severe pathologic features (including fetal demise and acute inflammation) in extreme preterm. These findings highlight heterogeneous etiologies of preterm delivery.

15
Individualised evoked response detection based on the spectral noise colour

Undurraga Lucero, J. A.; Chesnaye, M.; Simpson, D.; Laugesen, S.

2026-04-13 health informatics 10.64898/2026.04.11.26350685 medRxiv
Top 7%
0.1%

Objective detection of evoked potentials (EPs) is central to digital diagnostics in hearing assessment and clinical neurophysiology, yet current approaches remain time-intensive and sensitive to inter-individual noise variability. Many existing detection methods rely on population-based assumptions or computationally demanding procedures, limiting robustness and efficiency in real-world clinical settings. We present Fmpi, a digital EP detection framework enabling individualised, real-time response detection through analytical modelling of the spectral colour and temporal dynamics of background noise within each recording. Using extensive simulations and large-scale human electroencephalography datasets spanning brainstem, steady-state, and cortical EPs recorded in adults and infants, we demonstrate performance comparable or superior to state-of-the-art bootstrapped methods while operating at a fraction of the computational cost and maintaining well-controlled sensitivity with improved specificity. Importantly, Fmpi incorporates a futility detection mechanism enabling early termination of uninformative recordings, reducing testing time without compromising diagnostic reliability.

16
Colibactin-associated mutations in the human colon appear to reflect anatomy and early exposure, not oncogenesis

Hiatt, L.; Peterson, E. V.; Happ, H. C.; Major-Mincer, J.; Avvaru, A.; Goclowski, C. L.; Garretson, A.; Sasani, T. A.; Hotaling, J. M.; Neklason, D. W.; Uchida, A. M.; Quinlan, A. R.

2026-04-15 genetic and genomic medicine 10.64898/2026.04.13.26350783 medRxiv
Top 7%
0.1%

Colorectal cancer (CRC) is the second leading cause of cancer death globally and the number one cause of cancer death in people under 50 years old. The reasons for the rise of early-onset CRC are unknown, and while anatomically distinct subtypes of CRC have substantial clinical and molecular associations, the etiology of region-specific disease, such as early-onset CRC's enrichment in the distal colon, remains unclear. Understanding regional mutagenesis may identify risk factors for this public health concern and CRC more broadly. To evaluate mutational dynamics across the premalignant colon, we performed whole-genome sequencing of 125 individual colon crypts taken from six standardized regions biopsied during colonoscopy, collected from 11 donors without polyps and 10 with polyps. We observed mutation spectra and accumulation rates consistent with previous whole-organ studies, with greater subclonal mutation capture enabled by experimental design. T>[A,C,G] mutations, which are associated with colibactin genotoxicity from pks+ Escherichia coli, were significantly enriched in the rectum of donors with and without polyps (adjusted p-values < 0.01). Moreover, when comparing findings to crypts from individuals with CRC and sequenced CRC tumors, we observed consistent enrichment of the colibactin-associated mutational signature "ID18" in the rectum in both normal colon crypts and CRC tumors, without significant difference in colibactin-specific single nucleotide variant or insertion-deletion burden in crypts across the three clinical groups (i.e., no polyp, polyp, and CRC). These findings argue against a causal or prognostic role for colibactin in CRC, instead indicating that the proposed association with early-onset disease reflects anatomic specificity rather than cancer-specific clinical relevance.

17
Wearable sleep staging using photoplethysmography and accelerometry across sleep apnea severity: a focus on very severe sleep apnea

Ogaki, S.; Kaneda, M.; Nohara, T.; Fujita, S.; Osako, N.; Yagi, T.; Tomita, Y.; Ogata, T.

2026-04-13 health informatics 10.64898/2026.04.09.26350266 medRxiv
Top 7%
0.1%

Study Objectives: To evaluate wearable sleep staging across sleep apnea severity, including very severe sleep apnea defined as an apnea-hypopnea index (AHI) ≥ 50 events/h, and to assess how training-set composition affects performance in this subgroup. Methods: We analyzed 552 overnight recordings, 318 from the Sleep Lab Dataset and 234 from the Hospital Dataset. In the Hospital Dataset, 26.5% had very severe sleep apnea. We developed a deep learning model for sleep staging using RR intervals from wrist-worn photoplethysmography and three-axis accelerometry. Baseline performance was assessed by cross-validation under 5-stage and 4-stage staging. We examined night-level associations with AHI severity. We also compared the baseline model with an ablation model trained on the same number of recordings but with more Sleep Lab Dataset and lower-AHI Hospital Dataset recordings, evaluating both models in the very severe subgroup. Results: In 5-stage classification, Cohen's kappa was 0.586 in the Sleep Lab Dataset and 0.446 in the Hospital Dataset. Under 4-stage staging, the gap narrowed, with kappa values of 0.632 and 0.525, respectively. In the Hospital Dataset, performance declined with increasing AHI severity. Among 62 recordings with very severe sleep apnea, reducing high-AHI representation in training lowered kappa from 0.365 to 0.303. Conclusions: Wearable sleep staging performance declined with greater sleep apnea severity in this clinical cohort. Clinical utility may benefit from training data that better represent the target severity spectrum and from selecting staging granularity to match the intended use case. Statement of Significance: Repeated laboratory polysomnography is impractical for long-term sleep apnea management. Wearable sleep staging could support scalable monitoring, yet its reliability in clinically severe sleep apnea has remained unclear. This study developed and evaluated a wearable sleep staging approach in both sleep-laboratory and hospital cohorts. The hospital cohort included many severe and very severe cases. Performance was lower in the hospital cohort and declined with greater sleep apnea severity. A coarser staging scheme reduced the gap between cohorts, and models trained without representative very severe cases performed worse in this target population. These findings highlight the value of severity-aware model development and motivate future multi-night home validation with reliability cues.
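For readers unfamiliar with the agreement metric reported in this abstract, the following is a minimal sketch of how Cohen's kappa can be computed from epoch-level stage labels. It is not the authors' evaluation code; the function name `cohens_kappa` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equally long label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label marginals.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: product of marginal label frequencies, summed over labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy example: reference scoring vs. wearable prediction.
reference = ["W", "W", "N", "N"]
predicted = ["W", "N", "N", "N"]
kappa = cohens_kappa(reference, predicted)
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5. Because kappa corrects for chance agreement, it is typically lower than raw accuracy, and merging stages (as in 4-stage versus 5-stage scoring) changes both terms.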

18
Mutation timing, accumulation and selection in the male germline shape inheritance risk for developmental disorders

Neville, M. D. C.; Neuser, S.; Sanghvi, R.; Christopher, J.; Roberts, K.; Smith, K.; ONeill, L.; Hayes, J.; Cagan, A.; Hurles, M. E.; Goriely, A.; Abou Jamra, R.; Rahbari, R.

2026-04-13 genetic and genomic medicine 10.64898/2026.04.09.26350474 medRxiv
Top 8%
0.1%

De novo mutations (DNMs) arising in the parental germline are a major cause of severe developmental disorders. While most DNMs originate in the paternal germline, it remains unclear whether fathers of affected children carry a systematically altered burden of transmissible germline risk, or whether disease largely reflects stochastic outcomes of shared population-wide mutational processes. Here, we combined whole-genome sequencing of 168 parent-child trios with ultra-accurate duplex sequencing of paternal sperm to directly relate transmitted DNMs to the broader mutational and selective landscape of the male germline. In 127 fathers, sperm mutation burden and mutational spectra were indistinguishable from population reference cohorts. Positive selection metrics were likewise concordant, with a global dN/dS of 1.56 (95% CI 1.45-1.67) compared to 1.44 (95% CI 1.17-1.77) in controls and 28 of 32 significantly selected genes overlapping with prior findings. Six fathers harboured a pathogenic early mosaic variant detectable in sperm at allele fractions that ranged from 0.7% to 14.8%. Although these variants generated substantial individual-level risk outliers, they accounted for only ~11% of the aggregated exome pathogenic burden across the cohort. The remaining burden was distributed across low-VAF mutations, including positively selected driver variants and other rare mutations accumulating with paternal age. Together, these results show that transmissible de novo disease risk is governed primarily by universal germline mutational and selective processes, while early developmental mosaicism produces uncommon but clinically meaningful deviations. This integrated view clarifies how mutation timing, age-associated accumulation and germline selection jointly shape inheritance risk.

19
JARVIS, should this study be selected for full-text screening? Performance of a Joint AI-ReViewer Interactive Screening tool for systematic reviews

Barreto, G. H. C.; Burke, C.; Davies, P.; Halicka, M.; Paterson, C.; Swinton, P.; Saunders, B.; Higgins, J. P. T.

2026-04-11 health informatics 10.64898/2026.04.08.26350384 medRxiv
Top 9%
0.0%

Background: Systematic reviews are essential for evidence-based decision making in health sciences but require substantial time and resources for manual processes, particularly title and abstract screening. Recent advances in machine learning and large language models (LLMs) have demonstrated promise in accelerating screening with high recall but are often limited by modest gains in efficiency, mostly due to the absence of a generalisable stopping criterion. Here, we introduce and report preliminary findings on the performance of a novel semi-automated active learning system, JARVIS, that integrates LLM-based reasoning using the PICOS framework, neural network-based classification, and human decision-making to facilitate abstract screening. Methods: Datasets containing author-made inclusion and exclusion decisions from six published systematic reviews were used to pilot the semi-automated screening system. Model performance was evaluated across recall, specificity and area under the precision-recall curve (AUC-PR), using full-text inclusion as the ground truth. Estimated workload and financial savings were calculated by comparing total screening time and reviewer costs across manual and semi-automated scenarios. Results: Across the six review datasets, recall ranged between 98.2% and 100%, and specificity ranged between 97.9% and 99.2% at the defined stopping point. Across iterations, AUC-PR values ranged between 83.8% and 100%. Compared with human-only screening, JARVIS delivered workload savings between 71.0% and 93.6%. When a single reviewer read the excluded records, workload savings ranged between 35.6% and 46.8%. Conclusion: The proposed semi-automated system substantially reduced reviewer workload while maintaining high recall, improving on previously reported approaches. Further validation in larger and more varied reviews, as well as prospective testing, is warranted.
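As a reminder of how the screening metrics named in this abstract are defined, here is a minimal sketch that computes recall and specificity from confusion-matrix counts. It does not reproduce the JARVIS pipeline; the function name `screening_metrics` and the counts are hypothetical.

```python
def screening_metrics(tp, fn, tn, fp):
    """Recall and specificity for a binary include/exclude screening decision.

    tp: relevant records correctly retained
    fn: relevant records wrongly excluded
    tn: irrelevant records correctly excluded
    fp: irrelevant records wrongly retained
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one relevant and one irrelevant record")
    recall = tp / (tp + fn)        # share of relevant records that were kept
    specificity = tn / (tn + fp)   # share of irrelevant records that were removed
    return recall, specificity


# Hypothetical counts for a single review dataset.
recall, specificity = screening_metrics(tp=98, fn=2, tn=970, fp=30)
```

With these made-up counts, recall is 0.98 and specificity is 0.97. In screening, recall is usually the priority, since a wrongly excluded relevant record is lost to the review, while a false positive only costs extra reading time.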

20
Designing national programs for expanded carrier screening: Results from a discrete-choice experiment in Singapore

Blythe, R.; Senanayake, S.; Bylstra, Y.; Roberts, J.; Choi, C.; Yeo, M. J.; Goh, J.; Graves, N.; Koh, A. L.; Jamuar, S. S.

2026-04-13 health economics 10.64898/2026.04.09.26350563 medRxiv
Top 10%
0.0%

Background: Carrier screening for inherited genetic disorders can reduce the burden of conditions that lead to childhood morbidity and mortality, including thalassaemia, cystic fibrosis, and spinal muscular atrophy. To be successful, national carrier screening programs should aim to maximise uptake, which may depend on population preferences for screening characteristics. In this study, we aimed to determine how expanded carrier screening in Singapore should be designed based on operational factors including suggested copayments, wait times, and disorders included in screening panels. Methods: We elicited stated preferences for the design of a hypothetical national carrier screening program with seven attributes from 500 Singaporeans of reproductive age (18 to 54). A discrete choice experiment was applied using 30 choice tasks with 3 alternatives per task, divided between 3 blocks. The mixed multinomial logit model was used to estimate willingness-to-pay for each attribute level. Predicted uptake for three plausible screening programs was assessed, with copayment amounts from $0 to $1,200 in increments of $30. Impact on the annual national budget was calculated as a function of 25,000 expected eligible couples per year. All costs were reported in 2026 SGD. Results: Respondents showed the strongest preferences for cost, followed by the number of diseases included in the panel, then wait times, with limited impact of remaining attributes. With no copayments, predicted uptake ranged from 85% [95% CI: 83% to 87%] to 90% [88% to 92%] for the basic and utility-maximising screening programs, respectively. This declined to 61% [56% to 66%] and 69% [65% to 73%], respectively, at a copayment of $1,200 per test. The model predicted higher uptake if a selection of screening alternatives was available, compared to a single program. The budget impact was highly dependent on population eligibility, copayments, and couples' decision-making processes, but was unlikely to exceed $22.5m [$19.0m to $26.6m] per year unless expanded beyond married couples. Conclusions: There was high predicted demand for carrier screening even as copayments increased. Successful strategies to improve uptake may include reducing copays and wait times, increasing the number of screening options available to prospective parents, and increasing program eligibility beyond pre-conception married couples.