
Med

Elsevier BV

Preprints posted in the last 90 days, ranked by how well they match Med's content profile, based on 38 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Targeted Long-Read sequencing provides functional validation of variants predicted to alter splicing

Quartesan, I.; Manini, A.; Parolin Schnekenberg, R.; Facchini, S.; Curro, R.; Ghia, A.; Bertini, A.; Polke, J.; Bugiardini, E.; Munot, P.; O'Driscoll, M.; Laura, M.; Sleigh, J. N.; Reilly, M. M.; Houlden, H.; Wood, N.; Cortese, A.

2026-03-06 neurology 10.64898/2026.03.02.26346984 medRxiv
Top 0.1%
14.9%

Background: Whole-genome sequencing (WGS) has improved the diagnosis of rare genetic disorders, yet interpretation of non-coding variants that affect splicing remains challenging. In silico predictions alone are insufficient, and short-read RNA sequencing may fail to capture complex or low-abundance splicing events. Targeted amplicon-based long-read RNA sequencing (Amp-LRS) offers a cost-effective approach for functional validation of candidate splice-altering variants. Methods: We applied Amp-LRS to five patients with neurological disorders (central nervous system, peripheral nervous system, or muscle) harbouring candidate non-coding variants predicted to alter splicing. RNA was extracted from fibroblasts or peripheral blood, and full-length transcript amplicons were sequenced using Oxford Nanopore Technologies. Nonsense-mediated decay (NMD) inhibition was performed on fibroblast cultures using cycloheximide. Results: Amp-LRS validated all five candidate variants, including intronic and UTR variants in POLR3A, OPA1, PYROXD1, GDAP1, and SPG11. Aberrant splicing events included exon skipping, intron retention, cryptic splice site activation, and pseudoexon inclusion, often resulting in frameshifts and premature termination codons. For POLR3A and OPA1, multiple abnormal isoforms arose from single variants, highlighting the complexity of splicing disruption. Some pathogenic effects were detectable only in a minority of reads and variably enriched by NMD inhibition, consistent with being hypomorphic. The approach was successfully applied using accessible tissues and enabled multiplexed sequencing at low per-sample cost. Conclusions: Amp-LRS is a sensitive, versatile, and cost-effective method for functional assessment of non-coding splice-altering variants identified by WGS.
By enabling full-length transcript analysis from accessible tissues, this approach improves interpretation of variants of uncertain significance and could enhance molecular diagnosis in rare neurological diseases.

2
The FEES Dysphagia Index: a bias-resilient continuous score that captures expert clinical judgment in 2,943 neurological inpatients

Werner, C. J.; Sanchez-Garcia, E.; Mall, B.; Meyer, T.; Pinho, J.; Schulz, J. B.; Schumann-Werner, B.

2026-04-21 neurology 10.64898/2026.04.20.26351259 medRxiv
Top 0.1%
8.4%

Multi-consistency testing during flexible endoscopic evaluation of swallowing (FEES) is clinically necessary but introduces selection bias: worst scores inflate severity because the number of consistencies tested covaries with disease severity. In this retrospective observational study of hospitalized neurological patients, we derived and validated the FEES Dysphagia Index (FDI) in two temporally independent cohorts (Cohort 1: 2013-2018, N=1,257; Cohort 2: 2021-2025, N=1,686) from a single center. FDI-S averages Penetration-Aspiration Scale (PAS) scores across tested consistencies (0-100 scale); FDI-E uses Yale Pharyngeal Residue scores; FDI-C combines both. Selection bias was quantified using sequential branching-tree inverse probability weighting (IPW). Worst PAS overestimated severity by 24%; FDI deviated by <2%. FDI-C was significantly superior to Worst PAS for hospital-acquired pneumonia (HAP; AUC 0.70 vs. 0.60, p<0.001), mortality (0.71 vs. 0.62, p=0.040), and restricted oral intake (0.90 vs. 0.74, p<0.001), and statistically equivalent to clinician-rated severity. FDI-C mapped linearly onto ordinal Functional Oral Intake Scale values (FOIS; proportional odds RCS p=0.99). With functional status and diagnosis, FDI-C reconstructed the clinicians' oral intake recommendation with AUC up to 0.93. The FDI-C-mortality relationship was sigmoidal with a clinically relevant transition zone between ~50 and ~85. FDI-C is a bias-resilient, bedside-calculable score with interval-scale properties that captures expert clinical judgment, suitable as both a clinical decision support tool and a continuous research endpoint.
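The contrast between a worst-score summary and an averaged score, as described in the abstract above, can be illustrated with a minimal sketch. This is not the authors' FDI formula: the abstract states only that FDI-S averages PAS scores across tested consistencies on a 0-100 scale, so the linear rescaling of the 1-8 PAS range, the function names, and the example patient are all assumptions made for illustration.

```python
from statistics import mean

PAS_MIN, PAS_MAX = 1, 8  # range of the Penetration-Aspiration Scale


def rescale_0_100(pas: float) -> float:
    """Linearly map a PAS value from 1-8 onto 0-100 (assumed rescaling)."""
    return (pas - PAS_MIN) / (PAS_MAX - PAS_MIN) * 100


def mean_based_score(pas_by_consistency: dict) -> float:
    """Average PAS over the consistencies that were actually tested."""
    return rescale_0_100(mean(list(pas_by_consistency.values())))


def worst_based_score(pas_by_consistency: dict) -> float:
    """Summarise the same exam by its single worst PAS value."""
    return rescale_0_100(max(pas_by_consistency.values()))


# Hypothetical patient: severe aspiration on one consistency only.
patient = {"thin liquid": 8, "puree": 1, "solid": 1}

print(f"worst-based score: {worst_based_score(patient):.1f}")
print(f"mean-based score:  {mean_based_score(patient):.1f}")
```

In this toy case the worst-based summary sits at the top of the scale, while the averaged score reflects that two of the three tested consistencies were swallowed safely.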

3
Structured retrieval closes the gap between low-cost and frontier clinical language models

Gorenshtein, A.; Sorka, M.; Omar, M.; Miron, K.; Hatav, A.; Barash, Y.; Klang, E.; Shelly, S.

2026-03-24 neurology 10.64898/2026.03.22.26349018 medRxiv
Top 0.1%
7.4%

Most clinical large language model (LLM) benchmarks rely on clean, concise vignettes that do not reflect the noisy, long-form documentation typical of real clinical records. How LLM performance degrades under realistic chart conditions remains poorly characterised. Here we test whether structured retrieval workflows protect National Institutes of Health Stroke Scale (NIHSS) scoring accuracy under systematic context stress. Using 100 de-identified acute stroke cases and a fully crossed 4 x 4 x 3 x 3 condition matrix (144 conditions per case), we vary context acquisition method, document length, distractor load and critical-information position across four Gemini models (57,047 retained runs). Structured retrieval reduces mean absolute error (MAE) from 4.58 to 2.96 points relative to non-agentic baselines (mean gain 1.62 MAE points; 95% CI 1.57 to 1.67; 35% relative reduction), with consistent gains across all 36 stress combinations. Lower-cost models show disproportionately larger gains (2.76 versus 0.45 MAE points). Tool-retrieved pipelines outperform retrieval-augmented generation in 33 of 36 combinations. These findings indicate that retrieval architecture, rather than model scale alone, is a tractable lever for robust, equitable clinical LLM deployment.
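The headline figures in the abstract above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers quoted in the abstract; the variable names are illustrative and do not come from the preprint.

```python
# Values quoted in the abstract (MAE in NIHSS points).
baseline_mae = 4.58   # non-agentic baselines
retrieval_mae = 2.96  # structured retrieval

absolute_gain = baseline_mae - retrieval_mae
relative_reduction = absolute_gain / baseline_mae

# Size of the fully crossed 4 x 4 x 3 x 3 condition matrix.
conditions_per_case = 4 * 4 * 3 * 3

print(f"absolute gain: {absolute_gain:.2f} MAE points")
print(f"relative reduction: {relative_reduction:.0%}")
print(f"conditions per case: {conditions_per_case}")
```

The results agree with the reported mean gain of 1.62 MAE points, the 35% relative reduction, and the 144 conditions per case.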

4
TUCAN: Ultra-fast methylation-based classification of pediatric solid tumors and lymphomas

Jongmans, M.; van Tuil, M.; de Ruijter, E.; Hiemcke-Jiwa, L.; Flucke, U.; de Krijger, R.; Scheijde-Vermeulen, M.; Kusters, P.; van Ewijk, R.; Merks, H.; van Noesel, M.; Pages-Gallego, M.; Vermeulen, C.; Tops, B.; de Ridder, J.; Kester, L.

2026-03-26 oncology 10.64898/2026.03.24.26348466 medRxiv
Top 0.1%
6.8%

The high heterogeneity of pediatric cancers presents significant diagnostic challenges, underscoring the need for accurate classification. Although molecular profiling supports first-line diagnostics and guides treatment, it can delay final diagnosis. While Nanopore-based methylation analysis has enabled rapid CNS tumor diagnosis, its application to pediatric solid tumors and lymphomas has remained largely unexplored. We developed Tucan, a deep-learning classifier trained on 3,818 methylation array profiles representing 84 subtypes, designed to classify tumors from sparse Nanopore methylation data. In retrospective validation (n=514), Tucan generated confident predictions (CFT ≥ 0.7) within 30 minutes of sequencing in 385 cases, achieving 372 correct diagnoses (F1-score: 0.98). In prospective testing (n=74; 63 classifiable), 52 samples reached the confidence threshold with 96% accuracy, confirming the original diagnosis in 47 cases and correctly refining or revising it in three. Together, Tucan enables rapid, high-confidence molecular classification of pediatric solid tumors and lymphomas.

5
A VAE-based methodology for deep enterotyping and Parkinson's disease diagnosis

Qiao, Y.; Ma, Z.

2026-03-19 neurology 10.64898/2026.03.17.26348604 medRxiv
Top 0.1%
6.2%

Gut microbiome studies in Parkinson's disease (PD) are challenged by high dimensionality, sparsity, compositionality, and substantial between-cohort heterogeneity, all of which complicate robust community typing and disease-status classification. Here, we developed a variational autoencoder (VAE)-based methodology for deep enterotyping and PD diagnosis prediction (i.e., predicting diseased vs. control status) using a harmonized multi-cohort gut microbiome compendium comprising 1,957 16S rRNA samples from six PD case-control cohorts and an independent shotgun metagenomic validation cohort of 725 samples. Compared with conventional enterotyping approaches such as partitioning around medoids (PAM) and Dirichlet multinomial mixture (DMM) modelling, the VAE-derived latent space supported a clearer and more reproducible three-cluster solution. These three enterotype-like community states were biologically interpretable and were annotated as Enterococcus-type, Bacteroides-type, and Ruminococcus-type configurations. The same broad three-enterotype structure was independently recapitulated in the metagenomic dataset, supporting cross-platform robustness. Across the three inferred types, the proportion of PD samples was similar, and both the primary generalized linear mixed-effects model and sensitivity model showed that enterotype assignment was not a significant differentiating factor for PD status and that the lack of association was not dependent on a single modelling strategy. In the supervised branch, VAE-derived representations supported PD case-control classification while also providing a shared latent representation for clustering, enterotype transfer, and downstream interpretation. Collectively, these findings show that deep representation learning can improve the resolution, reproducibility, and interpretability of enterotype inference in heterogeneous microbiome datasets, and provide a practical methodology for organizing broad community structure in PD.
In this setting, the main advantage of the VAE method lies in its ability to link unsupervised community typing with supervised prediction through a shared latent representation, even when broad community types do not function as stand-alone disease biomarkers.

6
The Gut-Vascular Axis in Intracranial Aneurysm Rupture: A Systematic Review and Meta-analysis of Human Microbiome Evidence

Fahim, F.; Hemmati, M.; Heshmaty, S.; Sharvirani, A.; Shahini, A.; Hosseini, A.; Hosseini Marvast, S. M.; Mojtahedzadeh, A.; Konarizadeh, M.; Dorisefat, F.; Maham, N.; Omranisarduiyeh, A.; Oveisi, S.; Fadaei Juibari, F.; Malekipour Kashan, B.; Sharifi, G.; Zali, A.

2026-04-07 neurology 10.64898/2026.04.05.26350207 medRxiv
Top 0.1%
5.0%

Background: Intracranial aneurysm rupture is the leading cause of spontaneous subarachnoid hemorrhage and is associated with substantial mortality and long-term neurological disability. Emerging evidence suggests that the gut microbiome may influence vascular inflammation and endothelial integrity through immune and metabolic pathways, yet human evidence linking gut microbial alterations to intracranial aneurysm remains fragmented and inconsistent. Objective: This systematic review and meta-analysis aimed to synthesize available human evidence on the association between gut microbiome alterations and intracranial aneurysm formation or rupture, with a primary focus on microbial dysbiosis and differences in gut microbial alpha diversity. Methods: This study was conducted according to PRISMA 2020 guidelines and the protocol was prospectively registered in PROSPERO (CRD420261360785). A comprehensive search of PubMed, Scopus, Web of Science, Embase, and Cochrane CENTRAL was performed from database inception until April 1, 2026, with additional screening of grey literature sources. Observational human studies evaluating gut microbiome characteristics in patients with intracranial aneurysm were included. Mendelian randomization (MR) studies investigating genetically predicted microbial taxa and aneurysm outcomes were also reviewed. Random-effects meta-analysis using standardized mean differences (SMD) was performed for alpha diversity outcomes. MR taxa reported in at least two independent studies were quantitatively synthesized using inverse variance weighting of log odds ratios. Results: The systematic search identified 396 records. After removal of duplicates and eligibility screening, 20 studies met inclusion criteria, including 12 observational clinical studies and 8 Mendelian randomization analyses.
Meta-analysis of three microbiome sequencing studies demonstrated significantly reduced gut microbial alpha diversity in patients with ruptured intracranial aneurysms compared with controls. Sensitivity analyses confirmed the robustness of pooled estimates. In addition, MR evidence identified several microbial taxa, including Ruminococcus1, Bilophila, Fusicatenibacter, and Porphyromonadaceae, as potentially protective factors against aneurysm-related outcomes. Across observational studies, gut dysbiosis was frequently associated with inflammatory pathways and alterations in microbial metabolites implicated in vascular dysfunction. Conclusion: Current human evidence suggests a potential association between gut microbiome dysbiosis and intracranial aneurysm pathophysiology, particularly in relation to aneurysm rupture. Reduced microbial diversity and specific microbial taxa may influence vascular inflammation and aneurysm wall stability. However, existing evidence remains limited and heterogeneous. Large prospective cohorts and mechanistic studies are required to clarify causal relationships and evaluate whether microbiome-targeted interventions could contribute to aneurysm risk stratification or prevention strategies.
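The pooling step named in the Methods above, inverse variance weighting of log odds ratios, can be sketched as follows. This is a generic fixed-effect version and may differ from the review's actual implementation; the function name and the two example estimates are hypothetical and are not data from the review.

```python
import math


def pool_log_odds(estimates):
    """Fixed-effect inverse-variance pooling of odds ratios.

    estimates: list of (odds_ratio, standard_error_of_log_odds_ratio).
    Returns (pooled_odds_ratio, standard_error_of_pooled_log_odds_ratio).
    """
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in estimates]
    weights = [1.0 / se ** 2 for _, se in estimates]  # weight = 1 / variance
    total_weight = sum(weights)
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return math.exp(pooled_log_or), pooled_se


# Hypothetical MR estimates for one taxon from two independent studies.
studies = [(0.80, 0.10), (0.90, 0.20)]
pooled_or, pooled_se = pool_log_odds(studies)

# 95% confidence interval, computed on the log scale and back-transformed.
ci_low = math.exp(math.log(pooled_or) - 1.96 * pooled_se)
ci_high = math.exp(math.log(pooled_or) + 1.96 * pooled_se)
print(f"pooled OR = {pooled_or:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

More precise estimates (smaller standard errors) receive larger weights, so the pooled odds ratio lies closer to the first study than to the second.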

7
Scaling Multiplex qPCR Primer Design to 1000-plex using the Degenerate Incomplete Multiplex Primer List Extension (DIMPLE) Algorithm

Pinto, A.; Dong, X.; Wu, W.; Johnson, S. J.; Wen, Q.; Zhang, C.; Havey, J.; Wang, B.; Tang, G.; Farhat, A.; Zhang, D. Y.; Issa, G. C.; Zhang, X.

2026-04-21 bioengineering 10.64898/2026.04.17.719221 medRxiv
Top 0.1%
5.0%

Massively multiplexed qPCR is primarily constrained by increasing primer dimer formation as the number of distinct primers in a single reaction increases. Previous multiplex primer design algorithms either fail to sufficiently suppress primer dimers at 100+ plex, or require excessive computational resources to complete. Here, we present DIMPLE, a linear-runtime primer design algorithm that effectively generates 10,000+ primers to amplify thousands of potential amplicons in a single qPCR reaction. As one clinical demonstration of this algorithm, we designed an assay to detect 2,302 distinct KMT2A gene fusion subtypes using 204 primers in a single tube. In contrast to FISH and conventional NGS approaches with a 2% variant allele frequency (VAF) limit of detection, our DIMPLE qPCR assay was able to analytically detect gene fusions down to 0.05% VAF. We also constructed proof-of-concept multiplex qPCR panels for additional oncology gene fusions, multiplex pathogen detection, and DNA methylation markers. The scalability and low computational cost of DIMPLE are complementary to new instrument platforms for massively multiplex qPCR readout for enabling rapid, point-of-care nucleic acid testing.

8
WITHDRAWN: Causal Effects of Natural Language Processing-Enhanced Clinical Decision Support on Early Cognitive Impairment Detection: A Propensity Score Analysis Using Inverse Probability of Treatment Weighting

Dimitriou, A.; Foster, M.

2026-03-16 health informatics 10.64898/2026.02.10.26345968 medRxiv
Top 0.1%
4.9%

Withdrawal Statement: This article has been withdrawn by medRxiv because it was submitted with false information.

9
Cerebrospinal fluid metabolomic profiles associate with neurological recovery after shunt surgery in normal pressure hydrocephalus

Duan, L.; Tiemeyer, M. E.; Leary, O. P.; Hasbrouck, A.; Sayied, S.; Amaral-Nieves, N.; Meier, R.; Brook, J. R.; Kanarek, N.; Alushaini, S.; Guglielmo, M.; Svokos, K. A.; Klinge, P. M.; Fleischmann, A.; Ruocco, M. G.; Petrova, B.

2026-03-31 neurology 10.64898/2026.03.29.26349660 medRxiv
Top 0.1%
4.3%

Normal pressure hydrocephalus (NPH) is a potentially reversible neurological disorder characterized by urinary incontinence, gait impairment, and cognitive decline. However, postoperative improvement after shunt placement is variable, and reliable preoperative predictors are lacking, leaving patients exposed to uncertain surgical benefit and procedural risk. We therefore asked whether preoperative cerebrospinal fluid (CSF) metabolic profiles capture biological states associated with recovery potential. We analyzed ventricular CSF from patients undergoing shunt placement and identified metabolic patterns that differed between patients who improved postoperatively and those who did not. These signatures were detectable prior to intervention and were consistent across analytical approaches and patient cohorts. Multivariate models based on metabolite features were associated with postoperative improvement, with strongest performance observed for cognitive outcomes. Pathway-level analyses indicated coordinated alterations in processes related to redox balance, immune-metabolic signaling, and energy substrate utilization. These findings indicate that preoperative CSF metabolite profiles reflect biological states associated with recovery potential in NPH. The results further suggest that metabolic and immune-metabolic processes contribute to variability in surgical responsiveness and support the development of predictive biomarkers for patient stratification.

10
An LLM-assisted framework for accelerated and verifiable clinical hypothesis testing from electronic health records

Gim, N.; Gim, I.; Jiang, Y.; Kihara, Y.; Blazes, M.; Wu, Y.; Lee, C. S.; Lee, A. Y.

2026-02-12 health informatics 10.64898/2026.02.10.26346008 medRxiv
Top 0.1%
4.2%

Acquiring insights from electronic health records (EHRs) is slowed by manual analytical workflows that limit scalability and reproducibility. We present LATCH (LLM-Assisted Testing of Clinical Hypotheses), an agentic framework that converts natural language clinical hypotheses into fully auditable analyses on structured EHR data. LATCH integrates LLM-assisted semantic layers with deterministic execution pipelines to automate cohort construction, statistical analysis, and result reporting, while isolating patient-level data from LLM-involved steps. Using diabetes as a model disease, LATCH reproduced findings from 20 published studies within 3-15 minutes per study. Beyond replication, LATCH enabled study extensions and new insight generation through simple natural language hypothesis modifications. We demonstrated LATCH across 102 hypothesis tests spanning reproduction, extension, and insight generation. We systematically stress-tested LATCH to characterize its limitations and operational boundaries. LATCH provides a scalable framework for reproducible real-world evidence generation, reducing analytical bottlenecks and improving reliability of AI-assisted biomedical discovery while preserving human oversight.

11
C-RLM: Schema-Enforced Recursive Synthesis for Auditable, Long-Context Clinical Documentation

Yu, Y.

2026-01-26 health informatics 10.64898/2026.01.24.26344761 medRxiv
Top 0.1%
4.1%

Clinical decision-making for multi-morbid patients requires synthesizing evidence from lengthy, fragmented records--a task that exposes the limitations of standard Retrieval-Augmented Generation (RAG) and long-context Large Language Models (LLMs), which often lose critical information or lack auditability. We introduce the Clinical-Recursive Language Model (C-RLM), a framework that reframes evidence synthesis as a structured, recursive compilation process rather than a single-pass retrieval task. C-RLM iteratively builds a validated knowledge state using schema-enforced transitions, a Robust Nomenclature Resilience (RNR) layer for synonym consolidation, and a TraceTracker system for deterministic provenance. Evaluated on 100 complex Lupus Nephritis case reports (~24.5k tokens each), C-RLM achieves 100% structural consistency and 99% regimen recall (F1), outperforming a strong Flat RAG baseline. While introducing a 2.7x computational overhead, C-RLM delivers a crucial "Synthesis Dividend": recovery of clinically critical entities fragmented across distant text spans, with full auditability back to source text offsets. Our results demonstrate that for safety-critical clinical applications, the trade-off in latency is justified by gains in reliability, auditability, and support for human-in-the-loop governance.

12
Intracerebral hemorrhage induces monocyte TNF signaling that is suppressed by Siponimod (BAF312): a single-cell transcriptomics study in patients

DeLong, J. H.; Diaz-Perez, S.; Sheth, K. N.; Cha, J.-H.; Malanga, C.; Wagner, P. G.; Pezous, N.; Hanin, A.; Walsh, K. B.; Hinson, H. E.; Sansing, L. H.

2026-01-27 neurology 10.64898/2026.01.22.26344292 medRxiv
Top 0.1%
4.1%

Intracerebral hemorrhage (ICH) causes high morbidity and mortality, with neurotoxic inflammation driven by infiltrating monocytes. Therapeutic options remain limited. Here we performed single-cell RNA sequencing and plasma cytokine analysis on peripheral blood samples from ICH patients treated with the immunomodulatory drug BAF312 (Siponimod) or placebo at days 1, 3, and 7 post ICH. In the absence of treatment, the inflammatory response peaked at day 3 post ICH. BAF312 markedly reduced peripheral blood T and B lymphocyte numbers by day 3. BAF312 also impacted the myeloid response, suppressing TNF signaling in classical and non-classical monocytes. Multiple cytokine signaling pathways were decreased, though BAF312 did not impact plasma cytokine or chemokine concentrations. Notably, increased monocyte TNF signaling correlated with better functional outcome, possibly related to the positive role of monocytes during the subacute stage of ICH. These findings suggest that BAF312 suppresses peripheral immune responses after ICH and support a complex role of monocytes in this disease.

13
PHARMWATCH: A Multilayer Pharmacogenomics Safety System for Accurate Star Allele Interpretation

Eisenhart, C. E.; Brickey, R.; Mewton, J.

2026-02-28 genetic and genomic medicine 10.64898/2026.02.26.26347200 medRxiv
Top 0.1%
4.0%

The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of benchmark single nucleotide polymorphisms (SNPs) excludes rare or novel pathogenic variants that can invalidate a star allele call and lead to incorrect dosing recommendations. Furthermore, nearby non-pathogenic variants can interfere with haplotype interpretation, introducing additional risk of misclassification. To address these gaps, we developed PHARMWATCH, a multistep pharmacogenomics workflow for comprehensive variant analysis, allele tracking, and contextual interpretation. PHARMWATCH incorporates two algorithmic safeguards designed to identify genomic alterations that compromise star allele accuracy: (1) de novo germline variant screening using the ACMG-based BIAS-2015 classifier and (2) variant interpretation in context (VIIC) to validate the functional integrity of star allele-defining SNPs [3]. Together, these layers enhance the reliability of pharmacogenomic reporting, enabling safe, automated, and review-ready recommendations that extend beyond the constraints of traditional star allele-based approaches.

14
GEN-KnowRD: Reframing AI for Rare Disease Recognition

Yan, C.; Su, W.-C.; Xin, Y.; Grabowska, M. E.; Kerchberger, V. E.; Borza, V. A.; Wang, J.; Wang, L.; Li, R.; Lynn, J.; Dickson, A. L.; Shyr, C.; Feng, Q.; Stein, C. M.; Wang, K.; Embi, P.; Malin, B. A.; Liu, H.; Wei, W.-Q.

2026-03-03 health informatics 10.64898/2026.03.02.26347469 medRxiv
Top 0.1%
4.0%

Rare diseases affect over 300 million people worldwide, yet patients often endure years-long diagnostic delays that limit timely intervention and trial opportunities. Computational rare disease recognition (RDR) remains constrained by knowledge resources that are often incomplete, heterogeneous, and dependent on extensive multi-disciplinary expert curation that cannot scale. Large language models (LLMs) applied directly for end-to-end diagnosis or disease discrimination face similar knowledge bottlenecks while also raising concerns around cost, reproducibility, and data governance. Here, we introduce GEN-KnowRD, a knowledge-layer-first framework that leverages LLMs to generate schema-guided rare disease profiles, systematically assesses their quality, and constructs a computable knowledge base (PheMAP-RD) for local deployment. GEN-KnowRD integrates this knowledge into lightweight inference pipelines for both general-purpose disease screening and specialized early discrimination from longitudinal electronic health records. Across six public benchmarks for general-purpose screening (9,290 patients spanning 798 rare diseases), GEN-KnowRD significantly improves disease ranking compared to a state-of-the-art, HPO-centered diagnostic framework (up to 345.8% improvement in top-1 success), advanced end-to-end LLM reasoning (up to 129.1% improvement), and a variant of GEN-KnowRD instantiated with expert-curated knowledge rather than LLM-generated profiles. In two real-world cohorts for early diagnosis of idiopathic pulmonary fibrosis (511 patients) as a use case, GEN-KnowRD also demonstrates robust discrimination performance gains, supporting effective RDR during the pre-diagnostic window.
These findings demonstrate that repositioning LLMs from diagnostic reasoning to the knowledge layer--decoupling knowledge construction from patient-level inference--yields stronger RDR, while providing scalable, continuously updatable, and reusable infrastructure for diagnosis, screening, and clinical research across the rare disease landscape.

15
The Representativeness of Regional Influenza Virus Genomic Surveillance for National Trends in the United States

Ragonnet-Cronin, M.; Papalambros, L.; Bendall, E. E.; Kitzsimmons, W. J.; Blair, C. N.; Tibbetts, R.; Bhargava, A.; Lauring, A.

2026-03-02 infectious diseases 10.64898/2026.02.23.26346422 medRxiv
Top 0.1%
3.8%

Genomic surveillance of influenza viruses informs vaccine strain selection and evolutionary forecasting. Sequencing efforts vary widely across U.S. states, which raises concerns about spatial sampling bias. We evaluated how well 10,958 influenza virus genomes sampled by our group in Michigan captured the genetic diversity in 34,743 genomes circulating nationally from the 2021/22 through 2024/25 seasons. We defined seasonal hemagglutinin haplotypes and tracked their detection across states. A small number of haplotypes dominated each season, and Michigan detected all major haplotypes, even under substantial downsampling. Detection delays were primarily driven by haplotype frequency rather than geographic factors. Comparisons across states showed that higher sequencing effort improved coverage and detection timeliness, with diminishing returns at higher volumes. Rarefaction analysis confirmed that relatively few sequences were needed to capture 95% of national haplotype diversity. These findings suggest that intensive sequencing in a single well-sampled location can be broadly representative of national influenza diversity. One sentence summary: Dense influenza genomic sequencing from a single U.S. state captured nearly all nationally circulating haplotype diversity, with detection timeliness primarily driven by sequencing effort and haplotype frequency.

16
The Causal Impact of Natural Language Processing-Driven Clinical Decision Support on Sepsis Mortality in England: An Augmented Synthetic Control Analysis of NHS Trust-Level Data

Whitfield, J. A.; Graves, E. M.

2026-03-02 health informatics 10.64898/2026.02.27.26347253 medRxiv
Top 0.1%
3.8%

Withdrawal Statement: This article has been withdrawn by medRxiv because it was submitted with false information.

17
HLA-Resolve: High-Resolution HLA Haplotyping Using Long-Read Hybrid Capture

Glasenapp, M. R.; Yee, M.-C.; Symons, A. E.; Cornejo, O. E.; Garcia, O. A.

2026-03-30 genetic and genomic medicine 10.64898/2026.03.27.26349549 medRxiv
Top 0.1%
3.8%

Accurate HLA typing is critical for transplantation, pharmacogenomics, and disease risk prediction, yet short-read approaches cannot resolve the HLA region's extreme polymorphism. Long-read sequencing improves resolution, but its adoption has been limited by higher cost, reduced base accuracy, limited throughput, and reliance on long-range PCR. To overcome these limitations, we present a multiplexed long-read hybrid capture workflow for PacBio and Oxford Nanopore sequencing that enriches all classical HLA loci and the complete HLA Class III region. A single-step enzymatic fragmentation and barcoding strategy enables automated library prep. We also introduce HLA-Resolve, an HLA typing program optimized for HiFi reads, and validate workflow performance against the Genome in a Bottle, Human Pangenome Reference Consortium, and International Histocompatibility Working Group benchmarks using 32 geographically diverse samples. These advances offer a cost-effective approach for high-resolution HLA typing with clinical applicability and enable investigation of the role of HLA Class III variation in disease.

18
Pharmacogenomic variant profiling in 14,490 Koreans using a population-specific genotyping array

Park, S.; Seo, M.; Park, C. H.; Park, H.-Y.; Kim, Y. J.; Kim, B.-J.

2026-01-26 genetic and genomic medicine 10.64898/2026.01.19.26344411 medRxiv
Top 0.1%
3.7%

Pharmacogenomics is an essential component of precision medicine; however, most existing knowledge has been derived from populations of European ancestry, limiting the understanding of pharmacogenomic diversity in East Asian populations. In this study, we applied genotype imputation to the Korea Biobank Array v2.0 using a reference panel of 8,062 Korean whole-genome sequencing (WGS) samples and analyzed pharmacogenomic variants and phenotypes in 14,490 Korean individuals. To assess the accuracy of imputation-based variant detection, we compared imputed genotypes with matched WGS data from 735 individuals and with genotypes obtained from the commercial PangenomiX Plus Array Kit for an additional 137 individuals, demonstrating high concordance. When extended to the full cohort, all individuals were found to carry at least one pharmacogenomic variant, with high frequencies observed in key pharmacogenes including CYP2C19, SLCO1B1, CYP3A5, and VKORC1. Phenotype distributions were broadly consistent with previous WGS-based studies in East Asians but showed notable differences compared with European populations. Overall, this population-specific, large-scale analysis provides a comprehensive pharmacogenomic landscape in Koreans and highlights the importance of ancestry-tailored data for equitable precision medicine.

19
Automated epilepsy and seizure type phenotyping with pre-trained language models

Chang, E.; Xie, K.; Zhou, D.; Korzun, J.; Conrad, E.; Roth, D.; Ellis, C.; Litt, B.

2026-02-22 neurology 10.64898/2026.02.11.26346003 medRxiv
Top 0.1%
3.7%

BackgroundEpilepsy is a common neurologic disorder characterized by recurrent, unprovoked seizures. Epilepsy manifests as different seizure types and epilepsy types, which have important implications for treatment and prognosis. Electronic health record systems containing longitudinal data on large epilepsy cohorts can be valuable resources for clinical research. However, detailed epilepsy phenotypes are poorly captured by structured data such as diagnostic codes and are instead buried in unstructured clinical notes. MethodsWe evaluated two transformer-based language models for automated epilepsy and seizure type phenotyping from clinical notes: a fine-tuned BERT model and a large language model, DeepSeek-R1. A subset of notes was annotated by epileptologists, and model performance was benchmarked against expert agreement. The best-performing model was then deployed across all epilepsy progress notes at a large academic medical center to generate patient-level longitudinal epilepsy and seizure phenotypes. ResultsBoth models achieved performance comparable to expert agreement for classifying epilepsy type as focal, generalized, or unspecified (Matthews correlation coefficient [95% CI]: DeepSeek = 0.85 [0.80-0.90], BERT = 0.73 [0.67-0.80], human = 0.77 [0.70-0.83]) and classifying seizure type as convulsive or non-convulsive (DeepSeek = 0.74 [0.66-0.81], BERT = 0.60 [0.49-0.69], human = 0.49 [0.39-0.59]). For more granular classification tasks, DeepSeek maintained performance comparable to expert agreement, whereas BERT performance declined. Deploying DeepSeek-R1 on 77,049 clinical notes from 18,566 patients revealed system-level clinical patterns, including diagnostic stabilization over time, frequent co-occurrence of seizure types, and variation in seizure outcomes by epilepsy type. 
Conclusions: By extracting expert-level epilepsy phenotypes from routine clinical text at scale, this approach transforms unstructured EHR data into a resource for longitudinal, population-informed epilepsy care. Automated phenotyping enables analyses of epilepsy trajectories and treatment outcomes that are not feasible with structured data alone, supporting future clinical and translational research applications.
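
The Matthews correlation coefficient (MCC) with a 95% interval, as reported in the Results above, can be illustrated with a minimal sketch. The abstract does not say how the intervals were computed, so the percentile bootstrap used here is an assumption, and the function names `mcc_binary` and `bootstrap_ci` are hypothetical. The sketch covers only the binary case (for example convulsive vs non-convulsive), not the three-class epilepsy-type task.

```python
import math
import random


def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for MCC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample note indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(mcc_binary([y_true[i] for i in idx],
                                [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Here `y_true` would hold the expert annotations and `y_pred` the model outputs for the same notes. Applying the same function to two annotators' labels gives a human-agreement baseline of the kind the abstract compares against.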

20
Scaling haplospecific antisense oligonucleotides from N-of-1 to broad use in genetic disease populations by diplotyping

Kim McManus, O.; Goddard, P.; Olsson, S.; Protopsaltis, L.; Gleeson, J. G.; Zhang, Q.; Kahn, N.; Crawford, A.; Kingsmore, S.

2026-02-04 genetic and genomic medicine 10.64898/2026.01.28.26345012 medRxiv
Top 0.1%
3.6%

Antisense oligonucleotides (ASOs) are versatile disease-modifying therapies for genetic diseases. An accelerated FDA pathway enables ASO treatment trial initiation in single patients within a year. However, this rapid N-of-1 pathway lacks the extensibility to broad use that sustainability requires. Individualized ASOs bind pre-mRNAs encompassing an entire locus. Thus, ASOs targeting common heterozygous polymorphisms (SNPs) are potentially haplospecific in many patients with dominant disorders. We developed haplospecific ASOs for two patients with SCN2A-Complex Neurodevelopmental Disorder (CND) and gain-of-function (GOF) or mixed gain-loss dysfunction (GLD) variants. The ASOs targeted reference SCN2A intronic sequences containing SNPs. The patients each had SCN2A haplotypes with reference SNP alleles in cis with causal variants and alternate SNP alleles in cis with normal mRNA. Following N-of-1 demonstration of safety and efficacy, we evaluated their applicability to 21 SCN2A-CND patients using whole genome sequencing (WGS) with haplotyping by read proximity. Ten (48%) patients had ASO-eligible diplotypes. Haplotype analysis of 1000 Genomes Project (1kGP) participants revealed 16 additional SCN2A haplotypes present in >20% of subjects, tagged by 156 SNPs. In silico assessment of specificity and potency identified additional haplospecific ASOs for validation in reprogrammed 1kGP cells heterozygous for the tagging SNPs. A combination of 4 haplospecific ASOs provided coverage for 76% of 1kGP subjects, potentially scaling N-of-1 FDA applications to future SCN2A-CND patients with GOF/GLD variants, with ASO selection guided by diagnostic WGS with haplotyping. Thus, population resources have the potential to prepare haplospecific ASO therapies a priori for many patients and genetic diseases, with individual selection by WGS haplotyping.
One Sentence Summary: Population genome sequencing with haplotyping identifies haplospecific antisense oligonucleotides for disease modifying therapy of genetic disorders.
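
The idea of a small ASO panel covering a large share of a population can be sketched as a greedy set-cover calculation. This is only an illustration under stated assumptions: the abstract does not describe how the 4-ASO combination was chosen, heterozygosity at a targeted SNP is used here as a simplified proxy for eligibility (phase with the causal variant is ignored), and the names `covered_fraction` and `greedy_panel` and all SNP and subject identifiers are hypothetical.

```python
def covered_fraction(subject_het_snps, panel_snps):
    """Fraction of subjects heterozygous for at least one SNP in the panel.

    subject_het_snps: dict mapping subject ID -> set of heterozygous SNP IDs.
    """
    if not subject_het_snps:
        return 0.0
    panel = set(panel_snps)
    covered = sum(1 for snps in subject_het_snps.values() if snps & panel)
    return covered / len(subject_het_snps)


def greedy_panel(subject_het_snps, candidate_snps, panel_size):
    """Pick up to panel_size SNP targets, each adding the most uncovered subjects."""
    chosen = []
    uncovered = set(subject_het_snps)
    for _ in range(panel_size):
        best, best_gain = None, 0
        # Sorted iteration makes tie-breaking deterministic.
        for snp in sorted(candidate_snps):
            if snp in chosen:
                continue
            gain = sum(1 for s in uncovered if snp in subject_het_snps[s])
            if gain > best_gain:
                best, best_gain = snp, gain
        if best is None:
            break  # no remaining candidate covers any new subject
        chosen.append(best)
        uncovered = {s for s in uncovered if best not in subject_het_snps[s]}
    return chosen


# Toy example with made-up subjects and SNP labels.
subjects = {"s1": {"a"}, "s2": {"a", "b"}, "s3": {"c"}, "s4": set()}
panel = greedy_panel(subjects, ["a", "b", "c"], panel_size=2)
print(panel, covered_fraction(subjects, panel))
```

Greedy selection is a common heuristic for this kind of coverage problem but does not guarantee the optimal panel. A real eligibility check would also need phased haplotypes, as the abstract's diplotyping step implies.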