Database — Latest Matching Preprints

1

Aggregating data to accelerate personalized therapy in heart failure (ADAPT-HF)

Roeder, C.; Goerg, C.; Talebi, A.; Stevens, L. M.; Scholtens, D. M.; Rasmussen-Torvik, L. P.; Alagna, L. M.; Shah, S. J.; Hall, J. L.; Das, A. K.; Jhund, P. S.; Kao, D. P.

2026-07-16 health informatics 10.64898/2026.07.13.26357501 medRxiv

Top 0.4%

2.1%


Background: Increased public access to data from disparate sources provides opportunities to study and validate predictive and subphenotype models in heterogeneous disease conditions using aggregated individual patient data. Robust, explicit, and transparent harmonization of data elements is critical to ensure interpretability, reproducibility, and generalizability of secondary and retrospective analyses. Methods & Results: We designed and implemented ADAPT (Aggregating Data to Accelerate Personalized Therapy), a scalable framework using multiple software packages (R, SQL, BigQuery) that enables rapid, explicit harmonization of structured data elements from randomized trials and observational studies using a standard spreadsheet interface. User-specified criteria are applied to primary study data to produce harmonized longitudinal datasets comprised of demographics, medical history, quantitative observations, repeated measures, and clinical outcomes. We demonstrate this functionality using 26 clinical studies found in the National Heart, Lung, and Blood Institute BioLINCC resource. We illustrate the scalability of ADAPT to the order of billions of datapoints using administrative clinical data in a cloud-computing platform. We also present examples of collaborators using ADAPT for independent harmonization tasks for secondary analyses and democratization of publicly available data. Conclusion: ADAPT is a disease-agnostic, extensible, and scalable platform to support robust, transparent harmonization of structured research data using interfaces accessible to a variety of researchers regardless of programming ability. It extends FAIR principles beyond research data to also represent harmonization analyses by improving Findability of harmonization decisions, Accessibility of methods to other stakeholders, Interoperability with independent analyses and datasets, and Reusability through efficient implementation in a variety of analysis environments.

2

Mapping Topic Change in Influential Hepatocellular Carcinoma Research: A Two-Cohort Bibliometric Analysis

Su, Z.; Li, T.

2026-07-16 oncology 10.64898/2026.07.07.26357427 medRxiv

Top 0.5%

1.5%


The therapeutic landscape for hepatocellular carcinoma (HCC) is evolving rapidly, necessitating scalable approaches to synthesize the expanding scientific literature. We characterized thematic shifts in HCC treatment and prognosis research by conducting a retrospective bibliometric analysis of influential publications from 2023 and 2024. Using the OpenAlex database, we identified the 50 most highly cited papers from each year based on eighteen-month post-publication citation counts. Large language models were deployed to extract, normalize, and classify concepts from unstructured text into canonical topics and parent themes, enabling quantitative year-over-year frequency comparisons. Analysis of these 100 papers revealed a distinct maturation in research focus. Although broad categories like general immunotherapy remained prevalent, their relative frequency declined in favor of specific dual immune checkpoint regimens, notably CTLA-4 inhibition and the durvalumab plus tremelimumab combination. Concurrently, parent themes related to radiomics, imaging, and health systems exhibited significant growth in the 2024 cohort. These findings demonstrate a thematic transition in high-impact HCC research from foundational immuno-oncology toward optimized combination therapies and precision diagnostics. Furthermore, this study highlights the utility of artificial intelligence-driven bibliometrics for objectively tracking dynamic conceptual shifts in oncology. A web interface for exploring the data is available at https://pri.pepkio.com/.

3

Feasibility of using automatically extracted routine clinical data in a respiratory cohort study: The SPHN-SPAC demonstrator project.

Romero, F.; Sasaki, M.; Mallet, M. C.; Pedersen, E. S. L.; Leuenberger, L. M.; Makhoul, R.; Bovermann, X.; Hartung, A.; Latzin, P.; Kissling, S.; Moeller, A.; Treis, A.; Regamey, N.; Belle, F. N.; Kuehni, C. E.

2026-07-16 epidemiology 10.64898/2026.07.14.26357927 medRxiv

Top 0.9%

0.9%


Objectives To assess the feasibility of using clinical data automatically extracted via the Swiss Personalized Health Network (SPHN) to complement or replace manually abstracted clinical data in the Swiss Paediatric Airway Cohort (SPAC). Materials and Methods We studied 1,075 SPAC participants enrolled between 2017-2023 at two Swiss children's hospitals. Clinical data were extracted from electronic health records via SPHN in Resource Description Framework format, transformed into visit-centered datasets, and compared with manually abstracted SPAC clinical data and parent-reported emergency department (ED) visits and hospitalizations from follow-up questionnaires. We assessed feasibility by identifying challenges in acquiring data and evaluated data quantity, completeness, and agreement between datasets. Results We obtained analysis-ready SPHN-derived datasets from two hospitals after 24 months. SPHN-derived data captured more pneumology outpatient visits than manual abstraction (Hospital A: 1,963 vs 1,049; Hospital B: 2,343 vs 1,010) and identified clinical events among children without follow-up questionnaires. Completeness of variables varied across hospitals and encounters, reflecting differences in local clinical documentation practices. SPHN-derived and manually abstracted data showed high agreement for structured clinical variables, including spirometry measurements (concordance correlation coefficient >0.99). Self-reported and SPHN-derived ED visits and hospitalizations showed high absolute agreement but moderate concordance. Discussion and Conclusion Automated extraction of routine clinical data increased the completeness of longitudinal information compared with manual abstraction, suggesting that SPHN-derived data can complement manual data collection in cohort studies. 
Broader use remains limited by heterogeneous clinical documentation practices and the substantial effort required to harmonize and transform extracted data into analysis-ready research datasets.
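The agreement statistic quoted for spirometry, the concordance correlation coefficient, can be illustrated with a minimal sketch. It assumes Lin's standard definition with population (1/n) moments; the function name and the paired values are made up for illustration and do not come from the study.

```python
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic bias between sources.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical paired values: automated extraction vs manual abstraction.
automated = [1.0, 2.0, 3.0]
manual_shifted = [2.0, 3.0, 4.0]  # perfectly correlated but offset by 1

ccc_identical = concordance_ccc(automated, automated)
ccc_shifted = concordance_ccc(automated, manual_shifted)
```

Unlike Pearson correlation, the coefficient drops below 1 for the shifted series even though the two sources are perfectly correlated, which is why it suits checks of whether extracted values can stand in for manually abstracted ones.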

4

FoodScribe: an open-source semantic framework for nutrient estimation from free-text dietary records

Gouda, H.; Sala Climent, M.; Agongo, J.; Gaikwad, S. P.; Nattakom, A.; Zhao, H. N.; Xing, S.; Boland, B. S.; Holt, T.; Guma, M.; Dorrestein, P. C.

2026-07-17 nutrition 10.64898/2026.07.15.26358181 medRxiv

Top 2%

0.5%


Efficiently summarizing dietary records at scale remains a persistent bottleneck in nutritional epidemiology. We present FoodScribe, which translates free-text meal descriptions into quantitative nutrient profiles by combining ingredient parsing with nutrient retrieval from the USDA FoodData Central (FDC) database. Benchmarked on the Nutribench dataset with three LLM providers, FoodScribe annotated 3,807 meal descriptions in 2.5 hours, a task otherwise requiring substantial manual effort from trained nutritionists. FoodScribe achieved F1 scores of 0.79-0.89 for macronutrient estimation, with models performing better for protein than for fat. Application to a Mediterranean diet intervention cohort indicated dietary shifts consistent with the intervention pattern based on model-derived estimates. Integration with metabolomics data suggested that fiber and vegetable intake were positively associated with a fecal metabolite cluster.

5

Automated Detection of Motor Speech Disorders and Subtype Classification

Wang, F.; Utianski, R. L.; Barnard, L. R.; Stricker, J. L.; Clark, H. M.; Meade, G. F.; Jones, D. T.; Whitwell, J. L.; Josephs, K. A.; Duffy, J. R.; Botha, H.

2026-07-19 neurology 10.64898/2026.07.16.26358268 medRxiv

Top 2%

0.4%


Motor speech disorders (MSDs) are early markers of neurological disease, but expert perceptual analysis is rarely available outside specialized centers. Automated speech analysis offers a scalable alternative, yet prior studies have not systematically compared modeling approaches or assessed clinically relevant metrics in independent datasets. This study compared static acoustic features, articulatory informed Phonet features, and self-supervised pretrained models for binary and multi label MSD classification. We trained and evaluated models on 583 speech samples using speaker level splits. Baseline models included logistic regression and Gated Recurrent Units (GRUs) trained on eGeMAPS and MFCCs. We extracted three types of Phonet derived features and evaluated pretrained HuBERT and SSAST models in frozen, partially fine-tuned, and fully fine-tuned configurations. Binary classification distinguished MSDs from controls, while multi label classification identified six MSD subtypes. Models were assessed using validation AUC, and cut points were tested on two independent datasets. Pretrained and Phonet based models substantially outperformed static acoustic features. In binary classification, HuBERT achieved the highest AUC (0.95), while compact Phonet derived GRUs achieved comparable performance (up to 0.94). These models generalized well to independent datasets, maintaining high sensitivity (0.94) and specificity (0.97). In multi label classification, Phonet models achieved the highest macro average AUC (0.86), but threshold-based subtype performance declined on unseen data. Automated MSD detection is feasible and clinically promising. Binary classification generalized well, whereas multi label classification showed limited threshold stability across datasets.

6

Computational design of a multi-epitope vaccine against M. tuberculosis

Buhari, A.; Okutu, P.; Oyeleke, U. A.; Sivakumar, A.; Hameed, S. A.

2026-07-15 bioinformatics 10.64898/2026.07.09.737463 medRxiv

Top 2%

0.3%


Background: Tuberculosis remains a leading global infectious killer, with BCG offering inconsistent adult protection and rising drug-resistant strains demanding novel vaccine strategies. We report the first multi-epitope vaccine construct simultaneously targeting three previously unexplored Mycobacterium tuberculosis virulence proteins: EccB3, MycP, and polyketide synthase, which collectively govern nutrient acquisition, ESX secretion integrity, and innate immune evasion. Methods: Using a reverse vaccinology pipeline, B-cell, CTL, and HTL epitopes were predicted, filtered for allergenicity, toxicity, and IFN-γ induction, then assembled into an 823-residue chimeric construct incorporating beta-defensin and PADRE adjuvants with AAY/GPGPG linkers, covering ~90% of global HLA diversity. The construct underwent AlphaFold structure prediction, 3DRefine refinement, disulfide engineering, PROCHECK/ProSA validation, ClusPro 2.0 docking against TLR1/TLR2, and C-IMMSIM immune simulation. Results: The construct (82.3 kDa, instability index 32.48) showed strong structural quality (94.7% favoured Ramachandran residues), stable TLR1/TLR2 binding (weighted energy: -1,371.0 kcal/mol), and robust in silico immune responses and durable memory cell formation following booster simulation. Conclusion: This computationally validated construct represents a promising multi-target TB vaccine candidate warranting experimental advancement.

7

ICH-CARE: ICH-integrated Care for Accelerated Response to Hemorrhage Using a Phased Approach.

Salman, S.; English, S.; Mooney, L.; Miller, D.; Ng, L.; Kramer, C.; Ombada, M.; Tawk, R.; Freeman, W. D.

2026-07-21 neurology 10.64898/2026.07.18.26358392 medRxiv

Top 2%

0.3%


Introduction: Intracerebral hemorrhage (ICH) carries higher morbidity and mortality than ischemic stroke. Recent studies have demonstrated improved patient outcomes by applying ultra-early bundled interventions including blood pressure management, coagulopathy reversal, and osmotic therapy. Effective strategies to deliver these ultra-early treatment options are currently being explored. On December 19th, 2022, the Mayo Clinic Comprehensive Stroke Center (CSC) launched the "ICH Phases" communication system to accelerate ICH patient care. Objective: To evaluate adherence to the AHA/ASA guidelines in acute ICH care following the implementation of our novel tiered paging system. Methods: We retrospectively reviewed patients admitted with spontaneous ICH during 2024 and 2025. We excluded traumatic cases. We extracted clinical data such as time to imaging, documentation of ICH score, blood pressure control, reversal of anticoagulation, venous thromboembolism (VTE) prophylaxis, and discharge disposition. Results: Among 67 patients, 68.7% underwent CT imaging within 25 minutes. We documented the ICH score within 6 hours in 82.9% of patients. Nearly 94.7% of patients with SBP >140 mm Hg received antihypertensive therapy, yet only 18% reached target BP within 60 minutes. We completed the reversal of anticoagulation within 120 minutes in 75% of patients. VTE prophylaxis was initiated within 24 hours in 91% of patients. Discussion: Our novel system demonstrated adherence to the AHA/ASA guidelines and to time-sensitive benchmarks in neuroimaging, reversal of anticoagulation, and VTE prophylaxis. Early BP control remains a challenge, highlighting the discrepancy between guidelines and real-world implementation. Conclusion: A novel tiered paging system is effective for enhancing early ICH care. Such a holistic system remains critical for sustained improvement in quality of care.

8

Construction of a risk prediction model for postoperative bleeding in patients with thyroid cancer based on clinical data

Zhang, Y.; Chen, W.; Li, X.; Shen, W.

2026-07-18 oncology 10.64898/2026.07.16.26358297 medRxiv

Top 2%

0.3%


Objective: To develop and validate a risk model for predicting postoperative bleeding in patients with thyroid cancer. Methods: A total of 2800 consecutive patients diagnosed with thyroid cancer in the Department of Thyroid and Breast Surgery of the Affiliated Hospital of Xuzhou Medical University between January 2020 and December 2023 were retrospectively analyzed. Patients were categorized into two groups based on postoperative bleeding occurrence: bleeding and non-bleeding groups. Univariate and multivariate logistic regression analyses were used to screen independent risk factors. Risk prediction models and a nomogram were then developed. Subgroup analysis was performed to identify independent risk factors. The predictive performance of the models was assessed using the Hosmer-Lemeshow test and receiver operating characteristic (ROC) curves. Results: Of the 2800 recruited patients, 50 had postoperative bleeding, with an incidence rate of 1.7%. Multivariate logistic regression analysis showed that age, hypertension, total thyroidectomy, tumor size ≥4 cm, and operation time ≥90 min were risk factors for postoperative bleeding in thyroid cancer patients (P<0.05). A risk prediction model was established based on the above factors, and the area under the ROC curve was 0.881, with a sensitivity of 94.0%, a specificity of 67.3%, and an accuracy of 74.0%. Decision curve analysis revealed that the model had good predictive ability. Conclusions: The constructed risk prediction model has good predictive power and can provide a reference for healthcare professionals to predict the risk of bleeding in patients after thyroid cancer surgery.

9

Association between Glycemic Traits and Delayed Cerebral Infarction among Non-Diabetic Patients with Aneurysmal Subarachnoid Hemorrhage: A Nested Case-Control Study

Ji, P.; Zheng, K.; Tan, D.; Xu, J.; Chen, M.; Wu, Y.; He, Z.

2026-07-20 neurology 10.64898/2026.07.18.26358375 medRxiv

Top 2%

0.3%


Objective: Delayed cerebral infarction (DCIn) is a severe complication following aneurysmal subarachnoid hemorrhage (aSAH). Previous studies suggest that glycemic variability is associated with DCIn. However, whether diabetes status modifies the relationship between glycemic traits and DCIn remains unknown. Methods: Clinical data were collected from aSAH patients admitted to the First Affiliated Hospital of Shantou University Medical College between January 2015 and April 2025. The collected data included demographic characteristics, clinical variables, and glycemic traits. Glycemic traits included mean blood glucose (GLU-M), standard deviation of blood glucose (GLU-SD), coefficient of variation of blood glucose (GLU-CV), variance of blood glucose (GLU-Var), range of blood glucose (GLU-R), average real variability of blood glucose (GLU-ARV), and variability independent of the mean (GLU-VIM). After 1:2 case-control matching, conditional logistic regression models were used to evaluate the associations between glycemic traits and DCIn risk, with stratified analyses performed according to diabetes status. Multiplicative interaction terms were additionally included to assess the potential modifying effect of diabetes status. Results: A total of 306 patients with aSAH were included. Among them, 102 developed DCIn. For each of these 102 cases, two controls were matched by age (±5 years), sex, and year of admission (±5 years). In the overall population, higher GLU-M and GLU-ARV were associated with increased DCIn risk, with odds ratios (ORs) per 1-SD increase of 1.62 (95% CI, 1.25-2.11) and 1.63 (95% CI, 1.25-2.11), respectively. Among patients without diabetes (n=266), associations with DCIn per 1-SD increase were observed for GLU-M (OR, 2.23; 95% CI, 1.56-3.19), GLU-SD (OR, 1.53; 95% CI, 1.13-2.06), GLU-Var (OR, 1.48; 95% CI, 1.04-2.10), and GLU-ARV (OR, 1.88; 95% CI, 1.38-2.55). No significant associations were observed among patients with diabetes. Significant interactions were observed between diabetes status and GLU-SD and GLU-Var, with P for interaction values of 0.033 and 0.032, respectively. Conclusion: Higher mean blood glucose and greater glycemic variability are associated with an increased risk of DCIn in aSAH patients, especially in those without diabetes.
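Most of the glycemic traits named in this abstract are simple summaries of one patient's serial glucose readings. The sketch below shows one plausible way to compute them. It assumes the sample (n - 1) standard deviation and the usual definition of average real variability as the mean absolute difference between consecutive readings; the abstract does not specify either. GLU-VIM is omitted because it depends on a cohort-level fit, and the readings are made up.

```python
from statistics import fmean, stdev, variance


def glycemic_traits(readings):
    """Summarize one patient's time-ordered glucose readings."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    mean = fmean(readings)
    sd = stdev(readings)  # sample SD (n - 1); an assumption, not from the source
    # Absolute change between each pair of consecutive readings.
    successive = [abs(b - a) for a, b in zip(readings, readings[1:])]
    return {
        "GLU-M": mean,
        "GLU-SD": sd,
        "GLU-CV": sd / mean,
        "GLU-Var": variance(readings),
        "GLU-R": max(readings) - min(readings),
        "GLU-ARV": fmean(successive),
    }


# Hypothetical readings in mmol/L, for illustration only.
traits = glycemic_traits([5.0, 7.0, 6.0, 8.0])
```

Because ARV depends on the order of the readings while SD and variance do not, two patients with the same SD can differ in ARV.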

10

A Multimodal Benchmark for Evaluating Cause-of-Death Inference Using Child Health and Mortality Data

Yang, J.; Pan, S.; Lim, H. S.; Chu, Y.; Guo, Y.; Agarwal, N.; Babbar, V.; Parikh, G. R.; Chen, Y. T.; Rees, C. A.; Dangor, Z.; Lala, S. G.; Li, Z. R.; Clark, S. J.; Wu, Z.; Datta, A.; Liu, L.; Rudin, C.; Scarpino, S. V.; Gyori, B. M.; McCormick, T. H.

2026-07-15 public and global health 10.64898/2026.07.13.26357980 medRxiv

Top 2%

0.3%


Accurately attributing causes of death is vital for global health, yet fewer than 5% of deaths in resource-constrained regions are medically certified. To assign causes to these unlabeled deaths at scale, practitioners traditionally rely on verbal autopsy, using supervised statistical models to classify based on structured survey data. However, modern mortality surveillance increasingly collects rich, unstructured multimodal data, such as free-text caregiver narratives and postmortem diagnostics, which traditional supervised statistical models struggle to seamlessly integrate. In this paper, we present a comprehensive, multimodal benchmark for cause-of-death classification using data from the Child Health and Mortality Prevention Surveillance (CHAMPS) network, a unique surveillance platform spanning nine countries across South Asia and Sub-Saharan Africa. Using this dataset, we introduce an evaluation framework designed to rigorously assess diagnostic reasoning, moving beyond traditional metrics that fail to capture complex clinical realities. We demonstrate the utility of this benchmark by evaluating zero-shot large language models against supervised baselines across various data modalities. Our results reveal distinct differences in how these modeling approaches synthesize unstructured medical evidence. This benchmark provides a rigorously defined resource for assessing clinical reasoning in next-generation mortality surveillance.

11

Machine learning and data-driven models for predicting post-stroke dysphagia: a systematic review and meta-analysis

Mohammadi Yazdi, S.; Motevaselian, M.; Khatami, S.; Radfar, N.; Jourahmad, Z.; Perez, H. A.

2026-07-17 neurology 10.64898/2026.07.15.26358113 medRxiv

Top 3%

0.2%


Background: Post-stroke dysphagia (PSD) contributes to aspiration, pneumonia, malnutrition, prolonged hospitalization and mortality. We evaluated the discrimination, validity and readiness of machine learning and data-driven prediction models for PSD-related outcomes. Methods: Following a prospectively registered protocol (PROSPERO CRD420261419259), we searched PubMed/MEDLINE, Embase, Web of Science Core Collection, CINAHL and CENTRAL from inception through June 7, 2026. Eligible studies developed or validated multivariable prediction models for PSD-related outcomes in adults with stroke. We used PROBAST and PROBAST+AI to assess risk of bias and applicability and TRIPOD+AI to evaluate reporting. Area under the curve (AUC) estimates were pooled on the logit scale with random-effects models. Results: Twenty-four studies were included and ten contributed to meta-analysis. Four studies predicting early or incident PSD yielded a pooled AUC of 0.94 (95% CI 0.60-0.99; I2 = 95.6%). Pooled AUCs were 0.84 (95% CI 0.71-0.92) for aspiration or penetration-aspiration and 0.89 (95% CI 0.24-1.00) for severe dysphagia. The exploratory analysis of all ten risk-prediction models produced an AUC of 0.90 (95% CI 0.80-0.95), but heterogeneity was substantial (I2 = 90.3%) and the prediction interval was 0.51-0.99. Every study had high risk of bias because of analysis-domain concerns; calibration and external validation were uncommon. Conclusions: Reported discrimination was often high, but the evidence does not establish reliable performance in care. Independent validation, calibration, complete model reporting and clinical-impact studies are needed before these models guide post-stroke swallowing care. Keywords: Post-stroke dysphagia; Stroke; Deglutition disorders; Machine learning; Clinical prediction model; Area under the curve; Meta-analysis
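Pooling AUCs on the logit scale with a random-effects model can be sketched as follows. The abstract does not name the estimator, so this sketch assumes DerSimonian-Laird for the between-study variance and a delta-method conversion of each AUC's standard error to the logit scale. The function name and the input values are hypothetical.

```python
import math


def pool_auc_random_effects(aucs, std_errors, z=1.959964):
    """Pool AUCs on the logit scale (DerSimonian-Laird; an assumed estimator)."""
    y = [math.log(a / (1 - a)) for a in aucs]
    # Delta method: SE(logit AUC) is approximately SE(AUC) / (AUC * (1 - AUC)).
    v = [(se / (a * (1 - a))) ** 2 for a, se in zip(aucs, std_errors)]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return {
        "auc": expit(mu),
        "ci": (expit(mu - z * se_mu), expit(mu + z * se_mu)),
        "tau2": tau2,
        "i2_percent": max(0.0, (q - df) / q) * 100 if q > 0 else 0.0,
    }


# Hypothetical study-level inputs, not taken from the review.
pooled = pool_auc_random_effects([0.80, 0.80, 0.80], [0.05, 0.04, 0.03])
```

Back-transforming from the logit scale keeps the interval inside (0, 1), which is also why intervals around high pooled AUCs from few heterogeneous studies can become very wide and asymmetric.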

12

Cube-based screening identifies a quinoa-derived synthetic microbial community that promotes plant growth and modulates root epidermal responses under salt stress

Dangjarean, H.; Murata, Y.; Kobayashi, Y.; Neyrot, S.; Ogata, T.; Fujita, Y.

2026-07-15 plant biology 10.64898/2026.07.15.738596 medRxiv

Top 3%

0.2%


Plant-associated bacteria can improve plant performance under abiotic stress, but beneficial functions in plant microbiomes may depend on defined combinations of microorganisms rather than individual isolates alone. Here, we developed a cube-based screening strategy to identify functional synthetic microbial communities (SynComs) from 135 quinoa-associated bacterial isolates while preserving combinatorial diversity and traceability of isolate-level contributions. The isolates were divided into five 27-isolate sets, each arranged as a 3 x 3 x 3 cube in which each 3 x 3 layer was defined as a 9-isolate SynCom, generating 45 SynComs in total. Screening under 100 mM NaCl identified SynCom DY1 (SCDY1) as a candidate salt stress-mitigating consortium. SCDY1 consisted of nine taxonomically diverse isolates and exhibited a multifunctional profile, including siderophore production, phosphate solubilization, carboxymethyl cellulose degradation, indole compound production, and growth under saline conditions. In Arabidopsis thaliana, SCDY1 promoted primary root elongation and biomass accumulation in a salinity-dependent manner, with the clearest effect under 120 mM NaCl, and at least a subset of constituent bacteria was recoverable from inoculated seedlings. RNA sequencing and targeted RT-qPCR indicated that SCDY1 modulated host gene expression under moderate salinity stress, with responsive genes associated with oxidative stress, water- and oxygen-related processes, phenylpropanoid biosynthesis, glutathione metabolism, and root epidermis-related processes. Root hair phenotyping further showed that SCDY1 enhanced root hair-related traits and shifted visible root hair formation closer to the root apex. These findings identify a quinoa-derived SynCom that improves plant performance under salinity stress and provide a practical, traceable framework for discovering beneficial microbial consortia from plant-associated bacterial collections. 
Scope statement: This manuscript fits the Research Topic "Harnessing Plant Microbiomes for Climate Resilience: From Ecological Insight to Synthetic Community Design" in Frontiers in Plant Science because it presents a traceable strategy for discovering functional synthetic microbial communities from a stress-adapted plant-associated bacterial collection. We developed a cube-based screening strategy using 135 quinoa-associated bacterial isolates and identified a nine-isolate synthetic microbial community, SCDY1, that promotes Arabidopsis growth under moderate salinity stress. The study integrates microbiological screening, characterization of plant growth-promoting traits, bacterial re-isolation, plant growth phenotyping, RNA-seq, RT-qPCR, and root hair phenotyping. These analyses link SCDY1 treatment to salinity-dependent growth promotion, recoverable bacterial members, stress- and redox-associated transcriptional changes, phenylpropanoid-related responses, and modulation of root epidermal phenotypes. By connecting a defined SynCom with host transcriptional and root epidermal responses, this work advances understanding of beneficial plant-microbe interactions under salt stress. The cube-based design also provides a practical and traceable framework for discovering functional SynComs from large plant-associated bacterial collections, which should be of interest to researchers studying plant symbiosis, microbiome engineering, abiotic stress tolerance, and sustainable crop improvement.
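The arithmetic of the cube-based design (135 isolates, five 3 x 3 x 3 cubes, 45 nine-isolate SynComs) can be checked with a short sketch. It assumes layers are taken along all three axes of each cube, which is what yields nine layers per cube. The isolate labels and their placement within a cube are hypothetical.

```python
from itertools import product


def cube_syncoms(isolates, side=3):
    """Split isolates into side^3 cubes and return every axis-aligned layer."""
    cube_size = side ** 3
    if len(isolates) % cube_size:
        raise ValueError("isolate count must be a multiple of the cube size")
    coords = list(product(range(side), repeat=3))
    syncoms = []
    for start in range(0, len(isolates), cube_size):
        block = isolates[start:start + cube_size]
        position = dict(zip(coords, block))  # (i, j, k) -> isolate
        for axis in range(3):
            for level in range(side):
                # One layer: all isolates sharing a coordinate on this axis.
                layer = [position[c] for c in coords if c[axis] == level]
                syncoms.append(layer)
    return syncoms


isolates = [f"isolate_{n:03d}" for n in range(1, 136)]  # hypothetical labels
syncoms = cube_syncoms(isolates)
membership = {name: sum(name in s for s in syncoms) for name in isolates}
```

Under this layout every isolate sits in exactly three SynComs, one per axis, so a single isolate is pinpointed by the intersection of its three layers. This is one way to read the traceability of isolate-level contributions.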

13

FHIRBench: Benchmarking FHIR Clinical Data Serialization Strategies for Large Language Models

Chong, J.

2026-07-15 health informatics 10.64898/2026.07.14.26358020 medRxiv

Top 3%

0.2%


We present FHIRBench, a benchmark evaluating six FHIR clinical data serialization strategies across four frontier LLMs (Claude Sonnet 4.5, GPT-5.4, DeepSeek V3.2, Qwen3 32B) on three clinical tasks using 100 stratified synthetic FHIR R4 patient bundles. We employ two evaluation layers: token-level F1 and LLM-as-judge rubric on four clinical dimensions, yielding 7,200 evaluations per layer. Our findings reveal four results. First, serialization significantly impacts quality but the direction diverges between layers: Condensed outperforms Raw JSON on F1 for 3/4 models (Wilcoxon p < 10^-17), while Raw JSON achieves higher judge scores for 3/4 models (p < 10^-7). Narrative achieves 95% of Raw JSON's quality at 83% fewer tokens. Second, model rankings completely reverse between layers -- Claude ranks last on F1 but first on clinical quality (p = 1.0 x 10^-6), demonstrating that single-metric evaluation produces misleading model selection. Third, a significant Model x Serializer interaction (Friedman p = 0.0009) precludes universal format recommendations, with GPT-5.4 favoring Raw JSON while open-weight models favor compressed formats. Fourth, Llama 3.1 70B exhibits 100% inference failure on complex patients despite operating within its nominal context window, revealing a patient-safety gap where AI fails for the patients who need it most. These findings establish that clinical AI systems require model-aware serialization middleware, multi-layer evaluation frameworks, and capacity verification before deployment. Code and data publicly available.
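To make the first evaluation layer concrete, the sketch below computes a token-level F1 score and reproduces the stated count of evaluations per layer. It assumes the common bag-of-tokens definition with lowercased whitespace tokenization; the benchmark's actual tokenization and normalization are not described in the abstract, and the example strings are invented.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Bag-of-tokens F1 between a model answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answer and reference, for illustration only.
score = token_f1("metformin 500 mg daily", "metformin 500 mg twice daily")

# Serializers x models x tasks x patient bundles, as stated in the abstract.
evaluations_per_layer = 6 * 4 * 3 * 100
```

A surface-overlap score of this kind rewards answers that reuse the reference wording, which offers one possible explanation for why rankings could differ from a rubric-based judge.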

14

A ReAct Agentic AI System for Natural Language Querying and Statistical Analysis of The Cancer Genome Atlas Clinical Data

Korutla, R.; Amal, S.

2026-07-17 health informatics 10.64898/2026.07.15.26358188 medRxiv

Top 4%

0.2%


The Cancer Genome Atlas (TCGA) holds clinical data for over 11,000 patients across 33 cancer types, but access is hard because of complex file structures, heterogeneous formats, and the need for programming. We present an agentic system for natural language querying and statistical analysis of TCGA clinical data. The system uses a large language model as an autonomous ReAct agent that selects from eight computational tools, including data extraction, descriptive statistics, Kaplan-Meier survival analysis with log-rank tests, hypothesis testing, and verification against the curated TCGA Pan-Cancer Clinical Data Resource (CDR). The agent reasons about intermediate results, adapts its approach, and returns clinically contextualized responses with source attribution and auditable traces. We introduce TCGA-Agent-Bench, 440 queries across five difficulty tiers with ground truth from the independently curated TCGA-CDR, evaluated with dual metrics of numerical accuracy and clinical completeness. The system achieves 93.4% overall accuracy (100% single-patient lookups, 99.1% cohort statistics, 92.8% comparative analyses), outperforming a fixed rule-based pipeline (87.1%), a single-pass LLM (81.8%), and retrieval-augmented generation (66.9% on a subset). Most of the benchmark is answerable from the CDR alone, so we locate the extraction layer's value in fields the CDR lacks (drug treatments, TNM components, biomarkers, biospecimen metadata): on 26 queries targeting these, the full system answers 100% versus 3.8% for CDR-only. Ablations show the reasoning loop is most impactful (+9.1% accuracy, +22.0 completeness points). A tool-based agentic architecture enables accurate, auditable analysis of clinical repositories, with value driven by tool design and recovered fields rather than model scale.

15

Patient-Specific EEG Baseline Establishment Using the E-norms Method for Pediatric Seizure Detection Without Labeled Training Data

Jabre, J. F.

2026-07-16 neurology 10.64898/2026.07.13.26357876 medRxiv

Top 4%

0.2%


The aim of this work is to validate patient-specific EEG baseline establishment using the e-norms method as a screening and retrospective-review tool for seizure detection in pediatric epilepsy. The method was applied to 247 seizure-free EEG recordings (263.92 hours) from 10 patients in the CHB-MIT Scalp EEG Database (ages 3-18). A composite stability metric combining first-derivative dynamics, spectral entropy, variance, and line length was computed per 2-second epoch across 23 channels. Patient-specific detection thresholds were derived from each patient's seizure-free baseline using a weighted statistical procedure. Performance was validated against 72 expert-annotated seizures (2,705 epochs) across 62 seizure files, with durations spanning 6 to 264 seconds (44-fold range). The results show that detection achieved 94.4% event-level sensitivity (68 of 72 seizures; 95% CI 86.6-97.8%) and 81.5% epoch-level sensitivity (2,204 of 2,705 epochs; 95% CI 80.0-82.9%). Eight of ten patients achieved 100% event-level sensitivity with epoch-level sensitivity ranging from 58.7% to 100.0%. Two patients showed partial event-level failures (CHB-15: 17 of 20; CHB-18: 5 of 6), with the four missed events attributable to two characterizable failure modes. Patient-specific thresholds ranged from 4.06 to 4.81 (mean 4.51 +/- 0.25); threshold variation did not correlate reliably with age or sex, confirming that no universal threshold could achieve comparable performance. Detection margins ranged from 0.88 to 1.24 times. Patient-specific e-norms achieves 94.4% event-level sensitivity for pediatric EEG seizure detection without requiring labeled seizure training data, exceeding published human expert inter-rater agreement (50-76%) and recent automated approaches in adult cohorts using behind-the-ear EEG and wearable ECG. Two characterizable failure modes account for the four missed events and inform appropriate clinical use. As a high-sensitivity screening tool complementary to real-time alarm systems, the method is ready for adult validation, prospective deployment, and head-to-head benchmarking.
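
The sensitivity intervals quoted in this abstract can be checked from the reported counts. The abstract does not say which interval method was used; the sketch below assumes a Wilson score interval, which happens to be consistent with the published figures. The function name `wilson_interval` is illustrative and not taken from the preprint.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Counts reported in the abstract.
event_low, event_high = wilson_interval(68, 72)        # event-level sensitivity
epoch_low, epoch_high = wilson_interval(2204, 2705)    # epoch-level sensitivity
print(f"event-level: {100 * event_low:.1f}-{100 * event_high:.1f}%")
print(f"epoch-level: {100 * epoch_low:.1f}-{100 * epoch_high:.1f}%")
```

Under this assumption the event-level interval comes out at roughly 86.6-97.8% and the epoch-level interval at roughly 80.0-82.9%, in line with the values in the abstract.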

16

Curation of Mini Mental State Examination (MMSE) Scores in the VA Million Veteran Program (MVP): Applications for Cognitive Aging Research

Lopez, F. V.; Gillis, M.; Lee, S.; Sakamoto, M. S.; Zhang, R.; VA Million Veteran Program; Sherva, R.; Logue, M.; Merritt, V. C.

2026-07-16 neurology 10.64898/2026.07.14.26358064 medRxiv

Top 4%

0.2%

Background: Electronic health record (EHR)-linked biorepositories provide opportunities to advance epidemiological research in Alzheimer's disease (AD) and related dementias. Objective: Evaluate the extraction, curation, and associative validity of Mini Mental State Examination (MMSE) scores from the VA EHR for participants in the VA Million Veteran Program (MVP). Methods: The sample (N = 49,555; 7.4% women) included a multiethnic cohort (European [68.3%], African [20.4%], Hispanic [9.0%]) with EHR-extracted MMSE scores; 30.7% were apolipoprotein E (APOE) ε4 carriers, and 25.8% had multiple scores. Linear regressions examined cross-sectional associations between ε4 dosage (0, 1, 2) and first and lowest MMSE scores. MMSE scores were also evaluated against MVP dementia diagnostic algorithms in participants aged ≥65 years. Results: Among participants of European ancestry, there was a significant ε4 dose-response relationship (ps < .001) with MMSE scores. Homozygote carriers scored lower than heterozygote carriers (Mdiff: first = -0.5; lowest = -0.9), who scored lower than non-carriers (Mdiff: first = -0.4; lowest = -0.6). Among Veterans of African and Hispanic ancestry, no dose-response relationship was observed, although ε4 carriers had lower scores than non-carriers (ps ≤ .04). MMSE scores corresponded strongly with dementia case/control status across phenotypes: mild impairment on the MMSE was strongly associated with AD (odds ratio [OR] = 11.48), with more severe MMSE impairment showing stronger associations (moderate OR = 17.95; severe OR = 27.83). Conclusion: This study demonstrated MMSE scores can be systematically extracted and curated from the VA EHR. Findings offer a scalable framework for future studies on risk stratification, highlighting the potential for harnessing MVP to explore genetic and clinical factors contributing to cognitive and dementia outcomes in diverse samples.

17

LocusBlend: Flexible multi-index regional visualization of genomic association signals

Yang, C.; Cook, N.; Zeng, Y.; Fu, T.; Budde, J.; Cruchaga, C.; Belloy, M. E.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.15.26358129 medRxiv

Top 4%

0.1%

Summary: It has become standard practice to visualize regional signals from genome-wide association studies (GWAS) using LocusZoom plots. Similarly, GWAS signals are compared to regionally matched quantitative trait loci (QTLs), i.e., variant-to-gene regulation data, using LocusCompare plots to aid assessment of candidate trait-related genes. Despite broad usage, these tools annotate variants by linkage disequilibrium (LD) to a single lead or index variant. This single-index representation has limitations for visualizing complex loci that contain multiple independent signals. We present LocusBlend, an interactive web application for multi-index, LD-blended visualization of genomic loci. LocusBlend supports one or two genomic association summary-statistic datasets and one to three index variants, multi-index LocusZoom color-blended plots, and matching LocusCompare visualizations. Applications to Alzheimer's disease GWAS and QTL signals illustrate that LocusBlend enables visualization and separation of independent signals despite shared LD and high genomic complexity. Overall, LocusBlend is aimed at supporting researchers in handling the continuously expanding complexity of human genomics findings. Availability and Implementation: LocusBlend is freely available at https://locusblend.wustl.edu. Publication-ready plots are generated in 1 min. Source code, documentation, example datasets, input templates, and reproducibility instructions are available at https://github.com/BelloyLab/LocusBlend. LocusBlend is implemented in Python using Streamlit, Plotly, and PLINK. Supplementary Information: Supplementary data are available online.

18

The Registry of Pregnant Women at Cruces University Hospital: an ethical framework for prospective research with preanalytical optimization of maternal plasma processing

Gonzalez-Moro, I.; Sanchez-Garcia, H.; Medina Cuesta, T.; Rodriguez Lirio, A.; Espin Lopez, M. d. P.; Esquivel Gonzalez, S.; Quintana Ochoa de Alda, E.; de la Pena-Sanz, M.; Marin Cano, L.; Sarasua-Blanco, N.; Ortiz Salinas, P.; Sanfeliu Padulles, A.; Ruiz Adrian, A.; Martinez Isidoro, A.; Aldaiturriaga Otaola, A.; Aramburu Gil, A.; Garcia Gil, A.; Saenz Saenz, A.; Heredia Campos, A.; Fernandez Salado, A.; Ramirez Jarana, A. I.; Tobar Lopez, A. I.; Casarojos Oses, A. J.; Martinez de Maranon Toral, A.; Satiago Hidalgo, A.; Silva Diaz, A.; Basterrechea Miguel, A.; Castanos Lasa, A.; Esteras Vadi

2026-07-17 obstetrics and gynecology 10.64898/2026.07.17.26357942 medRxiv

Top 4%

0.1%

Background: Prospective pregnancy registries and biobanking infrastructures are essential for future translational studies investigating maternal, placental and offspring health. However, circulating nucleic acid analyses are highly sensitive to preanalytical variability, particularly regarding blood-collection tube type and sample processing conditions. We established a prospective pregnancy registry and biobanking workflow at Cruces University Hospital and evaluated the impact of preanalytical variables on circulating cell-free DNA (cfDNA) and cell-free RNA (cfRNA) preservation in maternal plasma collected at delivery. Methods: The Registry of Pregnant Women at Cruces University Hospital was designed as a prospective infrastructure integrating placental sampling, maternal blood collection and ethically controlled future access to maternal and offspring clinical data. Within this framework, peripheral blood samples from 50 women at delivery were simultaneously collected into EDTA, Norgen and Roche tubes. Plasma samples processed within or after 24 hours following collection underwent cfDNA/cfRNA extraction, electrophoretic profiling, fluorometric quantification and RT-qPCR analyses targeting different stress-related genes. Results: By the end of June 2026, 1,127 women had been prospectively recruited into the registry, with 661 plasma samples, 637 serum samples and 858 sets of four placental biopsies collected, processed and stored in the Basque Biobank. In the preanalytical substudy, EDTA tubes yielded higher cfDNA concentrations, likely reflecting reduced cellular preservation and genomic DNA contamination. In contrast, Roche tubes showed superior cfRNA preservation, with higher cfRNA concentrations and more consistent detection of the characteristic 5S rRNA peak compared with EDTA and Norgen tubes. Processing delays beyond 24 hours reduced cfRNA concentration, while associations between circulating transcripts and gestational age were more consistently detectable in preservative-containing tubes. Conclusions: Prospective infrastructures like ours offer a strong foundation for large-scale, long-term studies in the framework of the Developmental Origins of Health and Disease hypothesis. Technically, Roche tubes provided superior cfRNA preservation and enhanced sensitivity for detecting subtle biological associations, supporting the importance of standardized preanalytical workflows within prospective pregnancy biobanking resources.

19

From Menarche to Menopause: Hormonal Influences on Functional Neurological Disorder

Palmer, D. D. G.; Warren, N.; Morton, A.; Lehn, A.

2026-07-18 neurology 10.64898/2026.07.16.26358260 medRxiv

Top 4%

0.1%

Background: Functional neurological disorder (FND), one of the most common neurological conditions, affects women almost twice as frequently as men. The reasons for this are unknown, and there has been minimal research into how physiological and pathological features of women's health interact with symptoms of FND. Methods: We conducted an online survey assessing the effect of several aspects of women's health on the severity of FND symptoms. Results: 484 people completed the survey. Among the 223 who had regular or fairly regular menstrual cycles, symptoms differed strongly across the menstrual cycle: they were at their best in the follicular phase, worsened in the luteal phase, and were worst in the pre-menstrual period and during menses. This effect was not moderated by a proxy measure of pre-menstrual dysphoric disorder (PMDD). Participants who were taking the combined oral contraceptive (COC, n=43) or progesterone-based contraception (n=80) were more likely to report symptom improvement than worsening after starting the medication. Compared with menstruating participants who were not taking the COC, participants taking the COC reported less worsening of their FND symptoms in the luteal, pre-menstrual, and menstrual phases. Of the 99 women who had passed menopause since developing FND, 76% reported worsening of their FND symptoms after menopause. Discussion: This study demonstrates interactions between several aspects of women's health and symptoms of FND. The observed pattern of symptom fluctuation across hormonal states suggests a potential modulatory role of oestrogen, warranting further targeted investigation.

20

Benchmarking Speech Recognition Models for Medical Consultations in Latin American Spanish: A Comparative Evaluation with Fine-Tuning

Carrillo, R. M.; Carbajal Serrano, A.; Condori Pinedo, P. S.

2026-07-16 public and global health 10.64898/2026.07.14.26358062 medRxiv

Top 4%

0.1%

BACKGROUND: Artificial intelligence (AI) medical scribes rely on speech-to-text (STT) models for transcription. Evaluations of STT models in non-English settings remain scarce. We benchmarked ten STT models on medical consultations in Latin American (LatAm) Spanish and assessed whether fine-tuning improves transcription accuracy. METHODS: We used ten YouTube videos depicting medical consultations, with human transcriptions as the ground truth. Five open-source models were evaluated (Whisper Large, Whisper Large v3, Whisper Large v3 Turbo, Voxtral Mini 3B, and Canary 1B v2), along with five closed-source models (gpt-4o-transcribe, gpt-4o-mini-transcribe, gemini-2.5-pro, Eleven Labs, and Assembly AI). Whisper Large v3 was fine-tuned, with one video withheld from training. Performance was assessed on the withheld video using Word Error Rate (WER), Character Error Rate (CER), BLEU Score, ROUGE-L, BERT Score, and Semantic Similarity. RESULTS: None of the fine-tuning iterations outperformed the vanilla Whisper Large v3. On the withheld video, gemini-2.5-pro was the closed-source model with the best performance in four of six metrics. In comparison to the closed-source models, the fine-tuned model never outperformed them (withheld video); conversely, in comparison to the open-source models, the fine-tuned model showed better performance across metrics, for instance: BLEU score (63% vs 58% for the second-ranking model), BERT Score (89% vs 86%), semantic similarity (89% vs 83%), and CER (19% vs 20%). CONCLUSIONS: Whisper Large v3 and its fine-tuned variant are the best open-source STT models for transcribing medical conversations in LatAm Spanish. These findings provide an evidence base for developing AI medical scribes tailored to Spanish-speaking LatAm.
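
For readers unfamiliar with the two error-rate metrics named in this abstract, the following is a minimal sketch of how WER and CER are commonly defined: the edit distance between the model output and the human reference, divided by the reference length in words or characters. It is not the authors' evaluation code. The abstract does not describe text normalization (casing, punctuation, number formatting), so none is applied here, and the example sentences are invented for illustration.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + substitution_cost,   # match or substitution
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: the transcription drops the final word of the reference.
reference = "el paciente tiene fiebre alta"
hypothesis = "el paciente tiene fiebre"
print(f"WER = {wer(reference, hypothesis):.2f}")
print(f"CER = {cer(reference, hypothesis):.2f}")
```

Lower values are better for both metrics, which is why a CER of 19% is reported as an improvement over 20%.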