iScience — Latest Matching Preprints

1

General-purpose large language models can achieve physician-level accuracy in complex medical data extraction

Rajeev, M.; Narayan, A.

2026-06-10 gastroenterology 10.64898/2026.06.06.26354838 medRxiv

Top 1%

6.4%

Background: Unstructured data represent about 80% of total electronic health records (EHR) data. Structuring this free text is essential for advancing clinical research, including cohort selection for trials, retrospective studies, and the development of disease registries. While manual chart review (MCR) remains the gold standard for extracting this clinical data, the process is inherently slow, resource-intensive, and susceptible to errors from human fatigue. We evaluated the extraction accuracy, safety, and efficiency of the HeLIX (Hepatology Logic-Integrated Extraction) framework, a large language model (LLM) protocol using Google Gemini 3 Pro, against gold-standard MCR. Methods: A prospective validation study was conducted using 50 high-complexity, simulated hepatology discharge summaries designed to replicate the real-world heterogeneity of EHRs. The HeLIX framework employed a zero-shot, structured chain-of-thought (CoT) prompting strategy enforced by a three-layer architecture: Clinical Reasoning Trace, Schema Enforcement, and Evidence Verification. The model extracted 45 distinct clinical variables. Performance was benchmarked against a consensus MCR. Results: Across 2,250 evaluated data points, the model achieved an overall extraction accuracy of 99.24% (95% CI: 98.8%-99.5%), with perfect concordance in 35/45 (77.8%) variables. For binary diagnostic variables, the model demonstrated an overall F1-score of 0.98, recall of 0.99, and substantial inter-rater reliability (Cohen's κ = 0.97). Hallucinations were exceptionally rare (2/2,250; 0.09%). Critical errors affecting clinical management occurred in only 2 instances (<0.1% of total data), both involving etiological misattribution in complex multifactorial diagnoses. The AI workflow was 13.4-fold faster and 95.1% more cost-effective than manual extraction. 
Conclusion: The HeLIX framework demonstrates physician-level accuracy and reliability in extracting complex hepatology data. It offers a scalable, efficient, and economical alternative to manual chart review. Such frameworks could accelerate clinical research, enabling healthcare systems globally to build comprehensive patient registries for a fraction of the traditional cost.
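As an illustrative sketch (not the HeLIX authors' code), the binary-variable metrics reported above can be computed from a 2x2 table of model output versus the consensus MCR reference. The function name and the example counts are hypothetical; Cohen's κ is computed in its standard two-rater form, using each rater's marginal rate of positive calls to estimate chance agreement.

```python
def binary_agreement(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F1 and Cohen's kappa for model vs. reference on one binary variable."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Observed agreement: share of items on which model and reference agree.
    p_o = (tp + tn) / n
    # Chance agreement from each rater's marginal probability of a positive call.
    p_model_pos = (tp + fp) / n
    p_ref_pos = (tp + fn) / n
    p_e = p_model_pos * p_ref_pos + (1 - p_model_pos) * (1 - p_ref_pos)
    kappa = (p_o - p_e) / (1 - p_e)
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical counts for a single diagnostic variable.
print(binary_agreement(tp=90, fp=2, fn=1, tn=107))
```

Kappa is reported alongside F1 because it discounts agreement expected by chance, which matters for variables where most patients are negative.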

2

Heart Rate Circadian Oscillations as Digital Biomarkers of Cardiometabolic Health Determinants

Colitta, A.; Bruno, S.; Benedetti, D.; Hoxhaj, D.; Cruz-Sanabria, F.; Di Pede, C.; Buracchi Torresi, F.; Frumento, P.; Gargani, L.; Fabbrini, M.; Maestri Tassoni, M.; Bonanni, E.; Faraguna, U.

2026-06-10 cardiovascular medicine 10.64898/2026.06.07.26355124 medRxiv

Top 2%

4.9%

AIMS Cardiometabolic risk factors may impair health by altering the autonomic modulation of the cardiovascular system, a physiological process described by heart rate (HR) circadian oscillations. However, the impact of cardiometabolic health determinants on HR circadian oscillations remains scarcely characterized in real-world, population-based settings. To address this, we applied digital health technologies to investigate how cardiometabolic health determinants shape HR circadian oscillations in a real-world cohort of individuals free of cardiometabolic diseases. METHODS First, a model aimed at mitigating wearable measurement error caused by motion artifacts was evaluated with 10-fold cross-validation. This process drew on 10,056 epochs of concurrent wearable-derived and polysomnographic HR assessment and yielded an average 1.3 bpm reduction in wearable measurement error. We subsequently applied this model to over 2 million 1-minute epochs of HR data, derived from 7-day continuous actigraphic recordings of 245 individuals free of cardiometabolic disorders. Functional-on-scalar regression modelling and both parametric and nonparametric analyses characterized HR circadian profiles and their relationships with demographics, lifestyle, chronotype, sleep health, and chronic insomnia diagnosis. A 6-dimension sleep health index was calculated. RESULTS Sex, chronotype, and sleep health predominantly shaped HR circadian oscillations. Specifically, females consistently showed higher HR across the 24 hours. Moreover, chronotype was associated with a phase shift in HR circadian profiles, with later timings corresponding to eveningness. Notably, sleep health affected HR circadian oscillations in a dose-dependent fashion: each additional impaired sleep dimension was associated with a 1.2 bpm HR increase during nighttime, alongside reduced circadian robustness and delayed oscillation timings. 
Finally, the earlier occurrence of morning HR peaks served as a digital biomarker of insomnia (80% specificity, 74% sensitivity). CONCLUSIONS This work provides a digital health framework to characterize HR circadian oscillations in free-living populations and supports its clinical utility in capturing the autonomic disruptions related to cardiometabolic health determinants.

3

STDP-inspired temporal transition modeling for adaptive clinical risk prediction from electronic health records

Gong, L.; Aswani, N.; Shahinian, P.; Yang, J. Y.; Kontos, D.; Manji, G.; Kang, S.; Hur, C.

2026-06-09 health policy 10.64898/2026.06.04.26354919 medRxiv

Top 2%

4.9%

Electronic health record (EHR) prediction models often summarize longitudinal histories as static patient-level features, which may omit potentially informative event ordering. We developed a simplified spike-timing-dependent plasticity (STDP)-inspired framework that represents asynchronous EHR data as sparse, directional transition features. The approach encodes whether one clinical event precedes another within prespecified temporal windows, preserving event identity, directionality, and approximate timing while retaining feature-level interpretability. We evaluated this framework in two retrospective prediction tasks with different temporal scales: incident acute kidney injury (AKI) prediction in 17,351 MIMIC-IV ICU stays and early postoperative recurrence prediction in 713 CUMC patients with pancreatic ductal adenocarcinoma (PDAC). Models using static burden features (demographics, comorbidities, raw lab measurements) alone were compared with models that additionally included STDP transition features, using patient-level cross-validation and rolling prediction horizons. In AKI, a calibrated STDP ensemble model showed higher discrimination than static burden alone at the 24-hour decision snapshot for AKI by 72 hours (AUROC 0.838 versus 0.800) and at 48 hours for near-term AKI prediction (AUROC 0.868 versus 0.827). In PDAC, STDP transition features modestly improved Day -30 preoperative recurrence prediction (AUROC 0.611 versus 0.587; AUPRC 0.323 versus 0.318 for static burden) and showed similar performance at Day 0 (7 days before the recorded surgery date), with AUROC 0.681 and AUPRC 0.363. Decision-curve and feature analyses suggested that selected temporal transitions were clinically interpretable across renal, inflammatory, hepatobiliary, hematologic, glycemic, and nutritional trajectories. 
These findings suggest that STDP-inspired transition features may provide a practical, interpretable way to incorporate temporal ordering into EHR-based risk prediction across both acute and longitudinal settings.
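The abstract describes features that encode whether one event precedes another within prespecified windows. A minimal sketch of that idea is below, assuming binned windows in hours and that each ordered pair is assigned only to its tightest window; the function name, event codes, and window choices are hypothetical and do not reproduce the authors' implementation (classic STDP uses a continuous exponential time kernel, which this binned version replaces).

```python
def transition_features(
    events: list[tuple[float, str]], windows: list[float]
) -> set[tuple[str, str, float]]:
    """Return sparse binary (earlier_event, later_event, window) features for one patient.

    events:  (time_in_hours, event_code) pairs, in any order.
    windows: upper bounds of temporal bins in hours, e.g. [6, 24, 72].
    A feature is active if `earlier_event` precedes `later_event` by a gap
    falling in that bin; each ordered pair is assigned to its tightest bin only.
    """
    bins = sorted(windows)
    ordered = sorted(events)  # sort by time, then by event code
    active: set[tuple[str, str, float]] = set()
    for i, (t_a, a) in enumerate(ordered):
        for t_b, b in ordered[i + 1:]:
            gap = t_b - t_a
            if gap > bins[-1]:
                break  # list is time-sorted, so all later events are even further away
            if gap <= 0:
                continue  # simultaneous events carry no ordering information
            window = next(w for w in bins if gap <= w)
            active.add((a, b, window))
    return active


# Hypothetical event stream for one ICU stay.
stay = [(30.0, "urine_output_low"), (0.0, "creatinine_up"), (5.0, "vasopressor")]
print(sorted(transition_features(stay, [6.0, 24.0, 72.0])))
```

Because the key includes direction, ("creatinine_up", "vasopressor") and ("vasopressor", "creatinine_up") are distinct features, which is what preserves ordering information that static counts discard.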

4

A risk-of-contagion index using a Bayesian based model for the COVID-19 epidemic in Mexico

Corona-Moreno, R.; Acuna-Zegarra, M. A.; Santana-Cibrian, M.; Velasco-Hernandez, J. X.

2026-06-10 health policy 10.64898/2026.06.09.26355274 medRxiv

Top 2%

4.5%

During the COVID-19 pandemic, limited testing capacity and reporting delays complicated epidemic surveillance and decision-making in Mexico. We calibrated covidestim, a Bayesian nowcasting model, to estimate total SARS-CoV-2 infections from reported cases and deaths using Mexican surveillance data. Disease-progression distribution priors were calibrated using Mexico City records and validated through comparisons with national seroprevalence surveys, hospitalization data, and annual reported severe-case rates across all states. Using the reconstructed estimates of active infections, we implemented an event-based risk framework that quantifies the probability of encountering at least one infectious individual in gatherings of different sizes. This probability was then translated into a four-level epidemiological traffic-light indicator and computed at both state and municipality levels. The resulting estimates revealed substantial spatial heterogeneity that is obscured by state-level aggregation, particularly in states with marked differences between urban and rural municipalities. To evaluate consistency with public-health indicators, we compared the proposed risk classification with the official Mexican epidemiological traffic-light system, considering interpretable gathering sizes relevant to public-health decision-making. Weekly reports derived from this framework were delivered to policymakers in the State of Queretaro, Mexico, as an anticipation tool for school reopening and public-space management. These results demonstrate that Bayesian reconstruction of infections combined with event-based risk metrics can provide an interpretable and generalizable municipality-level complement to routine surveillance systems, particularly in regions with limited testing capacity and heterogeneous local transmission dynamics.
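The abstract does not give the risk formula, so the sketch below assumes the common homogeneous-mixing form: if a fraction p of the population is currently infectious and n attendees are independent random draws, the chance that at least one is infectious is 1 - (1 - p)^n. The four-level cut-offs are hypothetical placeholders, not the thresholds used in the Queretaro reports.

```python
def prob_at_least_one_infectious(
    active_infections: float, population: float, group_size: int
) -> float:
    """P(at least one of `group_size` attendees is infectious), assuming independent draws."""
    prevalence = active_infections / population
    return 1.0 - (1.0 - prevalence) ** group_size


def traffic_light(
    risk: float, cutoffs: tuple[float, float, float] = (0.10, 0.30, 0.60)
) -> str:
    """Map a probability to four levels; the default cut-offs are illustrative only."""
    for level, cut in zip(("green", "yellow", "orange", "red"), cutoffs):
        if risk < cut:
            return level
    return "red"


# Hypothetical municipality: 500 active infections per 100,000 residents, gathering of 50.
risk = prob_at_least_one_infectious(500, 100_000, 50)
print(round(risk, 3), traffic_light(risk))
```

The same prevalence gives very different risks for a family dinner and a stadium, which is why the framework reports risk for several gathering sizes rather than a single number.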

5

PhysiCase: Development and dual-layer validation of synthetic cases for health professional education: A pilot study leveraging Generative AI

Komolafe, O. O.; Roberts, A. C.; Shelley, J.; Tawiah, A. K.

2026-06-09 rehabilitation medicine and physical therapy 10.64898/2026.06.07.26355114 medRxiv

Top 4%

3.7%

High-quality, domain-specific datasets are foundational to advancing educational tools and AI systems in healthcare, yet assembling case repositories from real-world clinical records faces substantial privacy, ethical, and licensing barriers. Synthetic data generation offers a compelling pathway forward, but educational cases require rigorous validation to ensure clinical plausibility and pedagogical utility. This pilot study introduces PhysiCase, a dual-layer validation pipeline for synthetic case generation, and evaluates the feasibility of combining automated LLM-based screening with expert educator review. We generated 128 synthetic musculoskeletal (MSK) cases using four frontier large language models (GPT-4.1, GPT-4o, Google Gemini 2.5 Pro, and Llama 4 Scout) across 28 clinical conditions. Cases underwent automated quality screening using an "LLM-as-judge" framework (DeepEval) assessing prompt alignment, JSON correctness, answer relevance, bias, toxicity, and completeness. Ninety cases (70.3%) passed automated filtering and proceeded to expert evaluation by four MSK physiotherapy educators, who rated medical accuracy, realism, fidelity, relevance, and usability on 5-point Likert scales. GPT-4.1 demonstrated the highest automated pass rate (96%) and the strongest expert ratings (medical accuracy 4.10/5, usability 4.38/5), while Llama 4 Scout showed the lowest pass rate (33.3%) and the lowest expert ratings. Expert-evaluated cases achieved strong content validity indices (CVIs) for usability (97.5%), relevance (97.5%), and realism (95%), though medical accuracy showed greater variance (CVI 87.5%). Cross-layer correlation analysis revealed that automated completeness metrics moderately aligned with expert usability ratings, while answer relevance and prompt alignment showed weak or negative correlations with clinical correctness. Qualitative analysis identified three primary failure modes: reductive logic, biomechanical inconsistency, and administrative/contextual gaps. 
The dual-layer validation framework proved methodologically viable: automated screening efficiently reduced expert review burden, while human judgment remained indispensable for detecting subtle clinical reasoning failures. LLM-generated synthetic cases have the potential to meet practical educational needs for MSK physiotherapy, but expert validation is essential to safeguard clinical accuracy. These findings support a scalable division of labour for synthetic case development, with targeted improvements to prompting and automated reasoning checks needed to address the identified "nuance gaps." The code for this paper is available at https://github.com/kwid-ai/PhysiCase

6

A mechanistic model for genetic regulation of postmenopausal bone loss

Rattsev, I.; Mac Gabhann, F.; Hertz, D.; Taylor, C. O.

2026-06-08 endocrinology 10.64898/2026.06.04.26354968 medRxiv

Top 7%

2.8%

Bone remodeling is a tightly regulated physiological process that maintains bone health through the coordinated action of bone-resorbing osteoclasts and bone-forming osteoblasts. Disruption of this balance, such as that induced by estrogen decline after menopause, results in bone loss and osteoporosis. Genetic factors play an important role in determining bone mineral density (BMD) loss over time. However, translating genetic associations into individualized risk prediction remains challenging due to the small effect sizes of individual variants and non-linear interactions within the bone remodeling unit. Here, we present a bone cell population dynamics model that includes major regulatory pathways, such as the RANK/RANKL/OPG axis, Wnt signaling, and hormonal regulation by estrogen, parathyroid hormone, and TGF-β. We calibrate the model on clinical data from healthy postmenopausal women and from women with reduced BMD undergoing anti-osteoporotic therapy. The calibrated model captures healthy BMD decline in postmenopausal women and therapeutic response to anti-osteoporotic medications. We mechanistically incorporate the effects of 22 variants across 8 genes involved in bone remodeling and simulate BMD trajectories in 1,000 virtual subjects differing by ancestry and genetic makeup. The median predicted 5-year BMD loss was 3.57% (95% prediction interval: 1.31%-5.24%), consistent with values reported in the literature. Virtual individuals with African ancestry were predicted to experience the highest average 5-year BMD loss. The strongest genetic risk factors for bone loss were predicted to be CYP19A1 rs727479 and OPG rs3102735, while LRP5 rs11228240 emerged as a protective factor that could partially counteract the detrimental effects of other variants. Several epistatic effects were observed in the genetic interaction analysis. Mechanistically, our model suggested that estrogen exerts its effect on bone remodeling primarily by modulating osteoclast apoptosis. 
Overall, this framework provides a proof of concept for integrating genetic risk factors into mechanistic models of disease and can be extended to other conditions with polygenic inheritance.

7

A single-nucleus transcriptomic atlas of human basal ganglia during development forwarding diagnosis and therapy of pediatric movement disorders

Lange, B. K. A.; Graceffo, E.; Stenzel, W.; Biebermann, H.; Schuelke, M.; Wilpert, N.-M.

2026-06-04 nephrology 10.64898/2026.06.04.26354648 medRxiv

Top 8%

2.4%

Gene therapy is rapidly emerging as a transformative treatment for monogenic neurological disorders, including pediatric movement disorders such as aromatic L-amino acid decarboxylase (AADC) deficiency. However, its success critically depends on defining target cells and windows for therapeutic intervention. Here, we present an open-access single-nucleus transcriptomic atlas of the human basal ganglia spanning a therapy-relevant window from the second/third trimester through the perinatal period to adulthood. Across 35,755 nuclei, we identify major neuronal and non-neuronal cell types, retrace developmental trajectories, and characterize gene-regulatory networks. We identify previously unrecognized human-specific expression of key neuronal signaling genes, including GNAO1 and ADCY5, and discuss the implications for targeted gene replacement therapies. Unexpectedly, we found that the Huntingtin gene (HTT) is already expressed during prenatal stages of human brain development, supporting a previously proposed neurodevelopmental component of Huntington's disease, which should be considered in diagnostic and therapeutic strategies. Moreover, FOXG1 expression and regulon activity are concentrated predominantly in a prenatal time window, suggesting constraints on the effectiveness of postnatal interventions. Our findings highlight the importance of datasets capturing human brain development in real time and provide a publicly available resource to guide precision gene therapy strategies in the future.

8

Prioritizing embryos with lower homozygosity may reduce disease risk in children of related individuals undergoing preimplantation genetic testing

Wolfram, T.; Ahangari, M.; Davidson, I.; Wartschinski, L.; Li, J. H.; Eyre, M.; Stern, D.; Schleede, J.; Haghighi, A.; Carmi, S.; Christensen, M.

2026-06-04 genetic and genomic medicine 10.64898/2026.05.30.26354526 medRxiv

Top 9%

2.1%

Consanguinity is a reproductive union between individuals who share a recent common ancestor. These unions are common in many regions of the world and increase the burden of rare recessive disorders by elevating autozygosity in offspring. Current reproductive genetic screening focuses on a limited set of known pathogenic variants, leaving most recessive risk unaddressed. Here we argue that embryo-level autozygosity, quantified as the fraction of the genome in long runs of homozygosity (FROH), is a potentially actionable genomic biomarker. It can be integrated into routine preimplantation genetic testing as a homozygosity-informed embryo-prioritization framework (PGT-H), layered onto existing embryo biopsy workflows when couples are already undergoing IVF with PGT-A or PGT-M. Using forward simulations of first-cousin and double-first-cousin couples, we show that siblings conceived by the same couple span a wide range of FROH; selecting the lowest-FROH candidate from a cohort of five embryos reduces FROH by approximately 40% on average. Combining these reductions with empirical effect-size estimates, we estimate that for first-cousin couples this strategy could reduce the risk of intellectual disability by roughly 35-45% (corresponding to an absolute risk reduction of about 1.8-2.2%) and potentially reduce excess recessive disease burden, while also modestly reducing the risk of common diseases such as type 2 diabetes. We outline how existing PGT-A and PGT-M workflows could potentially be extended to report embryo-level FROH and discuss ethical and counseling considerations. Autozygosity-based embryo prioritization offers a principled way to address a component of recessive risk that current variant-centric approaches miss.
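A minimal sketch of the prioritization step, assuming runs of homozygosity (ROH) have already been called for each embryo as non-overlapping base-pair intervals. The 1.5 Mb minimum ROH length, the genome length, and the function names are hypothetical choices for illustration; the abstract does not state which ROH length threshold the authors use.

```python
def froh(
    roh_segments: list[tuple[int, int]],
    genome_length_bp: int,
    min_roh_bp: int = 1_500_000,
) -> float:
    """Fraction of the genome covered by ROH segments at least `min_roh_bp` long.

    Assumes segments are non-overlapping (start, end) intervals in base pairs.
    """
    in_roh = sum(end - start for start, end in roh_segments if end - start >= min_roh_bp)
    return in_roh / genome_length_bp


def lowest_froh_embryo(
    embryos: dict[str, list[tuple[int, int]]], genome_length_bp: int
) -> str:
    """Return the embryo ID with the lowest FROH (ties go to the first ID encountered)."""
    return min(embryos, key=lambda eid: froh(embryos[eid], genome_length_bp))


# Hypothetical toy genome of 100 Mb and two embryos.
cohort = {"E1": [(0, 5_000_000)], "E2": [(0, 2_000_000), (10_000_000, 11_000_000)]}
print(lowest_froh_embryo(cohort, 100_000_000))
```

In this toy case the 1 Mb segment in E2 falls below the length threshold and does not count, which illustrates why the choice of minimum ROH length affects the ranking.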

9

Development and Prospective Validation of Predictive Model for Early Hemodynamic Deterioration in Critical Care: A Multicenter Study

Nagori, A.; Singh, P.; Firdos, S.; Devadiga, A.; Vats, V.; Gupta, A.; Bandhey, H.; Ailavadi, P.; Awasthi, R.; Narotam, N.; Mishra, A.; Lodha, R.; Sethi, T.

2026-06-10 intensive care and critical care medicine 10.64898/2026.06.05.26353765 medRxiv

Top 10%

2.1%

High-frequency physiological monitoring in ICUs can identify impending deterioration hours before clinical recognition, yet extracting reliable early-warning signals from noisy vital-sign streams remains challenging. We present SIgnose, an interpretable prediction framework for early detection of abnormal shock index (SI), built from routinely monitored vital signs using physiologic variability and nonlinear time-series features. SIgnose was developed on the eICU Collaborative Research Database and externally validated on the MIMIC-III adult database and a pediatric SafeICU cohort (AIIMS New Delhi), with additional prospective validation in the pediatric ICU. We benchmarked three representation strategies: (i) engineered physiologic variability and nonlinear time-series features, (ii) deep learning, and (iii) Llama-3.1-8B embeddings with low-rank adaptation. Physiologic variability features consistently demonstrated superior cross-cohort generalization. The final model used 3,970 features from five vital signs to predict abnormal SI up to 8 hours ahead, achieving AUROC 0.861 (95% CI 0.859-0.863) and AUPRC 0.927 (95% CI 0.925-0.929) on eICU. External validation yielded AUROC 0.870 (95% CI 0.863-0.876) and AUPRC 0.935 (95% CI 0.930-0.940) on MIMIC-III, and AUROC 0.875 (95% CI 0.863-0.888) and AUPRC 0.915 (95% CI 0.898-0.930) on SafeICU; prospective pediatric validation (n = 88) achieved AUROC 0.885 (95% CI 0.868-0.902) and AUPRC 0.911 (95% CI 0.882-0.936). SHAP interpretability analysis identified heart rate variability, respiratory trend dynamics, and multi-scale blood pressure variability as key early-warning signatures. These findings establish SIgnose as a reproducible, low-compute, early-warning framework and demonstrate that physiologic variability features provide robust, generalizable representations for early deterioration detection across adult and pediatric critical care.
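The shock index is conventionally defined as heart rate divided by systolic blood pressure. The sketch below shows how a forward-looking label ("abnormal SI within the next 8 hours") might be built from timestamped vitals; the threshold of 1.0 and the function names are hypothetical, since the abstract does not state SIgnose's cut-off and pediatric cut-offs are typically age-specific.

```python
def shock_index(hr_bpm: float, sbp_mmhg: float) -> float:
    """Shock index: heart rate (beats/min) divided by systolic blood pressure (mmHg)."""
    return hr_bpm / sbp_mmhg


def abnormal_si_within(
    times_h: list[float],
    hr: list[float],
    sbp: list[float],
    t_now: float,
    horizon_h: float = 8.0,
    threshold: float = 1.0,  # hypothetical cut-off, not taken from the paper
) -> bool:
    """True if any reading in (t_now, t_now + horizon_h] has SI >= threshold."""
    return any(
        shock_index(h, s) >= threshold
        for t, h, s in zip(times_h, hr, sbp)
        if t_now < t <= t_now + horizon_h
    )


# Hypothetical readings at 1 h, 5 h and 10 h after admission.
times, hr, sbp = [1.0, 5.0, 10.0], [80, 120, 130], [120, 110, 100]
print(abnormal_si_within(times, hr, sbp, t_now=0.0))
```

Only readings strictly after `t_now` enter the label, so features computed up to `t_now` cannot leak the outcome they are meant to predict.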

10

Modeling cycle phases using hormone trajectories in women with and without polyendocrine metabolic ovarian syndrome

Stujenske, T. M.; Bouchard, T. P.; Troy, A.; Kelemen, S.; Folino, B.; Wills, T.; Sugden, L. A.

2026-06-04 obstetrics and gynecology 10.64898/2026.06.02.26354701 medRxiv

Top 11%

1.9%

The recent availability of at-home menstrual cycle tracking technology has created opportunities for personalized assessment of reproductive health, alongside improved characterization of hormone patterns in women with and without reproductive disorders such as polyendocrine metabolic ovarian syndrome (PMOS), which affects approximately 10% of reproductive-age women. In this study, we leverage self-tracked urinary hormone data to develop an autoregressive Hidden Markov model (arHMM) that maps cycle days to physiologically meaningful phases based on hormone trajectories. By modeling day-to-day hormonal dynamics rather than absolute hormone levels, and allowing variable phase durations, this approach accommodates substantial variability in menstrual cycles, thereby enabling meaningful comparisons within and between individuals. Across more than 3800 cycles from over 1100 individuals, we find that arHMM-derived phases reproduce expected hormonal patterns within follicular, periovulatory, and luteal phases, and that phase-based timing for hormone testing outperforms conventional cycle day-based testing in capturing the luteinizing hormone surge and post-ovulatory progesterone rise, highlighting limitations of fixed-day clinical protocols. We identify phase-specific differences between healthy controls and individuals with self-reported PMOS, including lower luteinizing hormone in the periovulatory phase, and reduced luteal-phase progesterone levels in PMOS. Furthermore, features derived from arHMM phase assignments enable classification of PMOS status with ~78% accuracy, demonstrating the potential of this approach for non-invasive PMOS screening.

11

Understanding Human AI Discrepancy in Breast Cancer TIL Assessment: A Multi-Rater and Perceptual Bias Study

Capar, A.; Aloglu, I.; Aker, F.; Ertano, M.; Mese, Y. E.; Ungor, A.; Yildiz, B. E.

2026-06-04 pathology 10.64898/2026.05.29.26354196 medRxiv

Top 13%

1.8%

Objective: Tumor-infiltrating lymphocytes (TILs) in breast cancer are among the most important indicators of the immune response within the tumor microenvironment. They play a particularly significant prognostic and predictive role in triple-negative and HER2-positive subtypes. However, substantial inter-observer variability has been reported in TIL scoring among pathologists, which limits its reliability in clinical practice. The aim of this study was to evaluate the agreement between artificial intelligence (AI) models and pathologists in TIL scoring and to compare this agreement using different statistical approaches, thereby assessing the potential of AI integration into pathology practice. Materials and Methods: Digitized histopathological images of breast cancer cases were included in the study. Tumor regions annotated by pathologists were evaluated for both stromal TIL percentage and the proportion of stromal tumor area within each region of interest (ROI), with assessments performed independently by three pathologists and two AI models. Agreement was assessed among pathologists, between pathologists and AI, and between AI models. Statistical analyses included the intraclass correlation coefficient (ICC), Cohen's and Fleiss' kappa, correlation tests, and Bland-Altman analysis. In addition, categorical agreement was examined using different cut-off values. Results: Inter-pathologist agreement was high, with an ICC of 0.81. In contrast, global agreement between pathologists and AI models was lower (ICC 0.41). Pairwise comparisons of pathologist-AI agreement yielded substantially lower ICC values (0.12-0.21), although agreement improved to an ICC of 0.53 when three pathologists were assessed jointly with a single AI model. The strongest categorical agreement was observed with dichotomized TIL scores (≤10% vs. >10%), whereas multi-category classifications were associated with a marked reduction in kappa values. 
Spearman correlation coefficients between pathologists and AI models ranged from moderate to good (ρ = 0.48-0.81). Agreement between the two AI models themselves was moderate, with an ICC of 0.64.

12

Oxygen-based endotypes of Obstructive Sleep Apnea

Wellman, A.; Messineo, L.; Azarbarzin, A.; Esmaeili, N.; Aishah, A.; Vena, D.; Sumner, J.; White, D.; Sands, S.

2026-06-04 respiratory medicine 10.64898/2026.06.03.26354835 medRxiv

Top 13%

1.8%

Objective: Several endotypes contribute to the development of Obstructive Sleep Apnea (OSA). However, efforts to measure these endotypes have been challenging. In this paper, we propose a new method that overcomes some of these challenges. Methods: To test the feasibility of this new method, data from the Sleep Heart Health Study (SHHS) were analyzed, and two oxygen-based endotypes were identified and plotted on a graphical model: the steady-state SpO2 and the SpO2 arousal threshold. The first is the oxygen saturation that would occur during sleep if there were no arousals, and it is a measure of upper airway collapsibility (a more collapsible airway produces a lower SpO2). The second is the oxygen saturation that triggers arousals. These endotypes were validated by assessing their ability to detect positional and state-related changes in airway collapsibility and arousal threshold. Results: The study showed that it was feasible to measure oxygen-based endotypes in 95% of SHHS participants. As expected, steady-state SpO2 was lower during supine vs. non-supine sleep, as well as during REM vs. NREM sleep. Also, the SpO2 arousal threshold was similar between supine and non-supine sleep. However, the SpO2 arousal threshold was not lower in REM sleep than in NREM sleep. Therefore, in 3 of the 4 conditions, the oxygen-based endotypes moved in the expected direction with positional or sleep-state changes. Conclusion: Although further validation experiments are required, this study indicates that OSA endotyping using the pulse oximetry signal is feasible. The oxygen-based endotypes could be used to aid therapeutic decision-making.

13

Daily symptom monitoring is sustainable over months: retention, not compliance, is the primary barrier to long-duration digital tracking

Gunsilius, C. Z.; Pei, P.; Carayannopoulos, A.; Petzschner, F. H.

2026-06-10 rehabilitation medicine and physical therapy 10.64898/2026.06.08.26355180 medRxiv

Top 13%

1.8%

Ecological momentary assessment (EMA) enables real-time, longitudinal measurement of symptoms and behavior via smartphones, yet nearly all feasibility evidence comes from protocols lasting one to two weeks, far shorter than the timescales over which chronic diseases fluctuate and clinical decisions unfold. Whether daily compliance can be sustained over months, or whether it decays as short-protocol trends predict, is unknown. Here, 214 participants (173 with pain, 41 healthy controls) completed a 4-month (122-day) EMA protocol via the Soma smartphone app, generating 26,907 check-ins. Half the sample completed the full protocol without a two-week lapse. Aggregate compliance appeared moderate (50%), but this conflated two distinct phenomena: when recomputed over each participant's active period, compliance rose to 71%, with 91% achieving moderate-to-high adherence, and remained stable across all 17 study weeks. Pain status predicted earlier disengagement but not lower compliance among those who remained; after adjustment for differential retention, group differences disappeared. To our knowledge, this is the longest continuous daily EMA evaluation in a clinical population. It suggests the primary barrier to long-duration EMA is not declining motivation among active participants but concentrated early disengagement, with direct implications for the design of digital health protocols, decentralized trials, and remote symptom monitoring.
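The gap between aggregate (50%) and active-period (71%) compliance comes from the denominator. A minimal sketch of the two definitions follows, assuming 0-indexed protocol days and defining a participant's active period as the days up to and including their last check-in; the function name and that definition of "active period" are assumptions, not the authors' exact rule.

```python
def compliance_rates(checkin_days: set[int], protocol_days: int = 122) -> tuple[float, float]:
    """Return (aggregate, active-period) compliance for one participant.

    checkin_days: 0-indexed protocol days on which a check-in was completed.
    Aggregate compliance divides by the full protocol length; active-period
    compliance divides only by the days up to and including the last check-in.
    """
    if not checkin_days:
        return 0.0, 0.0
    n_checkins = len(checkin_days)
    active_days = max(checkin_days) + 1
    return n_checkins / protocol_days, n_checkins / active_days


# Hypothetical participant: checks in on 25 of the first 30 days, then disengages.
days = set(range(30)) - {3, 10, 17, 20, 25}
print(compliance_rates(days))
```

This participant looks poorly compliant on the aggregate measure yet was highly compliant while engaged, which is the distinction between retention and compliance drawn in the abstract.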

14

Exploratory Assessment of Pulsed-Wave Doppler Representations of Lung Sounds Using Deep Learning: An In-Vitro Phantom Study

Saad, A. A.; Murthi, S. B.; Boctor, E. M.; Teeter, W. A.; Seam, N.

2026-06-10 respiratory medicine 10.64898/2026.06.09.26353787 medRxiv

Top 14%

1.7%

The increasing availability of portable ultrasound systems motivates exploration of novel approaches to respiratory signal assessment. In this in-vitro study, we investigate whether pulsed-wave (PW) Doppler ultrasound can capture structured spectral patterns from replayed lung sound recordings. Digitized respiratory sounds were replayed through a tissue-mimicking ultrasound phantom, generating 1,478 PW Doppler spectral images from recordings associated with healthy subjects and several externally labeled disease categories. Exploratory classification experiments using a ResNet-18 architecture demonstrated that these Doppler representations contain learnable differences under controlled conditions. These findings motivate further investigation into PW Doppler as a potential representation of respiratory acoustics.

15

Formalising Limits of Circulating Tumour DNA Detection: A Signal Detection Framework for Clinical Threshold Specification

Walinjkar, A.

2026-06-10 oncology 10.64898/2026.06.08.26355204 medRxiv

Top 16%

1.6%

Background: Circulating tumour DNA (ctDNA) liquid biopsy is now established across oncology for early cancer detection, minimal residual disease surveillance, and treatment monitoring. Detection thresholds for all current ctDNA assays are derived empirically through receiver operating characteristic analysis on training cohorts, a statistically valid but theoretically uninformed approach that neither specifies the minimum detectable tumour fraction given assay technical characteristics nor identifies when increasing sequencing depth ceases to provide additional clinical information. Methods: We model ctDNA detection as a binary hypothesis testing problem with Binomial-distributed mutant allele counts against a sequencing error noise floor. The Neyman-Pearson lemma is applied to derive the uniformly most powerful detector and the minimum detectable tumour fraction in closed form. The sequencing assay is modelled as a binary symmetric channel, and its Shannon channel capacity is calculated. Empirical validation uses n=61 data points extracted from five published peer-reviewed analytical validation studies across five independent institutions in the US and EU (2018-2025): Yu et al. 2022, Stetson et al. 2018, Frydendahl et al. 2023, Northcott et al. 2024, and Cheng et al. 2025. Results: The minimum detectable tumour fraction is derived in closed form as $f_{\min} \approx (z_\alpha + z_\beta)\sqrt{\varepsilon / N}$, where $N$ is sequencing depth, $\varepsilon$ is the platform error rate, and $z_\alpha$, $z_\beta$ are standard normal quantiles at the specified false positive and false negative rates. Shannon channel capacity is $C = 1 - H(\varepsilon)$ bits per read, where $H(\varepsilon)$ is the binary entropy. Empirical validation yields 84.3% agreement for single-locus assays. 
Discordance for multi-locus tumour-informed assays (NeXT Personal, duplex WGS) is consistent with the single-locus model scope and identifies the principal theoretical extension required. Conclusions: This framework provides the first formal Neyman-Pearson optimality proof for ctDNA detection, a closed-form detection limit, and a platform-independent efficiency metric for NHS and regulatory standardisation. Keywords: circulating tumour DNA; liquid biopsy; Neyman-Pearson detection; Shannon channel capacity; sequencing depth; limit of detection; minimal residual disease; signal detection theory
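The closed-form limit and channel capacity stated above can be evaluated directly. The sketch below is a minimal illustration, assuming one-sided quantiles z_α = Φ⁻¹(1 - α) and z_β = Φ⁻¹(1 - β) and the normal approximation behind the stated f_min formula. It also includes an exact, non-randomized one-sided Binomial threshold test (call "ctDNA detected" when the mutant count K ≥ k, with K ~ Binomial(N, ε) under the noise-only null), offered as an illustration of a Neyman-Pearson-style detector rather than the author's own code. All function names and the example depth and error rate are hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_fraction(depth: int, error_rate: float,
                            alpha: float = 0.05, beta: float = 0.05) -> float:
    """Normal approximation: f_min ~ (z_alpha + z_beta) * sqrt(eps / N)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # one-sided false-positive quantile
    z_beta = std_normal.inv_cdf(1 - beta)    # false-negative quantile
    return (z_alpha + z_beta) * math.sqrt(error_rate / depth)


def binary_entropy(p: float) -> float:
    """H(p) in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(error_rate: float) -> float:
    """Binary symmetric channel capacity, C = 1 - H(eps), in bits per read."""
    return 1.0 - binary_entropy(error_rate)


def np_count_threshold(depth: int, error_rate: float, alpha: float = 0.05) -> int:
    """Smallest k with P(K >= k) <= alpha when K ~ Binomial(depth, error_rate),
    i.e. the noise-only null (requires 0 < error_rate < 1). Rejecting for
    large K is the one-sided likelihood-ratio test; being non-randomized,
    its size is at most alpha."""
    log_p, log_q = math.log(error_rate), math.log1p(-error_rate)
    log_n_fact = math.lgamma(depth + 1)
    cdf = 0.0  # running P(K <= k - 1)
    for k in range(depth + 1):
        if 1.0 - cdf <= alpha:
            return k
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(depth - k + 1)
                   + k * log_p + (depth - k) * log_q)
        cdf += math.exp(log_pmf)
    return depth + 1  # no count threshold reaches the requested size


if __name__ == "__main__":
    depth, eps = 10_000, 1e-3  # hypothetical assay settings
    print(f"f_min ~ {min_detectable_fraction(depth, eps):.5f}")
    print(f"capacity = {bsc_capacity(eps):.4f} bits/read")
    print(f"call if mutant reads >= {np_count_threshold(depth, eps)}")
```

Because f_min scales as 1/√N, quadrupling depth only halves the detection limit, which is the diminishing-returns behaviour the framework is meant to make explicit.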

16

Global practices in paediatric olfactory dysfunction: a cross-sectional survey of paediatric ENT surgeons

Spencer, G. M.; Karim, K.; Dzioba, A.; Graham, M. E.; You, P.; Hummel, T.; Gellrich, J.; Coyle, P.; Burns, H.; Peer, S.; Zawawi, F.; Lechien, J. R.; Schriever, V. A.; Bhargava, E. K.; Whitcroft, K. L.

2026-06-06 otolaryngology 10.64898/2026.06.04.26354942 medRxiv

Top 17%

1.6%

Background: Olfactory dysfunction (OD) in children remains underdiagnosed and poorly characterised. Despite its known impacts on nutrition, quality of life, safety awareness, and psychosocial development, no standardised diagnostic or management pathway currently exists for paediatric OD. This study aimed to characterise global practice patterns and identify diagnostic and therapeutic challenges unique to paediatric care. Methods: A 44-item cross-sectional online survey was distributed to a verified international network of paediatric otolaryngologists across 36 countries via a closed professional platform. The survey assessed five domains: diagnostic practices, management protocols, technology and innovation, education and training, and barriers to effective care. Regional grouping was used to facilitate meaningful statistical comparisons. Categorical variables were evaluated using chi-square tests, with odds ratios and 95% confidence intervals reported for significant findings. Results: Of 351 potential participants, 167 responded (47.6% response rate). Most respondents (83%) reported seeing children with OD, yet 95% saw fewer than ten such patients annually. Psychophysical testing was never performed by 54.8% of respondents, while 88.4% routinely ordered cross-sectional imaging. Testing frequency increased significantly with patient age (Cochran's Q, p<0.001). The most common barriers to objective testing were insufficient training (44.3%), time constraints (29.9%), and funding limitations (28.1%). Multidisciplinary collaboration was negligible. Significant regional variation was observed across most practice domains. Conclusions: Paediatric OD care is characterised by functional underinvestigation, fragmented multidisciplinary collaboration, and systemic educational gaps. These findings support urgent development of standardised clinical guidelines, age-appropriate validated assessment tools, and formal interdisciplinary care pathways.
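As an illustration of the analyses named in the methods, the sketch below computes an odds ratio with a Woolf (log-scale Wald) 95% confidence interval, a Pearson chi-square statistic for a 2x2 table, and Cochran's Q for related binary responses (for example, whether one respondent tests children in each of several age bands). The abstract does not say which CI method was used, so the Woolf interval is an assumption, and all counts are hypothetical. The final line checks the reported response rate of 167 out of 351.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Woolf CI.
    Assumes all four cells are non-zero (no continuity correction)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lo, hi


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 table (1 df, no Yates correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def cochran_q(responses: list[list[int]]) -> float:
    """Cochran's Q for n subjects x k related binary conditions (0/1 values).
    Compare against chi-square with k - 1 df. Undefined (zero denominator)
    if every subject answers identically across all conditions."""
    k = len(responses[0])
    col_totals = [sum(row[j] for row in responses) for j in range(k)]
    row_totals = [sum(row) for row in responses]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical 2x2 counts, e.g. region A vs B x performs testing yes/no.
    print(odds_ratio_ci(20, 10, 10, 20))
    print(chi_square_2x2(20, 10, 10, 20))
    # Hypothetical respondents x three age bands (1 = performs testing).
    print(cochran_q([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]))
    print(f"response rate: {167 / 351:.1%}")
```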

17

Cross-Sectional Validation of an 8-Electrode Multi-Frequency Bioelectrical Impedance Analysis (BIA) Device Against Dual-Energy X-ray Absorptiometry (DEXA) for Body Composition Assessment in Indian Adults

Bheda, A.; Sharma, M.; Jokare, N.; Kapoor, S.; Chouksey, J.

2026-06-09 nutrition 10.64898/2026.05.24.26353564 medRxiv

Top 17%

1.5%

Background: Obesity is becoming a global health crisis and leads to various metabolic disorders. Body mass index (BMI) fails to differentiate fat mass from lean mass and systematically misclassifies adiposity risk, a limitation particularly pronounced in South Asian adults, who exhibit characteristically elevated visceral adiposity and reduced appendicular lean mass at a normal BMI. The 2025 Lancet Commission explicitly recommends direct adiposity measurement beyond BMI for obesity diagnosis. Weight loss interventions (dietary, behavioural, or pharmacological) are consistently associated with concurrent reductions in both fat mass and lean mass, making body composition monitoring essential beyond scale weight alone. Although DEXA is globally accepted as the gold standard for body composition analysis, its accessibility is limited, particularly in resource-constrained low- and middle-income countries such as India. BIA devices are a convenient, low-cost alternative to DEXA and can be used more frequently than DEXA scans to provide longitudinal data. The aim of this study was to validate an 8-electrode BIA device as a viable alternative to DEXA in a South Asian population. Methods: A prospective cross-sectional validation study was conducted following ethics committee approval, with a priori sample size estimation (α = 0.05, power = 80%). Fifty-eight healthy adults (n=58) each underwent three BIA measurements and one DEXA scan. To ensure statistical independence, the three BIA readings per participant were averaged, yielding 58 final measurements for validation. Body fat percentage, lean mass, and fat mass were analysed in Python using Bland-Altman analysis, Pearson correlation, intraclass correlation coefficients (ICC), and regression analysis.
Results: Against DEXA, BIA showed strong Pearson correlations for all three outcomes (fat%: r = 0.97; fat mass: r = 0.98; lean mass: r = 0.96), with ICC(2,1) values of 0.94, 0.97, and 0.91 confirming excellent absolute agreement. Mean absolute error was 3.40% for fat percentage, 1.96 kg for fat mass, and 3.37 kg for lean mass. BIA systematically underestimated body fat percentage (bias -1.96%, 95% CI: -2.91% to -1.01%; LoA: -9.04% to +5.12%) and fat mass (bias -0.72 kg, 95% CI: -1.38 to -0.07 kg; LoA: -5.59 to +4.14 kg), while overestimating lean mass by +3.08 kg (95% CI: +2.34 to +3.82 kg; LoA: -2.46 to +8.62 kg). Conclusions: The 8-electrode BIA device shows clinically acceptable agreement with DEXA for body composition assessment in healthy Indian adults. It offers a radiation-free, cost-effective, accessible, and portable alternative to DEXA, making it suitable for longitudinal monitoring and trend detection. The device is particularly valuable for obesity screening and for tracking body composition changes during weight loss interventions at the population level, addressing the critical need for accessible body composition assessment in resource-limited settings.

18

Genosolver: Rare Disease Diagnosis through Holistic Integration of Unstructured Clinical Narratives Using Large Language and Reasoning Models

Islam, T.; Danner, M.; Ziad, Z.; Begemann, M.; Beijer, D.; Lischka, A.; Lausberg, E.; Mattern, L.; Suh, J.; Wittig, P.; Guezel, N.; Schlaich, E.; Karaivanova, R.; D'Augello, S.; Franken, L.; Ruedebusch, J.; Mueller, R.; Perchalla, E.; Zempel, H.; Haag, N.; Eggermann, K.; Eggermann, T.; Meyer, R.; Kraft, F.; Elbracht, M.; Kurth, I.; Krause, J.

2026-06-05 health informatics 10.64898/2026.06.04.26354845 medRxiv

Top 19%

1.3%

Background: Molecular medicine has made genetic diagnostics crucial for rare diseases, but the majority of patients remain undiagnosed even after state-of-the-art assessment. Standardized systems for integrating clinical features, such as the Human Phenotype Ontology (HPO), offer assistance but are often insufficiently detailed and fail to capture crucial clinical parameters such as age at onset, longitudinal changes in symptoms, detailed characteristics of a clinical symptom, or the absence of a feature. Results: We present Genosolver, an integrated workflow that uses machine learning to address this bottleneck. Applying Large Language Models (LLMs) and Large Reasoning Models (LRMs) to unstructured clinical notes and electronic health care data, the workflow unifies phenotype extraction, differential diagnosis generation, and prioritization of genetic variants from genome data. Evaluated on 233 previously genetically solved cases, Genosolver ranked the causative gene first in 72% of cases and within the top 10 genes in 94%, outperforming the existing benchmarking tool Exomiser by 9%. Semi-automated reanalysis of 1,875 unsolved rare disease cases yielded an additional diagnostic rate of 1.7%. Incorporating rich, unstandardized clinical narratives substantially enhanced model performance beyond HPO-only inputs, and data-security-compliant local models achieved competitive results. Conclusion: Integrating unstandardized clinical data with local LLMs and reasoning models offers a scalable, data-secure workflow that increases molecular diagnoses in rare diseases.
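The "ranked first" and "top 10" figures above are top-k accuracies over solved cases. A minimal sketch of that metric follows, assuming each case yields one ranked gene list and one known causative gene; the function name and the gene labels are hypothetical placeholders, not Genosolver output.

```python
def top_k_accuracy(rankings: list[list[str]], causal_genes: list[str], k: int) -> float:
    """Fraction of cases whose causative gene appears among the first k ranked genes."""
    if len(rankings) != len(causal_genes):
        raise ValueError("need exactly one ranking per case")
    hits = sum(gene in ranking[:k] for ranking, gene in zip(rankings, causal_genes))
    return hits / len(causal_genes)


if __name__ == "__main__":
    # Hypothetical ranked candidate lists for four solved cases.
    rankings = [
        ["GENE_A", "GENE_B", "GENE_C"],
        ["GENE_D", "GENE_E", "GENE_F"],
        ["GENE_G", "GENE_H", "GENE_I"],
        ["GENE_J", "GENE_K", "GENE_L"],
    ]
    truth = ["GENE_A", "GENE_E", "GENE_X", "GENE_J"]
    for k in (1, 3):
        print(f"top-{k}: {top_k_accuracy(rankings, truth, k):.0%}")
```

A case whose causative gene is missing from the candidate list entirely (the third case here) counts as a miss at every k.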

19

Quantifying Cancer Clinical Trial Eligibility Using Artificial Intelligence-Based Matching

Goel, K. P.; Myall, N. J.; Dickerson, J.; Caswell-Jin, J. L.; Johnson, T.; Worth, J. E.; Gensheimer, M. F.

2026-06-05 oncology 10.64898/2026.06.03.26354859 medRxiv

Top 21%

1.2%

PURPOSE: To develop and validate an artificial intelligence-enabled platform that converts unstructured cancer trial eligibility criteria into structured queries and quantifies trial eligibility across advanced/metastatic cancer trials. METHODS: We downloaded actively recruiting US interventional treatment trials for advanced/metastatic breast cancer, colon cancer, and non-small cell lung cancer from ClinicalTrials.gov. Medical oncologists created 24 synthetic patient vignettes. A large language model converted trial eligibility criteria into Structured Query Language (SQL) code and patient information into structured records, enabling automated matching. Cancer details and treatment history were considered, but not laboratory results or comorbidities. Validation included physician editing of generated eligibility code for 30 trials, and blinded physician eligibility assessment for five trials. We then evaluated how age, ECOG performance status, sex, and ZIP code affected the number of eligible trials. RESULTS: Of 833 candidate trials, 746 met inclusion criteria. In physician review of 30 trials, edits to generated SQL did not change any of 720 trial-patient eligibility determinations for 24 synthetic patients. In blinded validation across 120 trial-patient pairs, automated matching achieved 97% accuracy. Across synthetic patients, eligible trials ranged from 31 to 258 when there were no geographic restrictions. Eligibility decreased markedly with worse performance status and with geographic restriction (both p<0.001). Later-phase, randomized, and molecularly selective trials had fewer eligible patients. CONCLUSION: AI-based structuring of trial eligibility criteria can support accurate, scalable measurement of potential cancer trial eligibility. In this demonstration, performance status, geography, and age were major determinants of eligibility across the active metastatic trial landscape.
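The matching step described above (eligibility criteria compiled to SQL, patient information stored as structured records) can be sketched with Python's built-in `sqlite3` module. The table schema, column names, trial IDs, and the WHERE clauses standing in for LLM output are all hypothetical; the abstract does not publish the generated code.

```python
import sqlite3

# Hypothetical structured patient records: (id, cancer type, age, ECOG, prior lines).
PATIENTS = [
    ("P1", "NSCLC", 67, 1, 1),
    ("P2", "NSCLC", 72, 2, 2),
    ("P3", "breast", 54, 0, 0),
]

# Hypothetical eligibility criteria as an LLM might emit them: one WHERE clause per trial.
TRIAL_CRITERIA = {
    "TRIAL_A": "cancer_type = 'NSCLC' AND ecog <= 1 AND prior_lines >= 1",
    "TRIAL_B": "cancer_type = 'breast' AND age >= 18 AND ecog <= 2",
}


def build_db(patients):
    """Load patient records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients (patient_id TEXT PRIMARY KEY, cancer_type TEXT, "
        "age INTEGER, ecog INTEGER, prior_lines INTEGER)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?, ?)", patients)
    return conn


def match_trials(conn, criteria):
    """Return {trial_id: sorted eligible patient_ids}. Generated SQL is
    interpolated verbatim, so it must be reviewed before running on real data."""
    result = {}
    for trial_id, where_clause in criteria.items():
        rows = conn.execute(f"SELECT patient_id FROM patients WHERE {where_clause}")
        result[trial_id] = sorted(r[0] for r in rows)
    return result


def trials_per_patient(matches):
    """Count eligible trials per patient (patients with none are omitted)."""
    counts = {}
    for eligible in matches.values():
        for pid in eligible:
            counts[pid] = counts.get(pid, 0) + 1
    return counts


if __name__ == "__main__":
    conn = build_db(PATIENTS)
    matches = match_trials(conn, TRIAL_CRITERIA)
    print(matches)
    print(trials_per_patient(matches))
```

Each trial-patient pair is one eligibility determination, which is how 30 trials × 24 synthetic patients yields the 720 determinations checked after physician editing, and 5 trials × 24 patients yields the 120 blinded pairs.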

20

Distinct and shared genetics of kidney filtration function versus albuminuria revealed by multi-trait GWAS

de Hesselle, H. C.; Garben, B.-F.; Stark, K. J.; Warth, R.; Teumer, A.; Pattaro, C.; Heid, I. M.; Winkler, T. W.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.08.26355141 medRxiv

Top 21%

1.2%

Chronic kidney disease is characterized by decreased estimated glomerular filtration rate (eGFR, from serum creatinine or cystatin C) or increased urinary albumin-to-creatinine ratio (UACR). Genome-wide association studies (GWAS) have mapped the genetics of each trait, but their genetic overlap has remained largely unknown. Our multi-trait GWAS (N=1M) identified 812 signals, and multi-trait fine-mapping sharpened the identification of likely causal variants. Of 333 signals classified as relating to filtration function or albuminuria, only 11 were shared between the two. Their effects on eGFR and UACR were directionally concordant, dominated by eGFR, and independent of HbA1c or mean arterial pressure. Mapped genes pinpointed mechanisms related to glomerular filtration area (SHROOM3, EPB41L5) and sodium-mediated intraglomerular pressure (NRBP1, DPEP1/CHMP1A). Genetic determinants of fluid intake produced shadow effects on UACR without albumin leakage into urine. Our multi-trait approach sharpened the identification of likely causal genes for kidney traits, demonstrated largely distinct genetics for filtration function versus albuminuria, and provided new biological insights into their overlap.