GigaScience — Latest Matching Preprints

1

Identification of Persistent Radiomics Feature Co-occurrence Across Diverse Tissue Types and Individuals: A Network-Based Analysis of the RADAPT CT Atlas

Amiri, S.; Afshar, P.; Rohban, M. H.

2026-07-19 radiology and imaging 10.64898/2026.07.17.26358252 medRxiv

Top 0.6%

5.4%

Objectives. Radiomics pipelines extract hundreds of quantitative features that are widely known to be redundant, but the structure of this redundancy is usually treated as a per-dataset nuisance to be pruned away. We tested the alternative hypothesis that a substantial number of feature-feature correlations are universal: they persist across patients and across anatomically distinct structures because they reflect shared mathematical and image-statistical properties of how the image is summarised, rather than properties of the tissue being imaged. Materials and Methods. We re-analysed the publicly available Radiomics Atlas Dataset of normal Abdominal and Pelvic CT (RADAPT), restricting the analysis to the 526 non-contrast-enhanced examinations of the 531-subject atlas and to the 107 original (non-filtered) PyRadiomics features. The 53 segmented structures were grouped into four broad anatomical categories: bones, muscles, vessels, and parenchymal organs. RADAPT is distributed as one Excel file per structure, with patients as rows and features as columns. Within each structure file we z-score-normalised every feature across patients, computed the absolute Spearman correlation matrix, and retained edges with |ρ| ≥ τ for τ in {0.70, 0.80, 0.90}. We then intersected the edge sets across all structure files to obtain a "universal" correlation graph, in which an edge survives only if it exceeds the threshold in every structure (each estimated across the full patient sample). Stable feature communities were defined as the maximal cliques of this graph. Robustness to patient sampling was tested by repeating the entire pipeline on five independent random splits of each file into two patient halves (10 sub-cohorts per threshold), and the implementation was independently reproduced in R. Results. 
Despite the strictness of the global-intersection criterion, 34, 24, and 14 stable feature communities survived at τ = 0.70, 0.80, and 0.90 respectively, with the largest cliques containing six members at τ = 0.70 and τ = 0.80 and five members at τ = 0.90. The community structure was clearly interpretable: separate cliques captured (i) variance-like intensity dispersion, (ii) long-run / low-frequency (coarse) texture, (iii) high gray-level texture, (iv) low gray-level texture, (v) volume and surface shape, and (vi) local-homogeneity and energy/entropy duals. On random-half resampling the exact-match recovery rate of these communities was 81.5%, 86.7%, and 80.7% across the three thresholds; departures from exact recovery were almost always a single boundary feature added or dropped, consistent with finite-sample fluctuation of near-threshold edges rather than structural instability. The R re-implementation reproduced the Python results exactly. Conclusion. A substantial portion of radiomics feature collinearity is universal across patients and tissues. We distinguish two layers within it: trivial near-algebraic duals that are universal by construction, and non-trivial cross-matrix-family communities that are the genuine empirical finding. Together they provide an interpretable, definition-grounded basis for aggressive dimensionality reduction, for retrospectively reconciling apparently different feature selections in the literature, and for moving radiomics pipelines toward organ-agnostic, more reproducible models. Clinical relevance statement. Selecting a single representative feature from each universal community shrinks the original-feature space by roughly an order of magnitude without sacrificing biologically distinct information. 
For example, the five variance-family members (first-order Variance, GLCM SumSquares, GLCM ClusterTendency, GLDM and GLRLM GrayLevelVariance) can be replaced by a single representative, removing redundant degrees of freedom that would otherwise inflate model variance; and labelling each retained feature by its community lets two studies that selected different variance-family names be recognised as having found the same signal, simplifying model development and improving cross-cohort generalisability in clinical CT workflows.
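The pipeline described in this abstract (per-structure absolute Spearman correlation, thresholding at τ, intersection of edge sets across structures, maximal cliques) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dictionary input layout, and the toy values are all hypothetical, and the z-scoring step is omitted because Spearman correlation depends only on ranks and is unchanged by it.

```python
from itertools import combinations


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def edges_above(table, tau):
    """table maps feature name -> list of per-patient values for one structure."""
    ranked = {name: ranks(col) for name, col in table.items()}
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(ranked), 2)
        if abs(pearson(ranked[a], ranked[b])) >= tau
    }


def universal_edges(tables, tau):
    """Keep an edge only if it passes the threshold in every structure."""
    edge_sets = [edges_above(t, tau) for t in tables]
    return set.intersection(*edge_sets) if edge_sets else set()


def maximal_cliques(edges):
    """Bron-Kerbosch enumeration (no pivoting) of maximal cliques."""
    adj = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(found)


# Toy input (made-up values): two structures, four features, six patients.
structure_1 = {"A": [1, 2, 3, 4, 5, 6], "B": [2, 4, 6, 8, 10, 12],
               "C": [6, 5, 4, 3, 2, 1], "D": [2, 5, 1, 6, 3, 4]}
structure_2 = {"A": [10, 20, 30, 40, 50, 60], "B": [1, 3, 2, 4, 5, 6],
               "C": [3, 6, 1, 5, 2, 4], "D": [2, 5, 1, 6, 3, 4]}
communities = maximal_cliques(universal_edges([structure_1, structure_2], 0.9))
print(communities)
```

In the toy data, A, B, and C are mutually correlated in the first structure, but only the A-B pair survives in the second, so the intersection leaves a single two-member community. For the real 107-feature matrices a library implementation of Spearman correlation and clique enumeration would be the more practical choice.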

2

FoodScribe: an open-source semantic framework for nutrient estimation from free-text dietary records

Gouda, H.; Sala Climent, M.; Agongo, J.; Gaikwad, S. P.; Nattakom, A.; Zhao, H. N.; Xing, S.; Boland, B. S.; Holt, T.; Guma, M.; Dorrestein, P. C.

2026-07-17 nutrition 10.64898/2026.07.15.26358181 medRxiv

Top 1.0%

3.9%

Efficiently summarizing dietary records at scale remains a persistent bottleneck in nutritional epidemiology. We present FoodScribe, which translates free-text meal descriptions into quantitative nutrient profiles by combining ingredient parsing with nutrient retrieval from the USDA FoodData Central (FDC) database. Benchmarked with three LLM providers on the Nutribench dataset, FoodScribe annotated 3,807 meal descriptions in 2.5 hours, a task otherwise requiring substantial manual effort from trained nutritionists. FoodScribe achieved F1 scores of 0.79-0.89 across macronutrient estimation, with models performing better for protein than for fat. Application to a Mediterranean diet intervention cohort indicated dietary shifts consistent with the intervention pattern based on model-derived estimates. Integration with metabolomics data suggested that fiber and vegetable intake were positively associated with a fecal metabolite cluster.

3

ReCo: a self-configuring and self-extending agentic framework for biomedical research

Tzanis, E.; Klontzas, M. E.

2026-07-16 health informatics 10.64898/2026.07.14.26358025 medRxiv

Top 1%

2.8%

This study presents ReCo (Research Cosmos), a self-configuring and self-extending agentic research framework for the biomedical domain. ReCo is orchestrated by a large language model that interacts with native computing tools, bundled Model Context Protocol (MCP) servers, structured skills, persistent project memory, and a desktop interface. Its bundled MCP servers provide biomedical analysis capabilities while serving as implementation paradigms for integrating new computational and AI frameworks. Structured skills encode procedures for environment configuration and framework ingestion, enabling ReCo to inspect repositories, manuscripts, or local codebases; identify dependencies and execution patterns; create isolated runtime environments; and design and implement MCP interfaces. Self-extension was evaluated using five heterogeneous systems: the Merlin computed tomography foundation model, the MAISI-v2 medical image synthesis framework, the asari liquid chromatography-mass spectrometry workflow, the DosimeTron agentic radiation-dosimetry platform, and the Orthanc DICOM server. ReCo successfully operationalized all five systems and completed predefined functional evaluations. Re-hosted DosimeTron outputs demonstrated near-perfect agreement with the reference pipeline across 651 organ observations (Pearson correlation and Lin concordance correlation coefficient, 0.99999; mean absolute percentage difference, 0.37%). Notably, ReCo configured Orthanc as a PACS-like coordination layer, integrated it with DosimeTron, Merlin, and TotalSegmentator, and orchestrated data retrieval, analysis, and return of valid DICOM RTSTRUCT, RTDOSE, and Structured Report objects. ReCo provides a unified environment for configuring, documenting, and operationalizing heterogeneous biomedical frameworks, reducing technical barriers to the adoption and integration of emerging computational and AI methods. The official open-source ReCo GitHub repository is available at: https://github.com/eltzanis/ReCo
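The three agreement statistics quoted for the re-hosted DosimeTron outputs (Pearson correlation, Lin's concordance correlation coefficient, and mean absolute percentage difference) have standard closed forms. The sketch below shows one way to compute them; the function name and the toy numbers are illustrative assumptions, and the percentage difference is taken relative to the reference value, which the abstract does not specify.

```python
def agreement(reference, test):
    """Return (pearson_r, lin_ccc, mean_abs_pct_diff) for paired measurements."""
    n = len(reference)
    mx = sum(reference) / n
    my = sum(test) / n
    # Population (1/n) moments, as in Lin's original definition of the CCC.
    sxx = sum((x - mx) ** 2 for x in reference) / n
    syy = sum((y - my) ** 2 for y in test) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, test)) / n
    pearson_r = sxy / (sxx * syy) ** 0.5
    # The CCC penalises both scatter and systematic offset between the series.
    lin_ccc = 2 * sxy / (sxx + syy + (mx - my) ** 2)
    # Assumption: differences are expressed relative to the reference value.
    mapd = 100 * sum(abs(y - x) / abs(x) for x, y in zip(reference, test)) / n
    return pearson_r, lin_ccc, mapd


# Toy values (not study data): a constant offset keeps r at 1 but lowers the CCC.
r, ccc, mapd = agreement([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(round(r, 3), round(ccc, 3), round(mapd, 2))
```

The toy example shows why both coefficients are reported together: a perfectly correlated but shifted output still has a Pearson correlation of 1, while the concordance coefficient drops.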

4

MedZone Embedder: a framework for representation learning of Japanese secondary medical care areas from a national ICU registry, characterizing intensive care provision structure and regional vulnerability

Ohno, K.; Hashimoto, S.

2026-07-20 health informatics 10.64898/2026.07.17.26358373 medRxiv

Top 2%

2.3%

Background: In Japan, acute inpatient care is divided into approximately 335 secondary medical care areas, which serve as the basic units for planning healthcare delivery systems under the 8th National Health Care Plan. While comparisons between regions and facilities typically rely on a single risk-adjusted metric, this approach conflates differences in patient demographics with differences in the actual infrastructure of intensive care units (ICUs). This paper presents a framework - MedZone Embedder - for deriving data-driven indicators of regional structural vulnerability by mapping secondary medical care areas onto a learned similarity space, together with its working implementation. The paper sets out the concept, the method, a proof of concept, and an explicit staged validation program, rather than national empirical results. Methods: Each area is represented by a feature vector consisting of aggregated values of intensive care provision indicators derived directly from the Japan Intensive Care Patient Database (JIPAD) - specifically, risk-adjusted mortality rates (standardized mortality ratios and an in-hospital composite indicator), technical efficiency, length of stay, readmission rates, case severity, and case composition - with the within-area variance of these indicators also taken into account. No hierarchical processing by facility type is performed. A contrastive autoencoder (multilayer perceptron encoder 32 -> 16 -> 8, symmetric decoder) is trained by self-supervised learning, using an objective function that combines reconstruction and normalized temperature-scaled cross-entropy (NT-Xent) on noise-augmented views. The resulting 8-dimensional embedding supports area searches based on cosine similarity and anomaly scoring in the embedding space (using isolation forest, Mahalanobis distance, or k-nearest-neighbor density), which is normalized to a vulnerability score ranging from 0 to 1. 
If deep learning libraries are unavailable, or if the number of areas is small, an alternative method using deterministic principal component analysis is employed. Results: This method was implemented and deployed within an operational ICU decision support system on a managed cloud platform. The proof of concept (PoC) is structured around five secondary medical care areas within Kyoto Prefecture and runs entirely on synthetic facility-level aggregate data constructed to follow the JIPAD indicator schema; no registry data were accessed. It generated: an aggregate provision profile for each area; an area embedding space equipped with a similar-area search function; and a vulnerability ranking that identifies areas with low patient numbers and low diversity that exhibit overall poor outcomes. At this scale, the contrastive autoencoder falls back to principal component projection. The deep learning pathway has been implemented and unit testing has been completed; training and evaluation on actual registry data are pending data-use approval and the expansion of data integration. Validation is staged: Stage 2 will train the contrastive pathway over JIPAD-covered areas to assess construct validity against public structural indicators (ICU/HCU beds, population, accessibility), and Stage 3 will extend coverage to all areas via National Database (NDB) linkage. Conclusion: MedZone Embedder reframes regional comparison from single-indicator ranking to structural representation: which areas are alike, and which are structural outliers. The contribution of this paper is the framework - the proposal that the intensive care provision structure of Japanese secondary medical care areas can be learned from a national outcomes registry and read through the lens of what we call institutional debt - together with a deployed implementation and a pre-specified validation program. 
To our knowledge, this may be the first application of contrastive representation learning to Japanese secondary medical care areas.
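The NT-Xent term named in the Methods can be illustrated without a deep learning library. The following is a minimal sketch of the loss on two sets of paired embeddings; the temperature of 0.5, the function names, and the toy vectors are assumptions made for illustration, and the reconstruction term and encoder of the described autoencoder are not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss for N positive pairs (view_a[i], view_b[i]).

    Every other embedding in the 2N-sized batch acts as a negative.
    """
    z = [l2_normalize(v) for v in view_a + view_b]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same item
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits.values())
        log_denominator = top + math.log(
            sum(math.exp(value - top) for value in logits.values())
        )
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Toy embeddings (made-up): two areas, each with a lightly perturbed second view.
areas = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[1.0, 0.1], [0.1, 1.0]]
print(nt_xent(areas, noisy) < nt_xent(areas, list(reversed(noisy))))
```

The loss is lower when each area is paired with its own perturbed view than when the pairs are swapped, which is the property the contrastive objective relies on.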

5

Dual-Filament 3D Printing of Patient-Specific CT Phantoms with Embedded Implants and Tunable Metal-Artifact Intensity

Pasyar, P.; Mei, K.; Im, J. Y.; Roshkovan, L.; Geagan, M.; Noël, P. B.

2026-07-20 radiology and imaging 10.64898/2026.07.17.26358319 medRxiv

Top 2%

1.8%

Background: Metallic implants such as orthopedic screws, prostheses, and dental hardware produce beam-hardening, photon-starvation, and streak artifacts that degrade computed tomography (CT) image quality, and the metal artifact reduction (MAR) methods developed to mitigate them require objective, reproducible benchmarking. Purpose: Objective evaluation of MAR algorithms in CT is hindered by the absence of phantoms that simultaneously provide anatomically realistic backgrounds, embedded implants of known geometry, and controllable, ground-truth-referenced artifact intensity. We present a dual-filament, voxel-level three-dimensional (3D) printing method that fulfills these requirements and demonstrate its capabilities on a clinically representative cervical spine case with embedded orthopedic spinal screws. Methods: The proposed method extends the PixelPrint framework, a fused-deposition-modeling (FDM) workflow that converts clinical Digital Imaging and Communications in Medicine (DICOM) data directly into 3D-printer Geometric code (G-code) without intermediate segmentation or surface meshing, to interleaved, voxel-level deposition of two filaments: a calcium-doped polylactic acid (PLA) for soft tissue and bone, and a higher-attenuation metal-doped PLA for metallic implants. For demonstration, anonymized DICOM data of a healthy cervical spine were used to design and fabricate three matched phantoms, each with six embedded spinal screws at C4-C6: a 0% metal-infill ground-truth phantom, a 50% medium-metal-infill phantom, and an 85% high-metal-infill phantom. All phantoms were scanned on a clinical spectral CT system at 120 kVp and 1000 mAs, reconstructed at 0.67 mm slice thickness with virtual monoenergetic imaging (VMI) across 50-190 keV. Method performance was characterized by region of interest (ROI)-based Hounsfield Unit (HU) agreement with the source patient data and by the noise-independent Gumbel-distribution p-index metric. 
Results: The dual-filament method reproduced patient anatomy, soft-tissue contrast, and screw geometry with high fidelity. ROI HU values agreed with patient data within ±25 HU for soft tissue and trabecular bone; cortical regions were underestimated owing to the current ceiling of the calcium-doped PLA used in this study. The tunable-artifact behavior was quantified as follows: the Gumbel location parameter scaled monotonically from 46.7 HU (no-metal background) to 57.1 HU (50% infill) to 90.5 HU (85% infill) for the VMI 70 keV with standard filter. High-keV VMI reconstructions substantially reduced streak and beam-hardening artifacts while preserving anatomic detail. Conclusions: The proposed dual-filament, voxel-level PixelPrint method enables the fabrication of patient-specific, multi-material CT phantoms with embedded metallic implants and controllable, ground-truth-referenced artifact intensity. Although demonstrated here in a single cervical-spine case, the workflow is anatomy- and implant-agnostic by construction and could in principle be adapted to other musculoskeletal sites (e.g., knee, hip, dental) and implant materials, providing a reproducible methodological foundation for benchmarking MAR algorithms, characterizing spectral CT performance, and validating emerging photon-counting detector systems. Keywords: 3D printing methodology; fused deposition modeling; voxel-level multi-material printing; spectral computed tomography; metal artifact reduction; phantom design; orthopedic implants; dual filament; PixelPrint.

6

A ReAct Agentic AI System for Natural Language Querying and Statistical Analysis of The Cancer Genome Atlas Clinical Data

Korutla, R.; Amal, S.

2026-07-17 health informatics 10.64898/2026.07.15.26358188 medRxiv

Top 3%

1.7%

The Cancer Genome Atlas (TCGA) holds clinical data for over 11,000 patients across 33 cancer types, but access is hard because of complex file structures, heterogeneous formats, and the need for programming. We present an agentic system for natural language querying and statistical analysis of TCGA clinical data. The system uses a large language model as an autonomous ReAct agent that selects from eight computational tools, including data extraction, descriptive statistics, Kaplan-Meier survival analysis with log-rank tests, hypothesis testing, and verification against the curated TCGA Pan-Cancer Clinical Data Resource (CDR). The agent reasons about intermediate results, adapts its approach, and returns clinically contextualized responses with source attribution and auditable traces. We introduce TCGA-Agent-Bench, 440 queries across five difficulty tiers with ground truth from the independently curated TCGA-CDR, evaluated with dual metrics of numerical accuracy and clinical completeness. The system achieves 93.4% overall accuracy (100% single-patient lookups, 99.1% cohort statistics, 92.8% comparative analyses), outperforming a fixed rule-based pipeline (87.1%), a single-pass LLM (81.8%), and retrieval-augmented generation (66.9% on a subset). Most of the benchmark is answerable from the CDR alone, so we locate the extraction layer's value in fields the CDR lacks (drug treatments, TNM components, biomarkers, biospecimen metadata): on 26 queries targeting these, the full system answers 100% versus 3.8% for CDR-only. Ablations show the reasoning loop is most impactful (+9.1% accuracy, +22.0 completeness points). A tool-based agentic architecture enables accurate, auditable analysis of clinical repositories, with value driven by tool design and recovered fields rather than model scale.
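Kaplan-Meier survival analysis is one of the tools the abstract lists. As a rough illustration of what such a tool computes, here is a minimal product-limit estimator in plain Python; the function name, input layout, and toy follow-up times are assumptions for this sketch and say nothing about how the described system implements its tool.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve


# Toy cohort (made-up): five patients, two of them censored.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the final censored observation at time 4 adds no step.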

7

CuGen: A GPU-accelerated framework for large-scale genomics

Kiiskinen, T.; Richland, J.; Wang, W.; Lu, W. S.; Balasubramanian, N.; Hastie, T.; Tibshirani, R.; Rivas, M. A.

2026-07-17 genetic and genomic medicine 10.64898/2026.07.15.26358178 medRxiv

Top 3%

1.5%

Biobank-scale genomic analyses remain computationally expensive, CPU-bound workflows, particularly when adjusting for confounding. Here, we present CuGen, a GPU-accelerated framework for large-scale genomics. CuGen uses UltraLasso, a novel hierarchical application of univariate-guided sparse regression (uniLasso), to select a compact, phenotype-informed active set of fewer than 30,000 variants. This achieves robust leave-one-chromosome-out (LOCO) confounding control, enabling both downstream GWAS and in-sample fine-mapping. Additionally, we introduce the .cugen file format, a genotype representation designed for memory-optimized, high-throughput streaming and random access on GPU hardware. Building on this substrate, we provide a general GPU-accelerated genomics toolkit handling polygenic prediction, data manipulation, quality control, analysis, and visualization. We demonstrate CuGen's efficacy in the UK Biobank with up to 408,624 individuals, where the full GWAS pipeline and fine-mapping against 6.8 million imputed variants completes in approximately 10 minutes on a single high-throughput GPU with 80 GB of memory. The pipeline scales efficiently to massive phenome-wide analyses with sublinear resource consumption.

8

Pathways, Perceptions, and the Luck of the Draw: A Qualitative Study of Adolescent Idiopathic Scoliosis Imaging and Referral Services in England.

Robinson-Smith, L.; Jafari, M.; Kottam, L.; Clark, N.; Rangan, A.; Adamson, J.

2026-07-19 radiology and imaging 10.64898/2026.07.16.26358249 medRxiv

Top 3%

1.4%

Introduction: Adolescent idiopathic scoliosis (AIS) requires frequent x-rays for management, exposing young patients to cumulative radiation risks. While radiation-sparing imaging modalities exist, access across the National Health Service (NHS) remains uneven and information given to patients is variable. This qualitative study investigated the systemic, geographic, and interpersonal dynamics of AIS imaging in England. Design: This qualitative study employed in-depth semi-structured interviews with healthcare professionals (HCPs) from NHS paediatric spinal centres, patients aged 13 to 25 years with AIS, and parents/carers of young people with AIS. Setting: England. Participants: A total of 22 HCPs from 13 of 24 NHS paediatric spinal centres in England, 19 patients aged 10-25 years with AIS, and 11 parents/carers. Results: Conventional x-ray remains the main imaging modality. Significant geographic inequality exists. The most commonly available radiation-sparing imaging modality is the EOS system, which uses slot-scanning technology and is available at 7 centres in England, primarily in London imaging networks. Acquisition of EOS systems is currently driven by local charitable funding rather than a centralised strategy, with high capital and installation costs cited as primary barriers. Inconsistent knowledge of imaging within primary care and a lack of specialist expertise in local secondary care services led to diagnostic redundancy, gatekeeping, and low-value, inconsistent imaging. These systemic delays frequently closed the window for conservative treatments like bracing. A professional balancing act exists between the duty to inform and the desire to minimise patient anxiety. HCPs often use selective communication regarding radiation risks. Conversely, families demonstrate high relational trust with HCPs and low baseline knowledge of cumulative exposure, often viewing frequent imaging as a reassuring marker of clinical progress. 
In centres with EOS systems, clinicians felt empowered to lead proactive, transparent risk discussions. In standard x-ray settings, dialogue remains reactive and infrequent, leading to a reliance on implied rather than truly informed consent. Conclusions: AIS imaging in England is variable. Geographic location dictates access to low-dose radiation technology and the quality of informed consent. Systemic inefficiencies and fragmented referral pathways contribute to diagnostic redundancy and delayed specialist care. National standardisation of clinical pathways and information provision, together with a centralised strategy for low-dose technology procurement, is essential to eliminate structural inequalities and ensure equitable, transparent care for all patients.

9

Cognition in younger women with premature ovarian insufficiency

Naysmith, L.; Rida, L.; Hampshire, A.

2026-07-16 sexual and reproductive health 10.64898/2026.07.14.26358044 medRxiv

Top 3%

1.2%

Premature ovarian insufficiency (POI) significantly impacts quality of life, yet the immediate cognitive landscape and lived experience of younger women remain under-researched. In 125 young women (aged 19-48; 66 with idiopathic POI, 59 age-matched controls), we examined self-reported cognitive distress and symptom burden within the POI cohort and compared objective global and domain-specific cognitive performance between groups. Objective accuracy scores were derived from six online tasks (Cognitron) and combined into a robust global measure. Within the POI cohort, there were significant differences in symptom burden domains (χ²(3) = 61.90, p<0.001), with psychological and sexual symptoms reported at a significantly higher intensity than physical and vasomotor symptoms (all p<0.001). Furthermore, the standardised magnitude of perceived cognitive distress (56.20%) was significantly greater than that of overall symptom burden (42.00%, p<0.001). Case-control comparisons revealed no significant differences in global cognitive performance (p=0.615), yet the POI cohort performed significantly less accurately than controls in verbal analogical reasoning (-0.86 SD, 95% CI: -1.52, -0.20, p = 0.011). The findings highlight an urgent need for comprehensive emotional and psychosexual support in POI care. Additionally, the presence of high cognitive distress alongside localised objective deficits demonstrates that cognitive health monitoring must be proactive in early adulthood, especially given these women's established long-term risk of later-life cognitive decline and dementia.

10

MeshScope-Region: Distribution, Road-Network Accessibility, and Nine-Year Evolution of ICU and HCU Capacity Across Japan's 330 Secondary Medical Areas

Ohno, K.; Hirai, M.; Hashimoto, S.

2026-07-20 health informatics 10.64898/2026.07.17.26358374 medRxiv

Top 3%

1.2%

Background: In Japan, health planning is organized around secondary medical areas (SMAs; niji-iryo-ken; 330 areas in the 2025 classification), yet nationwide analyses of intensive care unit (ICU) capacity have been conducted mainly at the prefecture level, and a recent SMA-level study addressed only the presence or absence of ICUs. The full supply structure of intensive and intermediate critical care - ICU and high care unit (HCU) beds - has not been characterized at the SMA level with respect to its composition, road-network accessibility, and evolution over time. Methods: We developed MeshScope-Region, an analytical platform built on the Hospital Bed Function Reports (byosho-kino-hokoku) for fiscal years 2016-2024, in which ICU and HCU beds were identified from notified reimbursement categories and aggregated to SMAs. Three analytical layers were integrated: (1) cross-sectional distribution of ICU/HCU beds; (2) nationwide road-network accessibility computed with the Open Source Routing Machine (OSRM) from 176,962 populated 1-km census grid cells to all facilities reporting ICU or HCU beds; and (3) a nine-year longitudinal analysis of supply-structure types, classified by k-means (k = 6) in an 8-dimensional PCA space anchored to fiscal year 2024, with earlier years projected into the same space. Results: In fiscal year 2024, 20,631 ICU/HCU beds were reported nationally (7,114 ICU-type; 13,517 HCU-type) at 1,044 facilities. Zone-level totals among SMAs with any beds ranged 229-fold (3-688 beds); the 90th/10th percentile ratio of per-capita density was 3.6. In total, 90.1% of the population resided within 30 minutes' drive of a facility with ICU beds and 97.8% within 60 minutes; only 0.8% resided beyond 90 minutes. Although 140 of the 330 SMAs had no ICU facility within their own boundaries, 84.7% of their residents could reach an ICU facility in an adjacent area within 60 minutes' drive. 
Longitudinally, supply structures were highly persistent: 63.0% of SMAs (208/330) retained the same structural type across all nine years, adjacent-year rank correlations of a supply-vulnerability index were 0.887-0.924 (2016 vs. 2024: rho = 0.711), and the number of SMAs with zero ICU beds remained frozen at 133-141. The Gini coefficient of bed distribution declined from 0.384 to 0.262 - although computed on ICU-type beds alone it remained 0.365 in fiscal year 2024 - and capacity growth (total +27.9%) was driven predominantly by HCU beds (+41.6%) while ICU beds grew only +8.0%. Conclusions: Japan's critical care supply structure is regionally rigid, with a stable set of approximately 140 SMAs lacking ICU beds for nearly a decade, yet road-network accessibility substantially mitigates the consequences of zone-level absence. Recent capacity growth - and much of the apparent equalization - has occurred predominantly in intermediate care. MeshScope-Region provides a standing, reproducible evidence base at the geographic unit of Japan's medical planning cycles.
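Two of the inequality figures in this abstract, the Gini coefficient of bed distribution and the 229-fold range in zone-level totals, are simple to compute. The sketch below uses the unweighted sample form of the Gini coefficient; whether the study weights areas by population is not stated, so that choice, the function names, and the toy inputs are assumptions.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def fold_range(values):
    """Ratio of the largest to the smallest positive value."""
    positive = [v for v in values if v > 0]
    return max(positive) / min(positive)


# The 3-688 bed range quoted in the abstract corresponds to a ~229-fold spread.
print(round(fold_range([3, 688])))
# Toy distributions (made-up): equal areas vs. all beds in one area.
print(gini([10, 10, 10, 10]), gini([0, 0, 0, 40]))
```

With n areas, the unweighted coefficient tops out at (n - 1) / n when a single area holds every bed, so values from differently sized sets of areas are only loosely comparable.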

11

Validating Artificial Intelligence Guidance for Ultrasound Acquisition and Remote Interpretation

Maldonado, T.; Muluk, S.; Rali, P.; Soni, N.; Nathanson, R.; Kuttab, H.; VandeHei, M.; Michels, C.; Swietlik, J.; Speranza, G.; Schaffer, O.; Collaborating Investigators Group; Al Noor, F.; Mischkewitz, S.; Kainz, B.; Blaivas, M.; Jacobowitz, G.

2026-07-19 radiology and imaging 10.64898/2026.07.16.26356882 medRxiv

Top 3%

1.2%

Background: Venous thromboembolism (VTE), including deep vein thrombosis (DVT), remains a major global health burden. Diagnostic pathways rely on ultrasound but are limited by availability and prolonged time-to-imaging. Novel artificial intelligence (AI) guidance systems have been designed to enable non-ultrasound-trained operators to acquire proximal lower extremity compression ultrasounds for remote clinician interpretation. Methods: This multicenter, double-blinded, prospective, nonrandomized study evaluated the performance of an AI guidance system (ThinkSono Guidance, ThinkSono GmbH). Patients underwent AI-guided ultrasound(s) and standard of care ultrasound(s). Primary and secondary endpoints were image quality, sensitivity and specificity for proximal DVT, and prioritization specificity, a measure of specificity in identifying patients requiring standard of care ultrasound after AI-guided scan. Results: Of 634 recruited subjects, 594 were analyzed, with 67 DVTs across 700 scans. Diagnostic image quality was achieved in 86.83% of AI-guided scans. Triage sensitivity was 92.86%, triage specificity 39.12%, and prioritization specificity 97.96%. Standard of care ultrasounds could be avoided in 35.32% of patients. Total median AI-guided scan and review time was 7.57 minutes. Conclusions: Clinician-reviewed AI-guided scans were rapid, sensitive for DVT, and specific for prioritizing patients requiring standard of care ultrasounds. These findings suggest AI-guided ultrasound may be a scalable triage strategy to expand DVT evaluation access, particularly in resource-constrained and after-hours settings.
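For readers less familiar with the triage metrics reported here, sensitivity and specificity follow directly from a two-by-two table of test result against reference diagnosis. The sketch below uses made-up counts, since the abstract gives percentages but not the underlying table, and it does not attempt the study's prioritization specificity, whose exact definition is not given.

```python
def sensitivity(true_pos, false_neg):
    """Share of reference-positive cases that the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of reference-negative cases that the test clears."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (not from the study): a triage test tuned to miss
# few positives typically clears a smaller share of the negatives.
print(sensitivity(true_pos=9, false_neg=1))    # 0.9
print(specificity(true_neg=40, false_pos=60))  # 0.4
```

A high-sensitivity, low-specificity profile like the one in the toy counts is typical of a rule-out test: few cases are missed, at the cost of sending many negatives on for confirmatory imaging.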

12

Parameter-efficient deep learning for pneumonia detection on chest X-rays: A comparative evaluation of explainable AI methods

Mahtabi, B.; Nasr-Esfahani, E.; Yaraghi, S.

2026-07-16 radiology and imaging 10.64898/2026.07.14.26358065 medRxiv

Top 4%

1.1%

Pneumonia is a leading cause of infectious disease mortality worldwide, accounting for approximately 2.5 million deaths annually and 15% of deaths in children under five. Chest X-ray imaging remains the primary diagnostic tool, but accurate interpretation requires radiological expertise that is disproportionately concentrated in high-income settings, creating a diagnostic gap where disease burden is highest. Automated deep learning offers a scalable complement to specialist-dependent diagnosis, yet clinical adoption requires both high accuracy and transparent, interpretable reasoning. Convolutional neural networks (CNNs) have shown strong potential for pneumonia detection from chest X-rays, but two barriers impede clinical translation: the interpretability of black-box models and the computational feasibility of large architectures in resource-constrained settings. Explainable AI (XAI) methods such as Grad-CAM, Grad-CAM++, and Score-CAM address the interpretability barrier, yet systematic quantitative comparisons across multiple CNN architectures remain scarce. Furthermore, CNN architectures widely used for medical image classification carry high parameter counts that limit feasibility in resource-constrained settings, motivating architectures that achieve competitive accuracy with substantially fewer parameters. Here we propose a parameter-efficient deep learning framework for pneumonia detection based on transfer learning, evaluated across three CNN architectures representing distinct architectural families: EfficientNet-B0 with fine-tuning (proposed method), ResNet50, and DenseNet121, trained under identical conditions on the Kaggle chest X-ray dataset (5,863 images). Our method achieved 90% classification accuracy, outperforming both baselines while requiring 4.8x fewer parameters than ResNet50. 
To evaluate explainability, Grad-CAM, Grad-CAM++, and Score-CAM were applied across all three architectures and compared quantitatively using Intersection over Union against manually annotated lung segmentation masks, Insertion score, and Deletion score, with pairwise statistical validation via Wilcoxon signed-rank tests and Bonferroni correction. Findings show that classification accuracy and XAI explanation quality must be evaluated independently, and that the proposed parameter-efficient architecture offers a favorable trade-off for resource-constrained clinical deployment.

13

LocusBlend: Flexible multi-index regional visualization of genomic association signals

Yang, C.; Cook, N.; Zeng, Y.; Fu, T.; Budde, J.; Cruchaga, C.; Belloy, M. E.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.15.26358129 medRxiv

Top 4%

0.9%

Summary: It has become standard practice to visualize regional signals from genome-wide association studies (GWAS) using LocusZoom plots. Similarly, GWAS signals are compared to regionally matched quantitative trait loci (QTLs), i.e., variant-to-gene regulation data, using LocusCompare plots to aid assessment of candidate trait-related genes. Despite broad usage, these tools annotate variants by linkage disequilibrium (LD) to a single lead or index variant. This single-index representation has limitations for visualizing complex loci that contain multiple independent signals. We present LocusBlend, an interactive web application for multi-index, LD-blended visualization of genomic loci. LocusBlend supports one or two genomic association summary-statistic datasets and one to three index variants, multi-index LocusZoom color-blended plots, and matching LocusCompare visualizations. Applications to Alzheimer's disease GWAS and QTL signals illustrate that LocusBlend enables visualization and separation of independent signals despite shared LD and high genomic complexity. Overall, LocusBlend is aimed at helping researchers handle the continuously expanding complexity of human genomics findings. Availability and Implementation: LocusBlend is freely available at https://locusblend.wustl.edu. Publication-ready plots are generated in 1 min. Source code, documentation, example datasets, input templates, and reproducibility instructions are available at https://github.com/BelloyLab/LocusBlend. LocusBlend is implemented in Python using Streamlit, Plotly, and PLINK. Supplementary Information: Supplementary data are available online.

14

Aggregating data to accelerate personalized therapy in heart failure (ADAPT-HF)

Roeder, C.; Goerg, C.; Talebi, A.; Stevens, L. M.; Scholtens, D. M.; Rasmussen-Torvik, L. P.; Alagna, L. M.; Shah, S. J.; Hall, J. L.; Das, A. K.; Jhund, P. S.; Kao, D. P.

2026-07-16 health informatics 10.64898/2026.07.13.26357501 medRxiv

Top 5%

0.6%

Background: Increased public access to data from disparate sources provides opportunities to study and validate predictive and subphenotype models in heterogeneous disease conditions using aggregated individual patient data. Robust, explicit, and transparent harmonization of data elements is critical to ensure interpretability, reproducibility, and generalizability of secondary and retrospective analyses. Methods & Results: We designed and implemented ADAPT (Aggregating Data to Accelerate Personalized Therapy), a scalable framework using multiple software packages (R, SQL, BigQuery) that enables rapid, explicit harmonization of structured data elements from randomized trials and observational studies using a standard spreadsheet interface. User-specified criteria are applied to primary study data to produce harmonized longitudinal datasets comprised of demographics, medical history, quantitative observations, repeated measures, and clinical outcomes. We demonstrate this functionality using 26 clinical studies found in the National Heart, Lung, and Blood Institute BioLINCC resource. We illustrate the scalability of ADAPT to the order of billions of datapoints using administrative clinical data in a cloud-computing platform. We also present examples of collaborators using ADAPT for independent harmonization tasks for secondary analyses and democratization of publicly available data. Conclusion: ADAPT is a disease-agnostic, extensible, and scalable platform to support robust, transparent harmonization of structured research data using interfaces accessible to a variety of researchers regardless of programming ability. It extends FAIR principles beyond research data to also represent harmonization analyses by improving Findability of harmonization decisions, Accessibility of methods to other stakeholders, Interoperability with independent analyses and datasets, and Reusability through efficient implementation in a variety of analysis environments.

15

Toward precision rehabilitation in adolescent mild traumatic brain injury: leveraging physiologic data from commercially available smartwatches to identify patient subgroups

Kettlety, S. A.; Akrong, E. R.; Suskauer, S. J.; Roemmich, R. T.; Slomine, B. S.; Svingos, A. M.

2026-07-17 pediatrics 10.64898/2026.07.16.26358245 medRxiv

Top 6%

0.5%

Autonomic dysfunction is a common sequela of mild traumatic brain injury (mTBI). Physical activity progression is an integral component of mTBI rehabilitation, particularly in addressing autonomic dysfunction. However, clinicians often rely on point-in-time evaluation of orthostatic and exercise intolerance to guide activity recommendations. Commercially available wearable devices (e.g., Fitbits) provide an opportunity to evaluate heart rate response to activity in a real-world setting. Previous work has used physiologic (heart rate) and activity (step count) data to identify subgroups of adults with stroke that may be used to guide activity recommendations. This method may be useful to subgroup youth post-mTBI to identify those who have abnormal physiologic responses to activity. We aimed to identify subgroups using heart rate and step count data in adolescents presenting for specialty care after diagnosed mTBI. Eighty participants aged 13-18 within six months of mTBI diagnosis were recruited to wear a Fitbit Sense 2. Data from seven days and two nights collected within fourteen days of enrollment were included. A group-based steps per minute (SPM) threshold (25th percentile; 10 SPM) and individualized heart rate threshold (20% heart rate reserve (HRR)) were used to classify each minute of active daytime data into one of four quadrants: SPM>10 & HRR>20% (QI), SPM<10 & HRR>20% (QII), SPM<10 & HRR<20% (QIII), and SPM>10 & HRR<20% (QIV). We used percentage of minutes in each quadrant, mean steps per day, percentage of minutes with zero steps, mean SPM in QI, and resting heart rate in a k-means clustering algorithm to identify subgroups. We evaluated subgroup differences by clustering variables using Kruskal-Wallis tests. Sixty-one participants were included. Three subgroups emerged: Sedentary (n=12), Active (n=23), and Atypically Elevated Heart Rate (AEHR; n=26). Subgroups varied significantly on all clustering variables (p<0.01). 
The Active subgroup took a high number of steps per day, had lower sedentary time, and had the highest activity intensity (mean SPM in QI). The Sedentary subgroup took fewer steps per day compared to the Active subgroup, had high sedentary time, and showed the highest resting heart rate. The AEHR subgroup took fewer steps per day compared to the Active subgroup and had high sedentary time. The AEHR subgroup also spent a higher percentage of time with an atypically high heart rate response to low levels of activity compared to the other subgroups. Our findings suggest that data from wearable devices can identify subgroups of adolescents with mTBI with distinct physiologic/physical activity profiles, which may ultimately be used to inform personalized activity prescriptions. Future work should aim to understand how the identified subgroups relate to longitudinal outcomes.
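
The quadrant rule described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names, the example minutes, and the maximal heart rate of 200 bpm are hypothetical, percent heart rate reserve is computed with the standard (HR - resting) / (max - resting) formula, and minutes that fall exactly on a threshold are assumed to count as "low" because the abstract does not say how ties are handled.

```python
from collections import Counter


def hrr_percent(hr, resting_hr, max_hr):
    """Percent heart rate reserve: 0% at resting HR, 100% at maximal HR."""
    return 100.0 * (hr - resting_hr) / (max_hr - resting_hr)


def classify_minute(spm, hrr_pct, spm_cut=10.0, hrr_cut=20.0):
    """Assign one minute to a quadrant using the thresholds quoted in the abstract.

    Assumption: values exactly equal to a threshold are treated as low.
    """
    high_steps = spm > spm_cut
    high_hr = hrr_pct > hrr_cut
    if high_steps and high_hr:
        return "QI"    # active, elevated heart rate
    if high_hr:
        return "QII"   # few steps, elevated heart rate
    if not high_steps:
        return "QIII"  # few steps, low heart rate
    return "QIV"       # active, low heart rate


def quadrant_percentages(minutes, resting_hr, max_hr):
    """minutes: iterable of (steps_per_minute, heart_rate) pairs."""
    counts = Counter(
        classify_minute(spm, hrr_percent(hr, resting_hr, max_hr))
        for spm, hr in minutes
    )
    total = sum(counts.values())
    return {q: 100.0 * counts[q] / total for q in ("QI", "QII", "QIII", "QIV")}


# Hypothetical participant: resting HR 60 bpm, assumed maximal HR 200 bpm.
minutes = [(40, 120), (0, 110), (3, 70), (25, 80)]
profile = quadrant_percentages(minutes, resting_hr=60, max_hr=200)
```

Per-participant percentages of this kind, together with the other summary variables listed in the abstract, would then be the inputs to the clustering step.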

16

The Variance-Stabilizing Transformation for the Poisson Rate Ratio: Closed-Form Confidence Intervals

Ng, S.-P.

2026-07-18 epidemiology 10.64898/2026.07.16.26358255 medRxiv

Top 6%

0.5%

The incidence rate ratio R is the standard measure for comparing event rates in clinical trials and epidemiology. In vaccine trials, the vaccine efficacy is VE = 1 - R. When events are rare, the two arm counts are Poisson. The estimator of R is heteroskedastic: its sampling variance changes with the data. So no fixed-width interval covers correctly everywhere. The usual log-Wald interval is undefined at zero events and covers poorly at small counts. Early vaccine and drug-safety readouts fall in exactly this regime. We show that a single reparameterization collapses this bivariate problem to an effective one-parameter family with a quadratic variance function, whose variance-stabilizing transformation is 2 arcsinh(sqrt(R)). The reduction yields a closed-form confidence interval for R. Its two leading errors, a curvature bias and the variability of the estimated scale, each admit a closed-form correction with no tuning constants. In a Monte Carlo study of our seven arcsinh variants and five competitors, the +Curve+Stu variant covers within 0.002 of the nominal 0.95 for about 50 control and 5 treatment events. Its width is on par with the best competitor. It avoids the conservatism and zero-count breakdown of log-Wald and MOVER. For moderate counts, we recommend this interval; for sparser data, our Bar-Lev and Enis count-shift variant is more robust. The result is a ready-to-use, closed-form interval for the low-count regime. We illustrate it on early Covid-19 vaccine-efficacy readouts and provide reference implementations in R and Python.
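
To make the zero-count problem concrete, the sketch below implements the textbook log-Wald interval that the abstract criticises, next to the transformation 2 arcsinh(sqrt(R)) that it names. It is not the paper's interval: the curvature and scale corrections are not described in the abstract and are not reproduced here. Function names, the person-time arguments, and the example counts are hypothetical.

```python
import math


def log_wald_rate_ratio_ci(x_treat, x_ctrl, t_treat=1.0, t_ctrl=1.0, z=1.959964):
    """Textbook log-Wald interval for a Poisson rate ratio.

    Returns (estimate, lower, upper), or None when either count is zero,
    because log(0) and 1/0 make the interval undefined.
    """
    if x_treat == 0 or x_ctrl == 0:
        return None
    ratio = (x_treat / t_treat) / (x_ctrl / t_ctrl)
    se_log = math.sqrt(1.0 / x_treat + 1.0 / x_ctrl)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


def arcsinh_vst(r):
    """The transformation named in the abstract; finite at r = 0."""
    return 2.0 * math.asinh(math.sqrt(r))


# About 5 treatment and 50 control events with equal person-time (hypothetical).
estimate, lower, upper = log_wald_rate_ratio_ci(5, 50)

# With zero treatment events the log-Wald interval does not exist,
# while the transformed scale is still defined.
no_interval = log_wald_rate_ratio_ci(0, 50)
transformed_zero = arcsinh_vst(0.0)
```

The derivative of 2 arcsinh(sqrt(R)) is 1 / sqrt(R(1 + R)), which is what a variance-stabilizing transformation looks like when the variance is proportional to R(1 + R), a quadratic function of R.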

17

FootNet: A Multi-View Smartphone Dataset and Four-Model Benchmark for Clinical Foot Segmentation

Vijay, A.; Prabhune, A.; Srihari, V. R.; Rayampalli, A.

2026-07-17 health informatics 10.64898/2026.07.15.26358117 medRxiv

Top 7%

0.4%

We present FootNet, a 453-image multi-view smartphone foot dataset for binary foot segmentation, with expert-annotated masks across six anatomical views (dorsal, medial, and plantar, both left and right). We benchmark four segmentation models under a controlled protocol: U-Net with a MobileNetV2 encoder achieves the best performance (IoU 0.9268, Dice 0.9608, 95% CI [0.9209, 0.9320]); DeepLabV3 with MobileNetV3-Large scores IoU 0.8984 (Dice 0.9449); UNet++ with MobileNetV2 scores IoU 0.8913 (Dice 0.9391); and SAM ViT-B with oracle bounding-box prompt scores IoU 0.9219 on the matched 191-image subset. Bonferroni-corrected Wilcoxon signed-rank tests (k = 6 comparisons) show U-Net significantly outperforms DeepLab (p < 0.001, r = 0.638) and SAM ViT-B with oracle bounding box (p = 0.005, r = 0.202); UNet++ does not significantly differ from DeepLab (p = 0.062). Connected-component postprocessing yields negligible benefit (mean ΔIoU = +0.0003, 12 of 453 images improved). The extended dataset is available upon request.
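
For readers unfamiliar with the two metrics reported above, the following minimal sketch computes IoU and Dice for one pair of binary masks. It is not the benchmark's evaluation code: the function name and the tiny example masks are hypothetical, and returning 1.0 when both masks are empty is an assumption.

```python
def iou_and_dice(pred, truth):
    """IoU and Dice for two equally sized binary masks given as 2D lists of 0/1."""
    intersection = union = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p & t
            union += p | t
            pred_area += p
            truth_area += t
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0, 1.0
    return intersection / union, 2 * intersection / (pred_area + truth_area)


# Hypothetical 2x3 masks: 2 overlapping pixels, 4 pixels in the union.
predicted = [[1, 1, 0],
             [0, 1, 0]]
annotated = [[1, 0, 0],
             [0, 1, 1]]
iou, dice = iou_and_dice(predicted, annotated)
```

For a single image the two metrics are tied by Dice = 2 IoU / (1 + IoU). The identity does not carry over exactly to dataset averages, which is why a mean Dice need not equal that function of the mean IoU.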

18

Statistical Inference and Power Analysis for Comparative F1 and Fβ Scores under Correlated Classifier Pairs

Hsu, C.-Y.; Liu, Q.; Shyr, Y.

2026-07-17 dermatology 10.64898/2026.07.15.26358166 medRxiv

Top 7%

0.4%

As machine learning and artificial intelligence systems are increasingly used in healthcare, rigorous evaluation of their classification performance has become critical. The F1 and Fβ scores are widely adopted metrics for assessing performance in imbalanced biomedical data. Recently, we introduced psF1, a unified statistical framework for inference and study design for single and comparative F1 and Fβ scores under the assumption of independent classifiers. In practice, however, benchmarking two classifiers on the same dataset creates a correlated paired setting. Ignoring this intrinsic dependency leads to overestimation of the standard error and a substantial loss of statistical power. To address this, we develop psF1pair, an advanced framework for statistical inference and power analysis that explicitly accounts for correlations between classifier pairs. Extensive simulation studies demonstrate the performance of psF1pair, and its utility is further illustrated through application to a real-world imaging classification system. As expected, higher correlation between classifiers yields narrower confidence intervals and enhanced statistical power. A freely available R package is provided to facilitate implementation, supporting accurate evaluation and study design for predictive and classification models in biomedical research.
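
As a reminder of the quantity being compared, the sketch below computes the Fβ point estimate from confusion-matrix counts using the standard definition. It does not reproduce psF1pair or its paired inference, and it is written in Python rather than R; the function name and the counts are hypothetical.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, and false negatives.

    beta > 1 weights recall more heavily; beta = 1 gives the F1 score.
    Returns 0.0 when there are no positives at all (an assumption).
    """
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    if denominator == 0:
        return 0.0
    return (1 + b2) * tp / denominator


# Hypothetical classifier: precision 0.80, recall 0.667.
f1 = f_beta(tp=80, fp=20, fn=40)
f2 = f_beta(tp=80, fp=20, fn=40, beta=2.0)
```

Here F2 is lower than F1 because recall is the weaker of the two components and β = 2 gives it more weight. When two classifiers are scored on the same cases, their true-positive and error counts share the same ground-truth labels, which is the source of the correlation the abstract refers to.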

19

Simulation of synthetic health records for assessment of causal inference methods for vaccine efficacy

Velasco Pardo, V.; Daines, L.; Katikireddi, S. V.; Ritchie, L.; Robertson, C.; Simpson, C. R.; McCowan, C.; Swallow, B.

2026-07-19 infectious diseases 10.64898/2026.07.17.26358308 medRxiv

Top 8%

0.4%

Background: During the COVID-19 pandemic, public health agencies used near real-time observational data to answer questions regarding vaccine effectiveness. However, traditional observational methods do not allow conclusions regarding counterfactual scenarios to be drawn from clinical data. Counterfactuals, which are outcomes that would have occurred under alternative interventions, can be used to formally assess the causal effects of public health interventions on health outcomes while accounting for the effects of confounding. Ideally, individual patient data are used for the development of counterfactuals. Low-fidelity synthetic data may be useful for advancing methodological development where governance and privacy constraints prohibit access to sensitive personal data. Methods: We simulated synthetic datasets based on the EAVE-II COVID-19 platform, which has been limited to use for surveillance purposes. EAVE-II includes almost all resident people in Scotland registered with qualified general medical practitioners. Patient characteristics were simulated to reflect the known distribution of the Scottish population, accounting for dependencies between variables. Each synthetic dataset was encoded to different realistic scenarios for EAVE-II 'ground truth' vaccine rollout and effectiveness results, explicitly stating the causal and confounding mechanisms, using a statistically sound method based on marginal structural models. Synthetic datasets of 100,000 individuals were then generated across five confounding scenarios and five severe outcome types. Results: In scenarios with weak confounding, both unweighted and inverse probability of treatment weighted (IPTW) logistic regression recovered the true causal parameters. As confounding strength increased, only weighted models recovered the true mechanism. 
Conclusions: Low-fidelity synthetic datasets simulated from EAVE-II allow data analysts to build and test causal inference pipelines, develop novel analysis pipelines, and train new researchers while awaiting access to real data. We showed how to generate synthetic datasets from a marginal structural model under different confounding scenarios.
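
The contrast between unweighted and IPTW analyses under confounding can be seen in a toy example. The sketch below is not the EAVE-II simulation and swaps the abstract's weighted logistic regression for a simpler weighted risk difference; the counts, the single binary confounder, and all function names are hypothetical. The data are built so that treatment lowers risk by 0.10 in both confounder strata.

```python
from collections import defaultdict


def expand(count_rows):
    """Turn (confounder, treated, n, events) rows into per-person (c, a, y) records."""
    records = []
    for c, a, n, events in count_rows:
        records += [(c, a, 1)] * events + [(c, a, 0)] * (n - events)
    return records


def propensity_by_stratum(records):
    """Estimate P(treated | confounder) as the treated fraction in each stratum."""
    size = defaultdict(int)
    treated = defaultdict(int)
    for c, a, _ in records:
        size[c] += 1
        treated[c] += a
    return {c: treated[c] / size[c] for c in size}


def risk_difference(records, weighted):
    """Risk in treated minus risk in untreated, optionally with IPT weights."""
    ps = propensity_by_stratum(records)
    weighted_events = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for c, a, y in records:
        if weighted:
            w = 1.0 / ps[c] if a == 1 else 1.0 / (1.0 - ps[c])
        else:
            w = 1.0
        weighted_events[a] += w * y
        weight_total[a] += w
    return weighted_events[1] / weight_total[1] - weighted_events[0] / weight_total[0]


# Hypothetical cohort: the high-risk stratum (c = 1) is treated far more often.
records = expand([
    (0, 1, 20, 2),   # low risk, treated:     risk 0.10
    (0, 0, 80, 16),  # low risk, untreated:   risk 0.20
    (1, 1, 80, 24),  # high risk, treated:    risk 0.30
    (1, 0, 20, 8),   # high risk, untreated:  risk 0.40
])
naive_rd = risk_difference(records, weighted=False)
iptw_rd = risk_difference(records, weighted=True)
```

In this constructed cohort the unweighted comparison suggests a small harm (about +0.02), while weighting each person by the inverse probability of the treatment actually received recovers the built-in effect of about -0.10.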

20

Initial Technical and Clinical Validation of Mobile Pupillometry with Virtual Reality: A Digital Biomarker for Screening Cognitive Function and Impairment

Brendler, A.; Fietz, J.; Bauer, A.; Pfahl, D.; Higgins, S.; Vidovic, E.; Brueckl, T.; BeCOME Working Group; Memory Clinic Working Group; Hupe, K.; Knop, M.; Spoormaker, V. I.

2026-07-17 neurology 10.64898/2026.07.15.26358187 medRxiv

Top 8%

0.4%

Cognitive impairment is a prevalent symptom extending from physiological ageing to disease. It commonly manifests itself in initial memory problems, progressing and co-occurring in more severe conditions such as Mild Cognitive Impairment, Alzheimer's Disease and Major Depressive Disorder. However, current screening assessments either lack biological information or are invasive and restricted to specialized centers with complex and cost-intensive set-ups. Here, we conducted an initial validation of mobile pupillometry with Virtual Reality (VR) under experimental conditions as a digital biomarker for cognitive impairment by testing required biomarker-specific properties. For this purpose, we first assessed its construct validity by testing healthy participants (n=43) on an n-back task in VR while pupil size was measured. Mixed-effects models revealed that, as with lab-based eye-tracking systems, pupil size increased in a sensible and distinguishable fashion as a function of working memory load. Second, to test the signal's reliability, the same participants were tested on the identical set-up two to three months after their first visit. We observed that the pupil response profile was highly stable over this period. Third, for its clinical validity, we examined patients (n=89) from three different cohorts with varying degrees of cognitive impairment and compared them to healthy control participants (n=81). Mixed-effects models indicated that pupil size was reduced as a function of cognitive impairment levels at higher cognitive load and that this effect was more pronounced with increasing age. In conclusion, we provide initial evidence for mobile pupillometry being a sensitive, reliable and clinically valid digital biomarker for cognitive functioning and impairment, which offers desirable properties due to its quick, automated and location-independent set-up. 
Keywords: digital biomarker, mobile pupillometry, Virtual Reality, cognition, Major Depressive Disorder, Mild Cognitive Impairment, Alzheimer's Disease