Scientific Data — Latest Matching Preprints

1

An fMRI dataset of verbalized spontaneous thought with annotated transcripts and self-report trait measures

Zhang, M.; Liu, P. R.; Su, H.; Zhao, M.; Li, X.; Born, S.; Lee, Y.; Honey, C.; Chen, J.; Lee, H.

2026-05-12 neuroscience 10.64898/2026.05.12.724488 medRxiv

Top 0.1%

22.5%

Spontaneous thought is pervasive in everyday human cognition, yet datasets capturing its neural dynamics under minimally interrupted conditions remain limited. The current dataset was acquired from a think-aloud functional MRI experiment in which 118 participants continuously verbalized their spontaneous thoughts during 10-minute scanning sessions. The raw MRI data and verbal transcripts with sentence-level timestamps were previously released and analyzed in our prior study examining neural activity associated with thought transitions. Building on that release, we additionally provide preprocessed MRI data, speech transcriptions with word-level timestamps aligned to image acquisition, large language model-generated ratings of transcribed thoughts across emotional and sensory dimensions, and self-report survey measures assessing personality, mental health, and cognitive abilities. Validation analyses demonstrated activation in expected cortical regions associated with speech production and sensory content identified from transcript annotations, agreement between language model and human ratings, and adequate internal consistency of survey measures, supporting the dataset's overall quality. This dataset enables reuse for investigations of spontaneous thought, speech generation, and individual differences using naturalistic functional MRI data.

2

Automatic segmentation of choroid plexus using deep learning across neurodegenerative diagnoses in the multi-site COMPASS-ND Study

Singh, M.; Dabo, F.; Trigiani, L. J.; Araujo, D.; Narayanan, S.; Badhwar, A.

2026-05-18 radiology and imaging 10.64898/2026.05.14.26353194 medRxiv

Top 0.2%

8.6%

The choroid plexus (ChP) plays a central role in cerebrospinal fluid production, immune signaling, and metabolic clearance, and has emerged as a potential imaging biomarker of neurodegeneration. However, accurate and scalable quantification of ChP volume remains challenging due to its complex morphology and low contrast on conventional MRI. The Automatic Segmentation of Choroid Plexus (ASCHOPLEX), a deep learning framework originally trained on healthy controls and multiple sclerosis cohorts, has not been systematically evaluated in neurodegenerative populations. Using T1-weighted MRI from the multi-center COMPASS-ND study, we assessed standard ASCHOPLEX performance in cognitively unimpaired (CU), Alzheimer's disease (AD), and Parkinson's disease (PD) participants (N = 30), followed by fine-tuning using expert manual segmentations (N = 60). Segmentation accuracy was evaluated using Dice, Jaccard, precision, and recall. The fine-tuned model was then applied to a larger cohort (N = 277) to derive normalized ChP volumes, which were compared across diagnostic groups using linear regression models. Fine-tuning significantly improved segmentation accuracy across all metrics (Dice: 0.45 to 0.84; Jaccard: 0.32 to 0.73; all p < 0.0001), enabling robust ChP delineation across sites and conditions. In the full cohort, normalized ChP volume was significantly higher in AD compared with CU and PD (p < 0.0001), while PD did not differ from CU (p = 0.31). These findings demonstrate that dataset-specific adaptation is essential for deploying deep learning segmentation models in heterogeneous neuroimaging cohorts. The refined ASCHOPLEX framework enables scalable ChP quantification and supports its use as a structural imaging marker in neurodegenerative disease.
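As a reading aid for the overlap metrics named in this abstract (Dice, Jaccard, precision, recall), the following is a minimal sketch that computes them for two binary masks. The function name `overlap_metrics`, the flat 0/1 list representation, and the toy masks are assumptions for illustration only; they are not taken from ASCHOPLEX or from the authors' evaluation code.

```python
def overlap_metrics(pred, truth):
    """Return Dice, Jaccard, precision, and recall for two equal-length 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count true positives, false positives, and false negatives voxel by voxel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    union = tp + fp + fn
    # Convention chosen here: two empty masks count as perfect agreement.
    return {
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
        "jaccard": tp / union if union else 1.0,
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
        "recall": tp / (tp + fn) if (tp + fn) else 1.0,
    }


# Hypothetical toy masks (flattened), not real segmentation data.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
metrics = overlap_metrics(pred, truth)
```

For a single pair of masks the two overlap scores are linked by J = D / (2 - D), so a Dice of 0.84 corresponds to a Jaccard of about 0.72. Averages over many cases, such as those reported in the abstract, need not satisfy this identity exactly.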

3

Pixel-Based Skin Tone Estimation on Dermoscopy: A Dual-Rater MST Benchmark and Feasibility Study

Kumarasinghe, A.; Bui, V.; Ghanbarzadeh, R.

2026-05-17 health informatics 10.64898/2026.05.13.26353004 medRxiv

Top 0.3%

5.0%

Skin-tone labels are absent from public dermoscopy benchmarks such as the International Skin Imaging Collaboration (ISIC), making it impossible to audit whether clinical AI performs equitably across skin tones. While several recent works estimate skin tone automatically from clinical photography and selfies, we ask whether this approach is feasible on dermoscopy, the primary imaging modality of these benchmarks. To answer this, we make three main contributions. First, we release MST-Derm, a dual-rater Monk Skin Tone (MST) annotation benchmark on 500 ISIC 2018 images. Raters were given an explicit unrateable option for crops where the skin surrounding the lesion was too occluded to label confidently. We find that 60% of images were marked unrateable, yielding a 193-image consensus subset (quadratic-weighted Cohen's Kappa = 0.82). Second, we conduct a systematic feasibility study of three pixel-based MST annotation pipelines spanning the principal families in prior work: palette matching in perceptual colour space, robust colour statistics, and projection to a 1D colorimetric scalar. All three pipelines produce ordinal signal above chance (95% confidence intervals on quadratic-weighted Kappa exclude zero). However, ISIC 2018's extreme light-skin bias leaves 82% of the evaluation set at MST 2, giving a constant "always predict MST 2" baseline an accuracy floor the methods cannot overcome. To separate algorithmic signal from dataset bias, we evaluate on a class-balanced subset. The best method reaches quadratic-weighted Kappa = 0.43 against the trivial baseline of Kappa = 0.00, confirming the signal is genuine. Third, we diagnose this performance ceiling. We trace the bottleneck to two causes: dermoscopy's specialised illumination physically compresses the colour range on which lighter skin tones differ, and ISIC's dataset skew makes standard absolute-accuracy metrics uninformative. 
We conclude that while pixel-based colour features carry real MST signal on dermoscopy, current performance is insufficient for autonomous annotation. We release the benchmark, annotation protocol, all prediction runs, and analysis code to facilitate the development of robust skin-tone estimators, a vital prerequisite for accurately auditing fairness and mitigating bias in dermatological machine learning.
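The agreement statistic used throughout this abstract, quadratic-weighted Cohen's kappa, can be sketched in a few lines. This is an illustrative implementation of the standard formula, not the authors' released analysis code; the function name, the 0-based label encoding, and the toy label lists are assumptions. It also shows why a constant predictor scores zero, which is the behaviour the abstract relies on when it contrasts the best method with the trivial baseline.

```python
def quadratic_weighted_kappa(a, b, n_classes):
    """Cohen's kappa with quadratic weights for ordinal labels in 0..n_classes-1."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    # Observed confusion matrix between the two label sources.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n
    if den == 0:
        # Both sources used one identical category: kappa is undefined.
        return float("nan")
    return 1.0 - num / den


# Hypothetical labels on a 4-level ordinal scale.
rater = [0, 1, 2, 3, 2, 1]
always_two = [2] * len(rater)
kappa_constant = quadratic_weighted_kappa(rater, always_two, 4)
kappa_self = quadratic_weighted_kappa(rater, rater, 4)
```

A constant prediction makes the observed and chance-expected disagreement identical, so kappa is zero regardless of how skewed the labels are. Raw accuracy, in contrast, rewards the constant predictor whenever one class dominates.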

4

Scan length as a major driver of CT radiation dose: a diagnostic reference level audit from Kosovo

Rudi, G.; Vula, F.; Bicaku, A.; Dedushi, K.; Ahmetgjekaj, I.

2026-05-17 radiology and imaging 10.64898/2026.05.12.26353024 medRxiv

Top 0.7%

2.6%

Computed tomography is the largest contributor to population radiation dose from medical imaging, yet no diagnostic reference levels (DRLs) have been published from Kosovo or the Western Balkans. This retrospective audit analyzed all CT examinations performed on a 128-slice scanner at the University Clinical Centre of Kosovo between January and March 2026. After exclusions, 1,535 acquisitions from 1,092 patients across nine examination categories were analyzed. Local DRLs were defined as the 75th percentile and compared against German (BfS 2022) and Turkish (Kahraman et al., 2024) reference values. Head CT (n = 590) demonstrated CTDIvol 4.7% below the BfS DRL yet scan length 98.5% above the orientation value (median 25.8 vs 13 cm). Abdomen-pelvis CTDIvol matched the BfS reference while scan length exceeded it by 28%. Coronary CTA showed CTDIvol +377%, consistent with retrospective ECG gating. Excess scan length, not CTDIvol, is the major driver of elevated dose at this institution. The identified excesses are correctable through technologist landmarking training, protocol review, and enabling iterative reconstruction.
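The two calculations this abstract rests on, a local DRL taken as the 75th percentile and a percent deviation from a reference value, can be sketched as below. The function names and the sample scan lengths are hypothetical, and the abstract does not state which percentile interpolation rule was used, so the `inclusive` method here is an assumption. Only the 25.8 cm and 13 cm figures come from the abstract.

```python
import statistics


def local_drl(values):
    """75th percentile of a dose-metric distribution (third quartile)."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def percent_deviation(local_value, reference_value):
    """Signed percent difference of a local value relative to a reference."""
    return 100.0 * (local_value - reference_value) / reference_value


# Hypothetical scan lengths in cm, for illustration only.
scan_lengths_cm = [10.0, 20.0, 30.0, 40.0, 50.0]
drl = local_drl(scan_lengths_cm)

# Head CT scan length from the abstract: median 25.8 cm vs 13 cm orientation value.
head_excess = percent_deviation(25.8, 13.0)
```

Rounding `head_excess` to one decimal place gives the 98.5% excess quoted for head CT scan length.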

5

Computational framework for the World Health Organization estimates of the global, regional and national burden of foodborne diseases 2026 edition

Devleesschauwer, B.; Vaes, L.; Fernandez, K.; Borghi, E.; Cao, B.; Fastl, C.; Jakobsen, L. S.; Kumapley, R.; Lake, R. J.; Majowicz, S. E.; Minato, Y.; Pires, S. M.; Mughini-Gras, L.; Nane, G. F.; Robertson, L.; Scallan Walter, E.; Torgerson, P. R.; Kretzschmar, M. E.; di Bari, C.

2026-05-17 public and global health 10.64898/2026.05.13.26353030 medRxiv

Top 0.7%

2.4%

Background: Foodborne diseases cause substantial global morbidity and mortality, yet remain largely unattended. To support countries in addressing this public health concern, World Health Assembly Resolution 73.5 called for strengthening global food safety efforts and led to the development of the WHO Global Strategy for Food Safety 2022-2030, adopted at the 75th WHA (2022). To this end, the World Health Organization (WHO) reconvened the Foodborne Disease Burden Epidemiology Reference Group (FERG) to advise and support the work to generate updated global, regional, and national estimates of the foodborne disease burden for the reference period 2000-2021. Methods: We developed an incidence-based framework expanding coverage to 42 foodborne hazards. Standardized systematic reviews, Global Health Estimates and Global Burden of Disease envelopes, and United Nations population data informed the evidence base. Missing epidemiological data were imputed using Bayesian hierarchical meta-regression models. Disease models mapped acute and chronic health outcomes, applying updated disability weights, life tables, and probabilistic Monte Carlo calculations to estimate incidence, mortality, Years Lived with Disability, Years of Life Lost and Disability-Adjusted Life Years for all 194 WHO Member States. Transparency and reproducibility of the analysis were ensured by making the open-source R packages and standardized workflows available. Results: The computational framework provides annual, country-level estimates with improved internal consistency and an expanded hazard scope compared with the WHO 2015 edition. Advances include refined modelling, enhanced uncertainty propagation, and broader inclusion of microbial, parasitic, and chemical hazards. Persistent data gaps, especially in high-burden regions, were filled through extensive imputation. 
Conclusions: The computational framework for the WHO 2026 edition delivers the most comprehensive and transparent assessment of the global burden of foodborne diseases to date. Despite remaining limitations, it enables routine monitoring, supports evaluation of global food safety efforts, and highlights priorities for strengthening national data systems.
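To make the burden metrics in this abstract concrete, the sketch below shows the standard incidence-based identity DALY = YLD + YLL together with a simple Monte Carlo propagation of input uncertainty. It is not the WHO/FERG framework, which the abstract says is implemented in R. The function names, the distributions, and every numeric input here are hypothetical placeholders chosen only to make the example run.

```python
import random


def daly(cases, deaths, disability_weight, duration_years, residual_life_expectancy):
    """Incidence-based DALY: years lived with disability plus years of life lost."""
    yld = cases * disability_weight * duration_years
    yll = deaths * residual_life_expectancy
    return yld + yll


def simulate_daly(n_draws=10_000, seed=1):
    """Propagate input uncertainty by sampling; return mean and 95% interval."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # All ranges below are invented for illustration.
        cases = rng.triangular(8_000, 12_000, 10_000)  # low, high, mode
        deaths = rng.triangular(5, 15, 10)
        weight = rng.uniform(0.05, 0.10)
        draws.append(daly(cases, deaths, weight,
                          duration_years=0.02,
                          residual_life_expectancy=40.0))
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws)]
    return sum(draws) / n_draws, lower, upper


mean_daly, lower_daly, upper_daly = simulate_daly()
```

Sampling every input jointly and summarising the resulting DALY draws is what lets uncertainty in incidence, mortality, and disability weights carry through to the final interval.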

6

Imaging-detected benign breast findings in a forensic autopsy cohort unselected for breast symptoms: descriptive results from the Sisyphus study

Sidiropoulou, Z.; Santos, C.

2026-05-12 radiology and imaging 10.64898/2026.05.07.26352434 medRxiv

Top 0.8%

2.2%

Rationale and Objectives: Published estimates of benign breast disease (BBD) are derived mainly from clinical, surgical, screening-recall, or reduction-mammoplasty series. Forensic autopsy cohorts can reduce referral and symptom-selection bias, although they are not necessarily representative of the whole living population. We describe imaging-detected benign breast findings in the Sisyphus forensic autopsy cohort. Materials and Methods: Consecutive medico-legal autopsies of individuals aged 40 years or older were prospectively evaluated over a multi-year period at a medico-legal autopsy service in Portugal. Bilateral breast specimens obtained by subcutaneous modified radical mastectomy were examined with specimen digital mammography and ultrasonography. Findings were classified according to BI-RADS terminology. Lesions requiring tissue diagnosis in the post-mortem protocol underwent wire-guided or direct excisional biopsy. Female cadavers were analysed as the primary cohort; male cadavers were analysed separately as an exploratory subgroup. Proportions are reported with exact 95% confidence intervals (CIs). Results: The cohort included 291 cadavers: 217 women and 74 men. Among female breast specimens, 236/434 were BI-RADS 1 (54.4%; 95% CI, 49.6-59.1), 189/434 were BI-RADS 2 (43.5%; 95% CI, 38.8-48.4), and 8/434 were protocol-sampled suspicious findings (1.8%; 95% CI, 0.8-3.6). At the cadaver level, 99/217 women had at least one benign imaging finding (45.6%; 95% CI, 38.9-52.5). Mammographic benign findings were present in 91/217 women (41.9%; 95% CI, 35.3-48.8), dominated by calcifications; ultrasonographic benign findings were present in 51/217 (23.5%; 95% CI, 18.0-29.7), most often simple cysts and duct ectasia. Plasma cell mastitis-pattern calcifications were observed in 8/217 women (3.7%; 95% CI, 1.6-7.1). Male benign findings were less frequent (9/74, 12.2%; 95% CI, 5.7-21.8) and were dominated by benign lymph-node variants. 
All nine protocol-sampled lesions were benign at histology. Clinical breast examination identified 5/8 protocol-sampled female lesions (62.5%; 95% CI, 24.5-91.5). Conclusion: In this forensic autopsy cohort unselected for breast symptoms, benign imaging findings were common in women aged 40 years or older and less frequent in men. The results provide descriptive post-mortem imaging reference data, but lesion-specific estimates, especially rare entities, should be interpreted with caution because of small numerators, the older age profile, limited clinical history, and the original cancer-focused design of the Sisyphus study.
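The "exact 95% confidence intervals" reported above are conventionally Clopper-Pearson intervals, although the abstract does not name the method, so that reading is an assumption. The sketch below computes such an interval with the standard library only, by bisection on the binomial distribution function. The function names are hypothetical, and the direct summation is suited to moderate sample sizes like those in this cohort.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=60):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n."""
    # Lower bound: P(X >= k | p) = alpha/2, i.e. cdf(k - 1; p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Counts taken from the abstract: women with at least one benign finding,
# and protocol-sampled female lesions found on clinical examination.
ci_women = clopper_pearson(99, 217)
ci_exam = clopper_pearson(5, 8)
```

The abstract reports 38.9-52.5% for 99/217 and 24.5-91.5% for 5/8. The width of the second interval shows why the authors caution against over-reading estimates with small numerators.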

7

Three-dimensional printing of lifelike PET phantoms

Ge, Y.; Li, E. J.; McDonald, S.; Geagan, M.; Parma, M. J.; Gao, M.; Mei, K.; Pasyar, P.; Im, J. Y.; Muller, F. M.; Pantel, A. R.; Karp, J. S.; Noel, P. B.

2026-05-14 radiology and imaging 10.64898/2026.05.11.26352857 medRxiv

Top 0.9%

1.9%

Background: Realistic PET/CT phantoms are essential for system evaluation, protocol optimization, and validation of advanced reconstruction methods. However, existing phantoms are often limited by simplified geometries, spatially uniform activity patterns, and complex preparation procedures. Purpose: To develop and evaluate PixelPrintPET, a 3D printing-based method for fabricating anatomically realistic PET/CT phantoms with spatially heterogeneous radiotracer distributions and a single-solution filling workflow that avoids physical compartmentalization. Methods: PixelPrintPET generates voxel-based printing instructions that encode spatially varying infill, which is realized during printing through modulation of filament extrusion, enabling heterogeneous activity distributions without compartmentalization of radioactivity at different activity concentrations. Calibration phantoms and anatomically structured phantoms were designed and printed using high-flow polylactic acid (PLA), with anatomical inputs derived from either digital atlas-based models or patient imaging data. The printed phantoms were subsequently filled by immersion in a radioactive solution, allowing activity distribution to be controlled by the internal porous structure. A bottom-up filling procedure with reduced surface tension was developed to ensure uniform infiltration and minimize air entrapment. Phantoms were imaged on the PennPET Explorer PET/CT system, and quantitative performance was evaluated using contrast recovery coefficient (CRC), target-to-background ratio (TBR), and comparisons with simulated or patient-derived reference data. Results: A strong linear relationship between infill ratio and normalized signal (R² = 0.998) was demonstrated by the calibration phantom, enabling reliable mapping between structure and activity. Additionally, air entrapment was minimized to less than 1% of the total phantom volume. 
In the contrast recovery phantom, CRC values were consistent with measurements using traditional phantoms. The brain phantom reproduced atlas-derived contrast patterns, with gray-to-white matter differences within 5% after accounting for resolution and other system effects. The patient-based thorax phantom showed high reproducibility across repeated scans, with differences within 3%, and closely matched the input patient image with regional differences within 10% in all regions except the lung. Conclusions: PixelPrintPET enables the fabrication of realistic, reproducible, and versatile PET/CT phantoms with voxel-level control of the activity distribution. This approach provides a practical solution for generating patient-specific and application-specific phantoms, with the potential to accelerate system validation, protocol development, and clinical translation of advanced PET/CT technologies.

8

Breast cancer is linked to changes in the urinary extracellular vesicle proteome

Laziri, N.; Zainurin, N. A. A.; Bambarandhage, A. U. K. H.; Fatudimu, O. S.; Gate, T.; Tench, H.; Fu, D.; Zhang, X.; Beckmann, M.; Phillips, H.; Pennick, M.; Morphew, R. M.; Mur, L. A.

2026-05-12 genetic and genomic medicine 10.64898/2026.05.08.26352674 medRxiv

Top 0.9%

1.8%

Breast cancer (BC) remains a leading cause of morbidity and mortality worldwide. Early detection remains the most effective strategy for improving prognosis. We explored the urinary extracellular vesicle (uEV) proteome for changes linked to BC that could also serve as potential biomarkers. Urine samples were collected from 20 participants across four groups (n = 5 each): newly diagnosed BC patients, benign breast disease (BBD) patients, individuals with breast cancer symptoms (symptom control, SC), and age-matched healthy controls (HC). EVs were isolated using size exclusion chromatography and extracted proteins were analysed using a GeLC proteomic approach. Proteins were identified and quantified using Proteome Discoverer and further analysed using MetaboAnalystR, Funrich and Metascape. A total of 256 proteins were identified from the uEV preparations. BC comparisons with BBD, SC and HC identified 7 differentially expressed proteins (DEPs): SERPINB1 (Serpin family B member 1), LCN1 (Lipocalin 1), SIRPA (Signal regulatory protein alpha), ACTB (Actin, beta), YWHAZ (Tryptophan 5-monooxygenase activation protein zeta), Ig JCHAIN, and APOA1 (Apolipoprotein A1). Receiver Operator Characteristic (ROC) curve assessments suggested that each DEP had an area under the curve (AUC) of > 0.8. These findings highlight EV-derived proteins as promising non-invasive biomarkers for breast cancer detection, warranting further validation in larger cohorts.

9

Deep Learning for Automated Meningioma Segmentation: Toward Clinical Integration and Workflow Efficiency

Fenney, E.; Muralidharan, L.; Ruffle, J. K.; Pandit, A.; Millip, M.; Hammam, A.; Brookes, T.; Jabeen, F.; Colman, J.; Sarwani, O.; Alattar, K.; Efthymiou, E.; Kallam, N.; Siddiqui, J.; Marcus, H. J.; Nachev, P.; Hyare, H.

2026-05-15 neurology 10.64898/2026.05.12.26352585 medRxiv

Top 1%

1.7%

Background: Meningiomas are the most common primary intracranial tumors in adults, and volumetric assessment increasingly guides surveillance and treatment decisions. Automated segmentation could enable standardized volumetry but requires robust validation. Purpose: To develop a fully automated three-dimensional deep learning model for meningioma segmentation on multiparametric MRI, and to evaluate segmentation accuracy, external generalizability, failure modes, radiologist-rated clinical plausibility, and workflow feasibility. Methods: From 2024 to 2026, this retrospective study trained a custom 3D nnU-Net residual encoder model. Expert segmentations covered enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Dice similarity coefficient (DSC) was the primary metric. External validation used an independent single-institution dataset (n = 310 intracranial cases) with incomplete MRI protocols. Failure modes, model equity, and inference time were assessed. A blinded multi-rater study (10 radiologists; 510 cases) rated TC segmentations using a 0-10 Likert scale, analyzed with linear mixed-effects models. Results: Model training used the BraTS Meningioma 2023 dataset (n = 1000; mean age 60.2 ± 14.5; 705 female). In cross-validation, mean DSC was 0.939 for ET, 0.937 for TC, and 0.921 for WT. In external validation, mean DSC was 0.872 for TC and 0.842 for WT, despite heterogeneous protocols and incomplete sequences. Predicted TC volumes correlated strongly with reference volumes in cross-validation (r = 0.995) and external validation (r = 0.971). The most common failure modes involved skull base and intraosseous tumors; performance was equitable across demographic subgroups. Mean inference time was 1.2 seconds. In blinded evaluation (1120 ratings), model segmentations received higher scores than reference annotations (+0.32 BraTS; +1.38 external validation). 
Conclusion: A fully automated deep-learning model achieved high meningioma segmentation accuracy across multi-institutional training data and external clinical imaging. In a blinded study, model segmentation quality exceeded reference annotations, and 1.2-second inference supported workflow integration. Prospective evaluation is warranted before routine deployment.

10

MurineCyto-Det: A High-Resolution Murine BALF Cytology Dataset for Leukocyte Segmentation and Detection

Le, T. X.; Tran, L.-A. T.; Farabi, D. A.; Wang, S.; Phan, A. T. Q.; Cormier, S. A.; Taada, A.; McGrew, D.; Du, Y.; Vu, L. D.

2026-05-12 bioinformatics 10.64898/2026.05.08.723893 medRxiv

Top 1%

1.7%

Automated analysis of murine bronchoalveolar lavage fluid (BALF) cytology is important for preclinical respiratory research, yet progress has been limited by the lack of publicly available, well-annotated mouse BALF image datasets. We present MurineCyto-Det, a high-resolution murine BALF cytology dataset comprising 333 image tiles of size 1024x1024 pixels, annotated across five cytological categories with both pixel-level segmentation masks and one-to-one matched bounding boxes. The dataset contains 14,551 annotated cell instances and supports two complementary analysis tasks: morphology-oriented cell segmentation and object-level cell detection. To establish reproducible benchmark baselines, we evaluated representative segmentation and detection models. The results demonstrate the practical utility of MurineCyto-Det while highlighting realistic challenges arising from class imbalance, small object size, irregular cell morphology, and ambiguous debris-like structures. MurineCyto-Det provides a standardized resource for developing, evaluating, and comparing automated methods for murine BALF cytology analysis. The dataset is publicly available at https://doi.org/10.5281/zenodo.17608677.

11

MicrobeMS - A MATLAB Toolbox for Microbial Identification Based on Mass Spectrometry

Lasch, P.

2026-05-12 bioinformatics 10.64898/2026.05.08.723807 medRxiv

Top 1%

1.7%

Over the last two decades, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-ToF MS) has become the standard method for identifying bacteria and has found a wide range of applications, especially in clinical microbiology. The method's high taxonomic resolution, minimal sample preparation, and complete, ready-to-use commercial systems, which include instrumentation, experimental protocols, spectral databases, and identification analysis software, were key factors in the success of MALDI-ToF MS as the standard for identifying microorganisms in routine diagnostic laboratories. However, despite the availability of these commercial solutions, there is also a growing need for efficient, cost-effective, vendor-neutral databases and analysis tools. These tools would enable the compilation of user-defined mass spectral databases and the testing of new analysis methods and algorithms, particularly in an academic context. To this end, MicrobeMS software has been developed to cover all stages of MALDI-ToF MS-based identification analysis. MicrobeMS is an easy-to-use desktop application for analyzing mass spectra from microorganisms and performing tasks related to spectrum database compilation. It includes routines for direct data import and export, biomarker peak searches, management of spectrum metadata, testing of spectrum quality, supervised and unsupervised identification analysis and intuitive result display. MicrobeMS is implemented in MATLAB and is freely available as MATLAB pcode for Windows and Linux, as well as a standalone application. 
Over the last fifteen years, the software has undergone continuous development and is now used routinely in various settings at the Centre for Biological Threats and Special Pathogens (ZBS) at the Robert Koch Institute (RKI) in Berlin, Germany, for example in supporting spectrum database compilation, to identify special or rare pathogenic bacteria by advanced identification analysis concepts, or to test in silico MALDI-ToF MS databases derived from microbial genomes. In this software publication the versatility and capabilities of MicrobeMS are demonstrated using a test data set from highly pathogenic bacteria (HPB) which has been obtained as part of a published European Union (EU)-funded External Quality Assurance Exercise (EQAE). MicrobeMS and HPB test data can both be downloaded from https://wiki.microbe-ms.com/. The goal of this software publication is twofold: to raise awareness of MicrobeMS within the scientific community and to encourage the testing of the software and custom-developed MALDI-ToF MS databases of the RKI, which are published at the ZENODO data repository (https://doi.org/10.5281/zenodo.7702374).

12

microRNA expression during early development in the coral Acropora digitifera

Grinblat, M.; Fridrich, A.; Cooke, I.; Moran, Y.; Huerlimann, R.; Brunner, R.; Andrade, N.; Ueda, N.; Ball, E.; Miller, D. J.

2026-05-13 developmental biology 10.64898/2026.05.09.724056 medRxiv

Top 1%

1.7%

Acropora spp. are the dominant reef-builders of the Indo-Pacific but are also amongst the most stress-sensitive corals. For these reasons, Acropora spp. have become the most studied of corals, with two species (A. digitifera and A. millepora) often serving as the basis for understanding molecular responses and processes across the sub-order Refertina and corals in general. The early development of these species has been well-characterised in terms of morphology and gene expression, but as yet we have a limited understanding of how transcription is regulated during development. In "higher" animals (bilaterians), microRNAs (miRNAs) are critical regulators of gene expression, but until now their involvement in coral development has not been investigated. Building on the existing developmental data for Acropora spp., we catalogued miRNAs expressed during the early development of Acropora digitifera and profiled their expression in 21 stages from unfertilised eggs to 24h after treatment with a natural settlement cue (CCA chips). A total of 157 miRNAs were recognised, many of which (~60%) were novel. These fell into three distinct groups, corresponding to three distinct developmental phases: (1) those present in eggs through to gastrulation, (2) a larvally expressed group, and (3) those expressed following settlement induction. Exposure of competent larvae to a natural settlement inducer resulted in major changes in the miRNA profile within 10 minutes, indicating that miRNAs may be particularly important in mediating the larva/polyp transition but are also likely to play important regulatory roles throughout early coral development in addition to possible roles in disease resistance.

13

Benchmarking foundation models for improving confounding control in target trial emulation

Kleper, S. L.; Melamed, R. D.

2026-05-13 epidemiology 10.64898/2026.05.09.26352820 medRxiv

Top 1%

1.5%

Machine learning models for causal inference aim to adjust for confounding factors that are associated with both an exposure and an outcome, creating a spurious, biased association. However, these methods are rarely empirically evaluated to assess their success in mitigating such bias. Recent advances in knowledge representation, including both foundation models and knowledge graphs, could enrich these models, but rigorous evaluations are needed in order to assess their potential. Here, we ask whether enriching existing causal inference models with knowledge representations from foundation models can improve confounding control. Rather than using semi-simulated data to address this question, we focus on examples of real confounding: we emulate target randomized active comparator trials that are subject to confounding by indication. Our results can guide researchers aiming to develop or apply methods for discovering causal effects from observational data.

14

AnnotX: An Edge-powered Laparoscopic Video Annotation Platform

Lafouti, M.; Feldman, L. S.; Hooshiar, A.

2026-05-14 medical education 10.64898/2026.05.11.26352930 medRxiv

Top 1%

1.4%

Accurate and objective evaluation of surgical skill and performance is critical for advancing training and improving patient outcomes. Current assessment methods increasingly rely on video analytics and depend on labor-intensive, frame-by-frame manual annotation by experts. In this work we developed a surgical video annotation platform (AnnotX) that used a Python backend running a pretrained promptable video segmentation foundation model, Segment Anything 3 (SAM 3), for per-frame segmentation and temporal segment propagation. With a few interactions per class, the model generated a high-quality mask on a key frame and propagated it through the sequence. The platform automatically exported per-class binary masks and color overlays for every frame, together with deterministic metadata and a standardized study folder structure to support auditability and downstream analysis. On deidentified laparoscopic surgery videos, the system processed typical clips in minutes and reduced expert annotation time from hours to minutes without task-specific fine-tuning. We also benchmarked multiple SAM variants (SAM 2, MedSAM 2, and SAM 3) on the CholecSeg8K dataset, and showed that AnnotX with a SAM 3 backbone outperformed the alternatives, with a mean IoU of 0.884 and mean Dice of 0.924 across 101 annotated sequences. By being free, practical, and lightweight to deploy, AnnotX aims to accelerate reproducible surgical dataset creation and provides a step toward scalable, video-based performance evaluation in training and quality-improvement settings.

15

Generating Synthetic MR Perfusion Maps from DWI and FLAIR in Acute Ischemic Stroke: Development and External Validation of a Deep Learning Model

Matsulevits, A.; Koch, A.; Mahe-Verdure, C.; Bendszus, M.; Hilbert, A.; Boullet, M.; Marnat, G.; Mutke, M.; Aydin, O.; Olindo, S.; Sibon, I.; Frey, D.; Thiebaut de Schotten, M.; Tourdias, T.

2026-05-13 neuroscience 10.1101/2025.10.23.684079 medRxiv

Top 2%

1.2%

Background: Magnetic resonance imaging (MRI) is critical for acute stroke triage, but is time-consuming and often requires contrast injection for perfusion imaging. This study aimed to synthesize T-max perfusion maps from routinely available, non-contrast DWI and FLAIR using deep generative models. We hypothesized that relevant perfusion information could be inferred from these modalities to streamline imaging and reduce reliance on dynamic susceptibility contrast perfusion. Methods: Acute MRI data from 355 patients with anterior circulation stroke, including dynamic susceptibility contrast perfusion, were retrospectively collected from two European centers (Heidelberg: 2010-2018; Bordeaux: 2021-2022). Six versions of a denoising diffusion probabilistic model (DDPM) and a GAN architecture were trained to generate synthetic T-max perfusion maps using DWI, FLAIR, and an infarct core mask as inputs. Performance was assessed by comparing synthetic and ground truth T-max maps using image similarity metrics. Regions with T-max >6s were compared using Dice coefficients, and mismatch volume distributions were analyzed. An ablation study quantified the contribution of each input. Results: The best performance was achieved by a DDPM with a 2.5D architecture using DWI, FLAIR, infarct core mask, and a perfusion-weighted loss function. It produced synthetic T-max perfusion maps with high similarity to ground truth in under 110 seconds. The model showed strong spatial overlap for T-max >6s regions in internal validation (average Dice = 0.82, SD = 0.08) and external validation (average Dice = 0.59, SD = 0.13). Synthetic maps closely matched ground-truth mismatch distributions, capturing key perfusion patterns. The infarct core mask played a critical role in model performance, alongside DWI and FLAIR inputs. Conclusions: We propose a non-invasive, scalable framework to generate synthetic T-max perfusion maps from non-contrast MRI. 
This approach could expand access to perfusion data in acute stroke, shorten imaging protocols, and accelerate treatment decisions by eliminating the need for contrast-enhanced acquisition.
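The Dice comparison of T-max >6 s regions mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each map has been flattened to a list of T-max values in seconds; the function name `dice_above_threshold`, the toy values, and the convention for two empty masks are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence


def dice_above_threshold(
    pred: Sequence[float], truth: Sequence[float], threshold_s: float = 6.0
) -> float:
    """Dice overlap of the voxels whose T-max exceeds threshold_s in each map."""
    if len(pred) != len(truth):
        raise ValueError("maps must have the same number of voxels")
    # Binarize both maps at the threshold (strictly greater than).
    pred_mask = [v > threshold_s for v in pred]
    truth_mask = [v > threshold_s for v in truth]
    intersection = sum(p and t for p, t in zip(pred_mask, truth_mask))
    total = sum(pred_mask) + sum(truth_mask)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy, made-up T-max values in seconds (not study data).
synthetic = [2.0, 7.5, 8.0, 6.5, 1.0, 9.0]
reference = [3.0, 7.0, 5.0, 6.2, 0.5, 10.0]
print(f"Dice for T-max > 6 s: {dice_above_threshold(synthetic, reference):.3f}")
```

A per-patient Dice computed this way could then be averaged across a validation cohort, which is how a summary such as "average Dice, SD" is typically formed.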

16

Reproducibility of Apparent Diffusion Coefficient and Restriction Spectrum Imaging Restriction Score in the Prostate Across MRI Sessions, Vendors, and Acquisition Settings: a Prospective Study

Song, Y.; Conlin, C. C.; Lee, K.-L.; Dornisch, A.; Barrett, T.; Do, S.; Do, D. D.; Margolis, D. J.; Rakow-Penner, R.; Dale, A.; Liss, M. A.; Seibert, T. M.

2026-05-13 radiology and imaging 10.64898/2026.05.10.26352843 medRxiv

Top 2%

1.2%

Show abstract

Background: Diffusion-weighted MRI is central to prostate cancer detection, but apparent diffusion coefficient (ADC) has limited reproducibility across scanners and sites. Restriction Spectrum Imaging restriction score maximum value (RSIrs-max) may provide a more reproducible biomarker. Purpose: To evaluate cross-session reproducibility of within-lesion mean ADC and RSIrs-max on prostate MRI, including same-vendor and cross-vendor comparisons, as well as in unfavorable-histology prostate cancer (uhPC) and under different interpolation settings. Materials and Methods: In this prospective study, participants with suspected or known prostate cancer enrolled from August 2022 to January 2026 underwent two MRI examinations including an RSI protocol. MRI-visible lesions were contoured on T2-weighted MRI; in participants with multiple lesions, the index lesion was selected. Mean ADC and RSIrs-max were measured within MRI-visible lesions. Analyses included all visible lesions, same-vendor and cross-vendor subgroups, participants with uhPC, and 20 participants with scans reconstructed with and without zero-filled interpolation (a setting with different defaults across vendors). Pearson correlation coefficients with 10,000 bootstrap resamples were used to estimate 95% confidence intervals. Results: Sixty-one male participants (median age, 69 years [IQR, 63-74]) were evaluated; 58 of 61 (95%) had MRI-visible lesions, and 26 of 58 (45%) had uhPC. For all MRI-visible lesions, correlations were 0.55 (95% CI: 0.23-0.76) for mean ADC and 0.83 (95% CI: 0.72-0.90) for RSIrs-max. In same-vendor scans, correlations were 0.76 (95% CI: 0.27-0.95) and 0.88 (95% CI: 0.72-0.96); in cross-vendor scans, they were 0.31 (95% CI: -0.07-0.62) and 0.79 (95% CI: 0.65-0.89), respectively. In uhPC, correlations were 0.42 (95% CI: -0.02-0.83) for mean ADC and 0.90 (95% CI: 0.77-0.96) for RSIrs-max.
With inconsistent versus consistent interpolation, RSIrs-max correlation increased from 0.73 (95% CI: 0.48-0.89) to 0.89 (95% CI: 0.78-0.96). Conclusion: ADC showed limited reproducibility, particularly across vendors. RSIrs-max showed stronger between-session reproducibility across same-vendor, cross-vendor, uhPC, and interpolation analyses.
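The abstract above reports Pearson correlations with 95% confidence intervals from 10,000 bootstrap resamples. One common way to obtain such an interval is the percentile bootstrap over paired measurements, sketched below. The abstract does not state which bootstrap variant was used, so the percentile method, the function names, the seed, and the toy session values are all assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    if len(set(x)) < 2 or len(set(y)) < 2:
        raise ValueError("correlation is undefined for constant input")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(
    x: Sequence[float],
    y: Sequence[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for Pearson r (pairs resampled together)."""
    rng = random.Random(seed)
    n = len(x)
    estimates: List[float] = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ValueError:
            continue  # degenerate resample (constant values); draw again
    estimates.sort()
    lo_index = max(0, int((alpha / 2) * n_resamples))
    hi_index = max(0, int((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_index], estimates[hi_index]


# Toy, made-up lesion measurements from two sessions (not study data).
session_1 = [0.8, 1.1, 1.4, 0.9, 1.7, 2.0, 1.2, 1.5]
session_2 = [0.9, 1.0, 1.5, 1.1, 1.6, 2.2, 1.1, 1.4]
r = pearson_r(session_1, session_2)
low, high = bootstrap_ci(session_1, session_2)
print(f"r = {r:.2f}, 95% CI: {low:.2f} to {high:.2f}")
```

Resampling participants as pairs keeps each lesion's two sessions together, which is what makes the interval reflect between-session agreement rather than the spread of either session alone.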

17

Study protocol for preoperative classification using integrated screening and short-course neoadjuvant BRAF/MEK inhibition in newly diagnosed papillary craniopharyngioma (the PRECISE-PCP study): a prospective single-arm study

Ye, Z.; Wu, G.; Jiang, H.; Gu, X.; Huang, R.; Wang, Y.; Qiao, N.; Ma, Z.; Ye, Z.; Wu, Y.; Wang, W.; Cheng, H.; Chen, H.; Ye, H.; Wang, Y.; Zhang, Z.; Guan, M.; Zhao, Y.; Zhang, Q.

2026-05-12 oncology 10.64898/2026.05.08.26351826 medRxiv

Top 2%

0.8%

Show abstract

Introduction: Craniopharyngioma (CP) comprises two distinct histological subtypes, adamantinomatous craniopharyngioma (ACP) and papillary craniopharyngioma (PCP), which are often challenging to distinguish preoperatively. Approximately 95% of PCP harbor the BRAF V600E mutation, whereas ACP lacks this alteration, making PCP uniquely sensitive to BRAF and MEK inhibition. However, in the absence of a reliable preoperative classification strategy, targeted therapy has been limited to recurrent disease or to cases with histological confirmation. This study aims to describe and prospectively evaluate a pragmatic preoperative classification strategy and short-course neoadjuvant BRAF and MEK inhibition followed by surgery in newly diagnosed, preoperatively classified PCP. Methods and analysis: This is a prospective, single-arm, open-label study. Patients with newly diagnosed craniopharyngioma will be screened using an integrated preoperative strategy combining imaging-based prediction and selective cerebrospinal fluid (CSF) cell-free DNA testing for BRAF V600E in indeterminate cases. Twelve participants preoperatively predicted as PCP and BRAF V600E positive will receive dabrafenib 150 mg twice daily plus trametinib 2 mg once daily for up to three 28-day cycles, followed by transnasal endoscopic surgery. Assessments are scheduled at days 7, 14, 28, 56, and 84 until surgery. The primary endpoint is objective response rate, assessed by contrast-enhanced MRI using RANO 2.0 criteria. Secondary outcomes include progression-free survival, local disease control, endocrine outcomes of the hypothalamic-pituitary-adrenal and hypothalamic-pituitary-thyroid axes, visual and cognitive outcomes, postoperative diabetes insipidus, surgical complexity, and concordance between the preoperative classification strategy and postoperative pathology and BRAF V600E status.
Exploratory analyses will evaluate treatment-related changes in tumor vascularity, tissue characteristics, and post-treatment molecular alterations in tumor tissue. Ethics and dissemination: This protocol has been approved by the Ethics Committee of Huashan Hospital, Fudan University (KY2024-028). Written informed consent will be obtained from all participants. Results will be disseminated through peer-reviewed publications and scientific conferences. Trial registration number: ChiCTR2400081636. Strengths and limitations of this study. Strengths: This study proposes an integrated, clinically applicable preoperative strategy that combines imaging-based prediction with selective cerebrospinal fluid cell-free DNA analysis to identify papillary craniopharyngioma (PCP) prior to surgery. It prospectively evaluates short-course neoadjuvant BRAF and MEK inhibition in newly diagnosed PCP, addressing a clinically relevant gap in current management. Standardized, multidimensional assessments are performed across the neoadjuvant, perioperative, and early postoperative periods, capturing radiographic, surgical, endocrine, visual, and cognitive outcomes. Limitations: The single-arm, open-label design without a surgical control group limits direct comparison with upfront surgery. Despite the integrated prediction strategy, preoperative misclassification cannot be excluded entirely.

18

Real-time hip biomechanics from smart garments via a physics-informed neural network

Cornish, B. M.; Pizzolato, C.; Saxby, D. J.; Lyons, N. R.; Salchak, Y. A.; Worsey, M. T.; Lloyd, D. G.; Diamond, L. E.

2026-05-17 rehabilitation medicine and physical therapy 10.64898/2026.05.06.26352104 medRxiv

Top 2%

0.8%

Show abstract

Tissue-level mechanical stimuli are primary drivers of tissue adaptation and can be optimised during conservative treatments to improve outcomes for many highly prevalent musculoskeletal conditions. Current laboratory-based technologies limit our ability to connect conservative interventions, such as exercise and movement modification, with muscle, joint, and tissue-level mechanics in natural environments. We introduce a physics-informed neural network (PINN) to estimate clinically relevant biomechanics from smart garments. By accounting for physiological dynamics of neural activation and muscle contraction, the PINN accurately predicted hip joint angles (RMSE <6 degrees), moments (RMSE 0.12 N*m/kg to 0.30 N*m/kg), and joint forces (RMSE 6 to 16%) from three inertial measurement units and four electromyographic sensors. We demonstrated that the trained PINN can be combined with a smart garment to estimate hip biomechanics in real time during a gait retraining intervention aimed at modifying joint loading to treat hip osteoarthritis. The developed PINN and smart garment system may be adapted and generalised for personalised management or rehabilitation of a broad range of musculoskeletal diseases and injuries in clinical, home, workplace, and sporting environments.

19

PheBee: A Graph-Aware System for Scalable, Traceable, and Semantic Phenotyping

Gordon, D. M.; Homilius, M.; Antoniou, A. A.; Grannis, C.; Lammi, G. E.; Herman, A. C.; Kubatko, A.; Chaudhari, B. P.; White, P.

2026-05-13 health informatics 10.64898/2026.05.09.26352812 medRxiv

Top 2%

0.7%

Show abstract

Objectives: Phenotype-driven workflows in clinical and translational research require standardized ontology-based representation, ontology-aware cohort discovery, and provenance inspection for each assertion. Existing approaches optimize for either semantic traversal or scalable batch analytics, but not both. We describe PheBee, a hybrid system that links semantic assertions to scalable evidence storage via a deterministic identifier, preserving provenance while supporting ontology-aware discovery at cohort scale. Materials and Methods: PheBee represents phenotype assertions in a knowledge graph as ontology-linked nodes with clinical modifier context (e.g., negated, family history), and stores supporting evidence records in a scalable row-oriented evidence table for cohort-scale access. The two layers are connected by a deterministic identifier enabling stable joins across repeated ingestions without duplicating high-volume evidence in the graph. We evaluated PheBee using synthetic datasets designed to exercise end-to-end ingestion and query workflows. Results: Functional evaluation validated hierarchical term expansion, qualifier-aware retrieval, duplicate-free assertion handling under re-ingestion, and privacy-conscious management of subjects shared across multiple research projects. At scale (10,000 subjects producing 12M evidence records), PheBee completed ingestion in ~30 minutes and responded to interactive queries within 6 seconds under concurrent load. Discussion: PheBee exposes a unified API for ontology-aware cohort discovery with hierarchical term expansion, subject-centric retrieval of phenotypes and clinical modifiers, and evidence and provenance queries. Its data model aligns with GA4GH Phenopackets, facilitating interoperability with phenotype exchange standards.
Conclusion: By combining ontology-aware semantics with scalable, provenance-bearing evidence storage, PheBee provides a practical open-source foundation for phenotype-driven research workflows that demand both semantic precision and cohort-scale traceability. Lay summary: Researchers often use "phenotypes" (observable clinical features) to describe individual subjects and find groups of similar subjects. Those phenotypes come from many sources and need both standard terminology and clear evidence for why a phenotype has been associated with a subject. PheBee is a software system that stores phenotype assertions in a way that supports both "ontology-aware" searching (for example, finding patients with any subtype of a condition) and scalable storage of supporting evidence across large research cohorts. PheBee uses multiple types of data storage so researchers can perform interactive phenotype searches and also store millions of pieces of supporting evidence. A shared identifier connects the two storage layers, so subjects' phenotypes and their supporting evidence remain linked even as new data is added over time. We evaluated PheBee using fully synthetic (non-patient) data to confirm correct query behavior, evidence traceability, and system performance at large scale.
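The idea of a deterministic identifier that joins a graph layer to an evidence table, as described in the abstract above, can be sketched in a few lines. This is not PheBee's actual scheme or API: the hashing choice (SHA-256 over a canonical JSON string), the field names, the in-memory `graph` and `evidence` stores, and the `ingest` helper are all hypothetical and chosen only to show why a content-derived key gives stable joins and duplicate-free assertions under re-ingestion.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def assertion_id(subject_id: str, term_id: str, qualifiers: Iterable[str] = ()) -> str:
    """Derive a stable key from an assertion's content (hypothetical scheme)."""
    canonical = json.dumps(
        {
            "subject": subject_id,
            "term": term_id,
            # Sort and de-duplicate so qualifier order never changes the key.
            "qualifiers": sorted(set(qualifiers)),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Stand-ins for the two storage layers: one node per assertion, many evidence rows.
graph: Dict[str, dict] = {}
evidence: List[dict] = []


def ingest(subject_id: str, term_id: str, qualifiers: Iterable[str], note_id: str) -> str:
    """Record one piece of evidence; create the assertion node only if it is new."""
    quals = sorted(set(qualifiers))
    key = assertion_id(subject_id, term_id, quals)
    graph.setdefault(key, {"subject": subject_id, "term": term_id, "qualifiers": quals})
    evidence.append({"assertion_id": key, "note_id": note_id})
    return key


# Two ingestions of the same assertion, supported by different (made-up) notes.
first = ingest("subj-1", "HP:0001250", ["family_history"], note_id="note-A")
second = ingest("subj-1", "HP:0001250", ["family_history"], note_id="note-B")
print(first == second, len(graph), len(evidence))
```

Because the key depends only on the assertion's content, a repeated ingestion lands on the same graph node while each evidence row stays in the table, joined by `assertion_id`.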

20

NeuVue: A scalable and customizable framework for electron microscopy proofreading

Xenes, D.; Kitchell, L. M.; Rivlin, P. K.; Martinez, H.; Rose, V.; Bishop, C.; Brodsky, R.; Celii, B.; Ellis-Joyce, J.; Luna, D.; Norman-Tenazas, R.; Ramsden, D.; Romero, K.; Villafane-Delgado, M.; Collman, F.; Gray-Roncal, W.; Reimer, J.; Wester, B.

2026-05-12 neuroscience 10.1101/2022.07.18.500521 medRxiv

Top 3%

0.7%

Show abstract

Connectomic reconstruction from large image volumes produces segmentation and synaptic-assignment errors that must be resolved to support downstream analyses. As datasets have grown larger and teams more distributed, proofreading has become a critical operational bottleneck. Workflows for proofreading and error correction have not scaled commensurately with connectomic data production and may not accommodate heterogeneous proofreader expertise and machine-generated candidate edits. New tools are therefore needed to organize, prioritize, and coordinate proofreading at volume scale. Here we present NeuVue, a task-management and prioritization framework that operationalizes proofreading through atomic, auditable tasks for individual and team review, multistage routing across proofreader cohorts, performance and volume-state tracking, and integration with community annotation, visualization, and analysis services. We report the use of NeuVue across two volumetric datasets, supporting scalable proofreading by over forty proofreaders and producing over fifty thousand edits. NeuVue provides a reproducible human-in-the-loop framework for generating, validating, and maintaining large connectomic datasets.