Med — Latest Matching Preprints

1

A liquid biopsy-centered, pan-cancer, open next generation sequencing panel to support clinical decision-making (LION panel)

Feierabend, S.; Künstner, A.; Forster, M.; Helbing, T.; Gebauer, N.; Gemoll, T.; Axt, F.; Nimmagadda, S. C.; Ranganathan, L.; Schwandt, J.; Heber, M.; Szymczak, S.; Hohensee, I.; Fliedner, S. M. J.; Scherer, F.; Oberländer, M.; Derer-Petersen, S.; Busch, H.; von Bubnoff, N.; Dazert, E.

2026-06-08 oncology 10.64898/2026.06.05.26354976 medRxiv

Top 0.1%

7.1%

Cancer treatment has shifted toward personalized therapy based on molecular profiling, particularly in advanced disease. Existing circulating tumor DNA panels are often broad, generating many non-actionable variants and incurring costs that limit routine use in molecular tumor boards. We developed and validated a manufacturer-independent, 109-gene liquid biopsy-centered pan-cancer open next generation sequencing panel (LION panel), combined with an in-house bioinformatic pipeline to support clinical decision-making. A total of 87 samples were analyzed, including 17 reference samples, 21 healthy blood donor controls, and 49 patient samples including nine tumor entities. The LION panel achieved 92% sensitivity and 99% specificity in reference samples, with high concordance to digital droplet PCR (r = 0.99). It detected variant allele frequencies as low as 0.05% (tumor-informed) and 0.5% (tumor-uninformed). Clinical concordance reached 82% with blood-based digital droplet PCR and 75% with whole exome tissue sequencing. In representative cases, variant dynamics correlated with disease progression and revealed additional targetable variants. Overall, the LION panel supports clinical decision-making by enabling identification of targetable variants, disease monitoring, and detection of treatment resistance, particularly when tumor tissue is unavailable.

2

General-purpose large language models can achieve physician-level accuracy in complex medical data extraction

Rajeev, M.; Narayan, A.

2026-06-10 gastroenterology 10.64898/2026.06.06.26354838 medRxiv

Top 0.1%

4.0%

Background: Unstructured data represent about 80% of total electronic health records (EHR) data. Structuring this free text is essential for advancing clinical research, including cohort selection for trials, retrospective studies, and the development of disease registries. While manual chart review (MCR) remains the gold standard for extracting this clinical data, the process is inherently slow, resource-intensive, and susceptible to errors from human fatigue. We evaluated the extraction accuracy, safety, and efficiency of the HeLIX (Hepatology Logic-Integrated Extraction) framework, a Large Language Model (LLM) protocol using Google Gemini 3 Pro, compared to a gold-standard MCR. Methods: A prospective validation study was conducted using 50 high-complexity, simulated hepatology discharge summaries designed to replicate the real-world heterogeneity of EHRs. The HeLIX framework employed a Zero-Shot, Structured Chain-of-Thought (CoT) prompting strategy enforced by a three-layer architecture: Clinical Reasoning Trace, Schema Enforcement, and Evidence Verification. The model extracted 45 distinct clinical variables. Performance was benchmarked against a consensus MCR. Results: Across 2,250 evaluated data points, the model achieved an overall Extraction Accuracy of 99.24% (95% CI: 98.8%-99.5%), with perfect concordance in 35/45 (77.8%) variables. For binary diagnostic variables, the model demonstrated an overall F1-score of 0.98, Recall of 0.99 and substantial inter-rater reliability (Cohen's κ = 0.97). Hallucinations were exceptionally rare (2/2250; 0.08%). Critical errors affecting clinical management occurred in only 2 instances (<0.1% of total data), both involving etiological misattribution in complex multifactorial diagnoses. The AI workflow was 13.4-fold faster and 95.1% more cost-effective than manual extraction.
Conclusion: The HeLIX framework demonstrates physician-level accuracy and reliability in extracting complex hepatology data. It offers a scalable, efficient, and economical alternative to manual chart review. Such frameworks could accelerate clinical research, enabling healthcare systems globally to build comprehensive patient registries for a fraction of the traditional cost.
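
The headline accuracy figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes that 2,233 of the 2,250 data points were correct (a count inferred from the reported 99.24%, not stated in the abstract) and uses a Wilson score interval, which is one common choice; the abstract does not say which interval method the authors used.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


n_points = 50 * 45   # 50 summaries x 45 variables, as described in the abstract
n_correct = 2233     # ASSUMPTION: inferred from the reported 99.24%
accuracy = n_correct / n_points
low, high = wilson_interval(n_correct, n_points)
print(f"accuracy={accuracy:.2%}, 95% CI=({low:.1%}, {high:.1%})")
```

Under these assumptions the interval lands close to the reported 98.8%-99.5%, which suggests the published figures are internally consistent.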

3

Multi-region sampling of the human small intestine using an ingestible device

Fu, B.; DeSchepper, L. B.; Sun, J.; McKeithen-Mead, S. A.; Kapili, B.; Ochoa-Andersen, P.; Spencer, S. P.; Fardeen, T.; Ricardo, M.; El Kamari, V.; Sinha, S.; Relman, D. A.; Grembi, J. A.; Shalon, D.; Estrela, S.; Huang, K. C.

2026-06-10 gastroenterology 10.64898/2026.06.09.26353912 medRxiv

Top 0.1%

3.7%

The human small intestine (SI) plays a central role in nutrient processing, host-microbe interactions, and immune regulation, yet remains poorly characterized due to the lack of minimally disruptive sampling methods. Here, we present a protocol for deploying, recovering, and analyzing samples collected using an ingestible device that enables multi-region, lumen-targeted SI sampling during normal digestion. The device incorporates a ~30-cm collapsible tube wound into pH- or time-responsive layers that sequentially unfurl in situ, typically capturing three spatially ordered samples with high yield and reliable retrieval. This protocol outlines study design, participant handling, device recovery, contamination control, and standardized workflows for analyses, including cell quantification, culturomics, sequencing, and metabolomics. We further describe benchmarking approaches for evaluating spatial resolution and strategies for assay prioritization when sample volume is limiting. By reducing participant burden and facilitating integration with stool, saliva, and clinical metadata, this approach enables longitudinal and large-cohort studies linking SI microbial ecology and host physiology to human health.

4

Genosolver: Rare Disease Diagnosis through Holistic Integration of Unstructured Clinical Narratives Using Large Language and Reasoning Models

Islam, T.; Danner, M.; Ziad, Z.; Begemann, M.; Beijer, D.; Lischka, A.; Lausberg, E.; Mattern, L.; Suh, J.; Wittig, P.; Guezel, N.; Schlaich, E.; Karaivanova, R.; D'Augello, S.; Franken, L.; Ruedebusch, J.; Mueller, R.; Perchalla, E.; Zempel, H.; Haag, N.; Eggermann, K.; Eggermann, T.; Meyer, R.; Kraft, F.; Elbracht, M.; Kurth, I.; Krause, J.

2026-06-05 health informatics 10.64898/2026.06.04.26354845 medRxiv

Top 0.1%

3.6%

Background: Molecular medicine has made genetic diagnostics crucial for rare diseases, but the majority of patients remain without a diagnosis even after state-of-the-art assessment. Standardized systems for integrating clinical features, such as the Human Phenotype Ontology (HPO), offer assistance, but are often insufficiently detailed and fail to capture crucial clinical parameters such as age at onset, longitudinal changes in symptoms, detailed characteristics of a clinical symptom, or the absence of a feature. Results: We present Genosolver, an integrated workflow that utilizes machine learning to address this bottleneck. Using Large Language Models (LLMs) and Large Reasoning Models (LRMs) on unstructured clinical notes and electronic health care data, we generate a workflow that unifies phenotype extraction, generates differential diagnoses, and prioritizes genetic variants from genome data. We evaluated the performance on 233 previously genetically solved cases, where Genosolver ranked the causative gene first in 72% of cases and within the top 10 genes in 94% of cases, outperforming the existing benchmarking tool Exomiser by 9%. Semi-automated reanalysis of 1,875 unsolved rare disease cases yielded an additional diagnostic rate of 1.7%. Incorporating rich, unstandardized clinical narratives substantially enhanced model performance beyond HPO-only inputs and demonstrated competitive results using data-security-compliant local models. Conclusion: Integrating unstandardized clinical data with local LLMs and reasoning offers a scalable, data-secure workflow that increases molecular diagnoses in rare diseases.
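
The top-1 and top-10 figures quoted above are top-k hit rates over the ranked gene lists. A minimal sketch of that metric follows; the function name and the example ranks are hypothetical and are not taken from the preprint.

```python
from typing import Optional


def top_k_rate(ranks: list[Optional[int]], k: int) -> float:
    """Fraction of cases whose causative gene is ranked within the top k.

    A rank of None means the causative gene did not appear in the list.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# HYPOTHETICAL ranks of the causative gene for ten solved cases (1 = best).
example_ranks = [1, 1, 3, 1, 12, 1, 2, 1, None, 1]
top1 = top_k_rate(example_ranks, 1)
top10 = top_k_rate(example_ranks, 10)
print(f"top-1: {top1:.0%}, top-10: {top10:.0%}")
```

Counting unranked genes as misses keeps the denominator equal to the number of solved cases, which is the more conservative reading of such a benchmark.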

5

Metatranscriptomics-Derived Disease Risk Scores as a Preventive, Diagnostic, and Treatment Support Tool

Hu, L.; Bass, M.; Patridge, E.; Molusky, M.; Antoine, G.; Vuyisich, M.; Banavar, G.

2026-06-06 genetic and genomic medicine 10.64898/2026.05.29.26354333 medRxiv

Top 0.1%

2.1%

Background: Chronic diseases and symptom syndromes often develop after prolonged biological changes that may precede formal diagnosis. RNA-based metatranscriptomics captures active microbial and human gene expression and may provide a functional layer for disease risk evaluation. To address this translational gap, we developed and validated a Disease Risk Score (DRS) framework that integrates metatranscriptome-derived pathway activity scores from stool, saliva, and blood samples, and evaluated its potential clinical utility as an adjunct risk-evaluation tool. Methods: DRS uses disease-specific sets of pathway activity scores derived from stool and saliva microbial functions, stool and saliva microbial taxa, and blood human gene expression. For each disease, 'not optimal' pathway scores are aggregated into a normalized cumulative odds ratio, or cOR, using score-level odds ratios, statistical significance, and literature-supported biological relevance derived from a Development Cohort of 22,369 individuals. A cOR ≥ 5 is defined as high risk. Performance is evaluated in an independent Validation Cohort of 15,908 individuals using self-reported diseases as the reference. Disease support requires both significant cOR separation between self-reported and not-reported (Cohen's d ≥ 0.2) and risk ratio enrichment of self-reported disease among individuals classified as high risk (95% CI of Risk Ratio > 1). Results: Of 20 initially evaluated diseases, 15 meet the prespecified validation criteria on the independent validation cohort: ADHD, anxiety, chronic fatigue syndrome, depression, GERD, hypertension, inflammatory bowel disease, IBS-C, IBS-D, insomnia, MASLD, obesity, obstructive sleep apnea, Sjogren's syndrome, and type 2 diabetes.
Five selected clinical scenarios illustrate how DRS can support clinician-mediated decision making, including IBS subtype reclassification, improved diagnostic acceptance in IBS-D, personalized lifestyle counseling in MASLD and early type 2 diabetes, and diagnostic uncertainty in atypical GERD. Conclusions: DRS is a metatranscriptomics-based risk-stratification framework that aggregates active microbial and human pathway signals into interpretable disease-specific risk estimates across a wide range of disease conditions. Validation against self-reported disease labels in an independent cohort shows significant risk enrichment for each of 15 diseases. DRS is intended as an adjunct to clinical evaluation: a decision support tool in situations where routine care encounters uncertainty, delay, or low patient engagement. Future prospective studies using clinically adjudicated endpoints are needed to assess calibration and clinical outcomes.
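
The second validation criterion (95% CI of the risk ratio above 1) can be illustrated with a standard log-scale interval. This is a sketch only: the counts are hypothetical, the Katz log method is an assumed choice that the abstract does not specify, and the Cohen's d criterion is not covered here.

```python
from math import exp, log, sqrt


def risk_ratio_ci(a: int, n1: int, c: int, n0: int, z: float = 1.96):
    """Risk ratio with a log-scale (Katz) confidence interval.

    a / n1: self-reported cases / total among individuals classified as high risk
    c / n0: self-reported cases / total among everyone else
    """
    rr = (a / n1) / (c / n0)
    se_log = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


# HYPOTHETICAL counts, not taken from the preprint.
rr, lower, upper = risk_ratio_ci(a=60, n1=400, c=300, n0=6000)
enriched = lower > 1  # criterion: the whole 95% CI lies above 1
print(f"RR={rr:.2f}, 95% CI=({lower:.2f}, {upper:.2f}), enriched={enriched}")
```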

6

Development of a Novel Blood-Based Assay for Brain-Derived Tau and Its Validation in Traumatic Brain Injury

Balogun, W. G.; Zeng, X.; Nafash, M. N.; Sehrawat, A.; Shi, R.; Svirsky, S. E.; Okonkwo, D. O.; Puccio, A. M.; Karikari, T. K.

2026-06-10 neurology 10.64898/2026.06.05.26354965 medRxiv

Top 0.3%

1.7%

Brain-derived tau (BD-tau) is an emerging blood-based biomarker for neurodegeneration, yet few well-validated BD-tau assays are currently available for research and clinical use. To enhance access to this vital biomarker for neurological disorders including traumatic brain injury (TBI), we developed a novel blood-based immunoassay for BD-tau on the ultra-sensitive Quanterix HD-X platform using Single Molecule Array technology. Analytical validation assessed dilution linearity, specificity, precision, detection limits, and spike recovery, each recording robust metrics in agreement with international expert recommendations. The assay achieved between-run stability of 95% when analyzing aliquots from six independent plasma and serum samples across five analytical runs. It also showed strong dilution linearity when diluted four-fold and achieved over 90% recovery when spiked with cerebrospinal fluid. Next, we evaluated the clinical utility of the assay in cohorts of individuals with TBI, where strong performances were recorded whether using the 2-step or 3-step assay formats (ρ = 0.94; p < 0.0001). Furthermore, plasma BD-tau distinguished samples from TBI patients based on time from injury and severity (AUC = 0.93). Plasma BD-tau differentiated between favorable and unfavorable functional outcomes in the acute-severe group. Our findings underscore the significant potential of the BD-tau assay as a biomarker for TBI in the severe phase.

7

The Multimodal Anonymizer: a fully local multi-agent AI system for medical data deidentification

Hirsch, A.; Ten, F. W.; Krueger, K. S.; Geyer, R.; Roeschl, T.; Groeschel, M.; Rostin, P.; Eils, R.; Spott, M.; Prasser, F.; Meyer, A.; Madrid, J.

2026-06-05 health informatics 10.64898/2026.05.28.26353952 medRxiv

Top 0.3%

1.5%

Background: Safe reuse of multimodal hospital data for AI development is limited by the absence of reliable, context-aware deidentification across multimodal data and longitudinal patient data. Existing approaches are largely modality-specific and can indiscriminately remove clinically important information. Methods: We developed the Multimodal Anonymizer, a modular, locally deployable multi-agent framework integrating multimodal large language models, task-specific neural networks and rule-based transformations. We evaluated 16 orchestrator model configurations on a benchmark built from publicly available data and hospital data from our institution. The benchmark dataset included data from different origins: 250 MIMIC-IV patients with synthetically injected personally identifiable information (PII) supplemented with head CT, face images, handwriting, audio, German clinical-text datasets and local data. Primary outcomes were deidentification sensitivity and preservation of clinically important content; secondary analyses examined model characteristics, reproducibility, and performance against leading market and open-source solutions. Results: The best local configuration (the orchestrator being Qwen3-VL-235B-A22B-Thinking) achieved near-complete deidentification across all datasets, with per-patient sensitivity of 98.80% (95%-CI 97.20; 100), and per-PII sensitivity of 99.82% (95%-CI 99.76; 99.88). Critical clinical preservation was 99.60% (95%-CI 98.80; 100) per-patient, and clinical preservation was 99.61% (95%-CI 99.51; 99.71) per-file. All modalities achieved at least 98.30% sensitivity (lower bound 95%-CI). On our local data, the system achieved a deidentification sensitivity of 100% per-patient and per-PII; and a critical clinical preservation of 100% per-patient as well as a clinical preservation of 99.97% (95%-CI 99.91; 100) per-file. 
When comparing orchestrators, the leading local models were similar to proprietary models (GPT-5.2) in deidentification sensitivity while showing higher deidentification specificity. The Multimodal Anonymizer outperformed previous tools on most modalities. Conclusion: Near-complete, utility-preserving deidentification of multimodal clinical data is achievable with a unified, locally deployable multi-agent system, enabling safer large-scale reuse of hospital data for research and AI development.

8

Prediction of immunotherapy response using live tumor fragments from routine clinical biopsies

Braun, D.; Dana, N.; Hernan, H. R.; Sahni, S.; Scribano, C.; Johnson, C.; Vedder, L.; von Euw, E.; Zweng, J.; Wargowski, E.; Sunil, A.; Sharma, D.; Routh, J.; Rexroad, K.; McDonnell, P.; Jergens, V.; Costa, C.; Zuniga, R.; Toia, G. V.; Patel, P. M.; Martin, R. C. G.; Majeed, U.; Mukhopadhyay, D.; Lou, Y.; Kokabi, N.; Jakub, J. W.; Hays, D.; Godwin, A. K.; Giffi, V.; Gelbard, A.; Friedl, A.; Duimstra, E. K.; Dronca, R. S.; Chen, R.; Chalfin, H.; Broome, B.; Babiker, H. M.; Chandra, T.; Caenepeel, S.; Hrycyniak, L. C. F.; Sood, C.; Ramos, H.; Patel, P.; Advani, P.; Gierman, H. J.; Taube, J.

2026-06-10 oncology 10.64898/2026.06.05.26354635 medRxiv

Top 0.5%

1.1%

Functional ex vivo assays using live tumor tissues have demonstrated strong predictive accuracy for response to immune checkpoint inhibitors (ICIs) but are not scalable, requiring manual processing of large resections collected at academic centers. Here, an ex vivo live tumor fragment (LTF) platform was developed using standard-of-care biopsies from 228 patients with suspected malignancy collected across prospective, multicenter observational trials and biobanks. Hierarchical clustering of ICI-mediated changes in cytokine production identified two groups: responders and nonresponders. A binary classifier (elive index) using 8 cytokines achieved an AUC of 0.99 for cluster prediction. The elive index correctly predicted clinical benefit in 93% (26/28) of patients (P = 3.2 × 10⁻⁵) and accurately identified 83% (10/12) of objective responders. Critically, elive responders were identified among biomarker-negative patients, highlighting the platform as a scalable approach that complements existing companion diagnostics and expands the population of patients identified to benefit from ICI therapy.

9

Heterozygous MMACHC burden variants are associated with higher circulating vitamin B12 in the All of Us Research Program

Cai, L.; DeBerardinis, R. J.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354855 medRxiv

Top 0.6%

0.9%

Heterozygous carriers of autosomal recessive disease variants are conventionally considered unaffected, yet population-scale genomic datasets reveal subclinical carrier phenotypes. MMACHC encodes a cobalamin-processing protein whose biallelic loss causes cobalamin C deficiency, an inborn error of intracellular cobalamin metabolism. We performed an unbiased quantitative phenome-wide association screen in All of Us Research Program v8 to identify phenotypes associated with rare heterozygous MMACHC burden variants. Serum/plasma vitamin B12 was the top quantitative association. Carriers had higher circulating B12 than non-carriers in adjusted analyses, but also higher homocysteine, suggesting that elevated circulating B12 does not reflect improved intracellular cobalamin function. Carriers were less likely to fall below conventional B12 insufficiency thresholds, indicating a potential diagnostic blind spot. A pathway-wide rare-variant (All-by-All) gene-burden analysis placed this finding in broader biological context. Burdens in genes related to circulating B12 binding or intestinal absorption were associated with lower circulating B12. In contrast, burdens in several genes involved in cellular delivery and intracellular cobalamin handling were associated with higher circulating B12. This step-specific directionality supports a model in which elevated circulating B12 can reflect impaired cellular handling and consequent systemic accumulation rather than improved cellular cobalamin availability. Because EHR-derived B12 is shaped by heterogeneous clinical and medication contexts, prospective carrier-enriched studies with standardized methylmalonic acid, homocysteine, diet, supplement, medication, comorbidity, and symptom ascertainment are needed to evaluate functional-marker-based screening.

10

Three-Month Observational Data for the MPS IIIB Sentinel Subject Following AAV9 Mediated Gene Therapy

Ma, X.; Gu, R.; Ma, W.; Xu, Q.; Wang, R.; Wang, W.; Liang, M.; Liu, X.; Yang, X.; Zhuang, L.; Zhang, W.; Zeng, X.; Xu, J.; Xu, X.; Wu, Z.; Xia, Y.; Liu, Y.; Zhou, J.; Zhu, X.; Wang, H.; Dong, Z.; Yang, W.; Dai, Y.; Pan, X.; Li, X.; Wang, Y.; Dong, X.; Wu, X.; Feng, Z.

2026-06-09 neurology 10.64898/2026.06.01.26354386 medRxiv

Top 0.7%

0.8%

Background: Mucopolysaccharidosis type IIIB (MPS IIIB) is a devastating neurodegenerative lysosomal storage disorder caused by alpha-N-acetylglucosaminidase (NAGLU) deficiency. There is currently no approved therapy. We report the 3-month outcomes of a novel intracerebroventricular (ICV) gene therapy in a child with MPS IIIB. Methods: In an open-label, single-center, investigator-initiated trial (ChiCTR2600121466), a single dose of RDGT-101 (2.0E14 vg of an AAV9 vector encoding human NAGLU) was administered via ICV infusion. Primary outcomes were safety and tolerability. Secondary outcomes included serum NAGLU activity, urinary heparan sulfate (HS) excretion, and neurocognitive function. Exploratory analyses included hematological parameters. Results: The patient achieved serum NAGLU activity (17.06 nmol/mL/hour) approaching that of healthy controls (17.75 ± 1.37 nmol/mL/hour) by Month 3, accompanied by a 58.4% reduction in urinary HS. Clinically, previously severe hand and toe contractures resolved, allowing for full extension. Neurocognitive improvements were observed, including clear articulation, logical conversation, and sustained eye contact. Hematological analyses revealed normalized red blood cell indices and improved iron utilization. No dose-limiting toxicities, serious adverse events, or clinically significant laboratory abnormalities were observed. Conclusions: A single ICV infusion of RDGT-101 was safe and well-tolerated in this patient with MPS IIIB. Early biochemical correction was accompanied by marked improvements in somatic, neurocognitive, and hematological parameters. These findings support further investigation of ICV AAV9 gene therapy for MPS IIIB.

11

A Comparison of Manual and Automated Approaches to Developing Computable Algorithms for Identifying Acute Pancreatitis

Bann, M. A.; Carrell, D. S.; Gruber, S.; Heagerty, P. J.; Williamson, B. D.; Nelson, J. C.; Hazlehurst, B.; Felcher, A.; Nyongesa, D. B.; Slaughter, M. T.; Sapp, D. S.; Cronkite, D. J.; Ball, R.; Floyd, J. S.

2026-06-08 health informatics 10.64898/2026.06.05.26354934 medRxiv

Top 0.7%

0.8%

Objective: Clinical phenotyping methods that rely on clinical and informatics expertise can be time-intensive and costly. We tested both manual and highly automated approaches using electronic health record (EHR) data to identify an FDA Sentinel Initiative health outcome of interest, acute pancreatitis. Materials and Methods: We trained and evaluated machine learning algorithms using EHR data with two approaches: a custom approach that included manually curated features and trained on outcomes data validated with medical record review, and a highly automated approach that greatly simplifies and automates feature engineering and relies on low-cost silver-standard outcomes for model training. Results: Custom algorithms using manually curated structured claims data discriminated cases from non-cases with a high degree of accuracy (cv-AUC 0.89 [95% CI 0.84-0.94]); the inclusion of natural language processing (NLP)-derived covariates from clinical notes increased performance slightly (cv-AUC 0.91 [95% CI 0.86-0.97]). The automated algorithm trained on the outcome count of diagnosis codes performed less well (AUC 0.80 [95% CI 0.75-0.85]) but improved using maximum lipase value as an outcome (AUC 0.88 [95% CI 0.84-0.92]). At a positive predictive value of 90%, the custom algorithm had a sensitivity of 92%, the automated algorithm trained on diagnosis code count had a sensitivity of 45%, and the automated algorithm trained on maximum lipase value had a sensitivity of 84%. However, a prediction rule derived by clinicians during chart review was nearly as accurate (maximum lipase value ≥ 3 times upper limit of normal; AUC 0.86, PPV 85%, sensitivity 92%). Discussion: Machine learning algorithms with manually curated structured data and NLP features trained on validated outcomes data successfully identified validated events.
Use of an outcome in the automated model based on specific phenotype knowledge (maximum lipase value) allowed for performance similar to the custom model with considerably fewer resources.
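
Reporting sensitivity at a fixed positive predictive value, as in the results above, means choosing the most permissive score cut-off whose PPV still meets the target. A minimal sketch of that selection follows, assuming a single score per patient and chart-validated labels; the function name and example data are hypothetical and do not come from the study.

```python
def sensitivity_at_ppv(scores, labels, target_ppv):
    """Best sensitivity over score cut-offs whose PPV meets the target.

    scores: model scores (higher = more likely a case)
    labels: 1 for a validated case, 0 for a non-case
    Returns (threshold, sensitivity), or None if no cut-off reaches the target.
    """
    total_cases = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = None
    true_pos = 0
    for i, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Evaluate only at the last item of a run of tied scores.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        ppv = true_pos / i  # everything at or above this score is called positive
        if ppv >= target_ppv:
            best = (score, true_pos / total_cases)
    return best


# HYPOTHETICAL scores and labels, for illustration only.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
result = sensitivity_at_ppv(scores, labels, target_ppv=0.90)
print(result)
```

Because sensitivity can only rise as the cut-off is lowered, the last qualifying cut-off in the scan is also the one with the highest sensitivity.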

12

Closing the Paediatric Gap: Adult-Trained AI Generalises Robustly to Paediatric Coeliac Disease Diagnosis

Jaeckle, F.; Gillett, P. M.; Kirkwood, K. J.; Natu, S.; Chan, J. Y. H.; Bateman, A. C.; Arends, M. J.; Soilleux, E. J.

2026-06-05 pathology 10.64898/2026.06.04.26354889 medRxiv

Top 0.8%

0.8%

Background: Coeliac disease (CD) diagnosis on duodenal biopsies is limited by interobserver variability. We have previously demonstrated pathologist-level performance with our artificial intelligence (AI) model for the histopathological diagnosis of adult CD, but not in paediatric practice. As paediatric CD screening programmes expand internationally, accurate and scalable diagnostic tools are needed. We investigated whether an AI model trained exclusively on adult whole-slide images (WSIs) can generalise to paediatric CD diagnosis across independent centres. Methods: A training and validation dataset of 9,958 WSIs from 8,421 adult patients (961 CD) from five centres was used to develop an ensemble of multiple-instance learning models using features from a foundation model. Testing was performed on 708 consecutive paediatric patients (86 CD) from two centres (Edinburgh and Southampton) not included in training. Model calibration was assessed, and probability outputs were grouped into clinically interpretable categories. Findings: In adult cross-validation, the AI model achieved an area under the receiver operating characteristic curve (AUC) of 98.7%, sensitivity of 84.9%, specificity of 99.0%, and negative predictive value (NPV) of 98.1%. On testing (paediatric) datasets, performance remained high (AUC 98.8%, sensitivity 80.2%, specificity 98.4%, NPV 97.3%). Restricting analysis to predictions outside the intermediate-probability range (predicted CD probability <10% or ≥65%; 85.3% of cases) improved sensitivity to 100% and specificity to 98.7%. No misclassifications were observed among high-confidence predictions (<2% or ≥85%; 66.0% of cases). The expected calibration error was 0.03. Performance improved significantly when biopsies from both duodenal sites (bulb [D1] and descending [D2/3]) were considered. Interpretation: Our AI model, trained on adult biopsies, generalises to paediatric CD diagnosis across centres and scanner platforms.
Well-calibrated probability outputs provide clinically interpretable measures of diagnostic confidence and could support safe identification of CD-negative biopsies within defined thresholds. These findings demonstrate the feasibility of applying adult-derived AI models in paediatric populations and reinforce the importance of multi-site (D1 & D2) biopsy sampling.
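
The restriction to predictions outside the intermediate-probability range can be sketched as a simple triage step. The cut-offs below (<10% and ≥65%) follow the abstract; the category names, function names and example data are hypothetical, and this is not the authors' implementation.

```python
def triage(prob_cd: float, low: float = 0.10, high: float = 0.65) -> str:
    """Map a predicted CD probability to a category (names are HYPOTHETICAL)."""
    if prob_cd < low:
        return "likely negative"
    if prob_cd >= high:
        return "likely positive"
    return "intermediate"


def confident_subset_metrics(probs, labels):
    """Sensitivity, specificity and coverage after dropping intermediate calls."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        call = triage(p)
        if call == "intermediate":
            continue  # left for pathologist review, excluded from the metrics
        if y == 1:
            tp += call == "likely positive"
            fn += call == "likely negative"
        else:
            tn += call == "likely negative"
            fp += call == "likely positive"
    coverage = (tp + fn + tn + fp) / len(probs)
    return tp / (tp + fn), tn / (tn + fp), coverage


# HYPOTHETICAL predictions and biopsy labels (1 = CD), for illustration only.
probs = [0.01, 0.03, 0.08, 0.30, 0.50, 0.70, 0.90, 0.97, 0.05, 0.66]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
sens, spec, coverage = confident_subset_metrics(probs, labels)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}, coverage={coverage:.0%}")
```

Reporting coverage alongside sensitivity and specificity matters here, since both metrics describe only the cases that fall outside the intermediate band.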

13

Topological Deep Learning Identifies Polygenic Variant Clusters Across Familial Multimorbid Disorders

Vomo-Donfack, K. L.; Bousquet, G.; Falgarone, G.; Ginot, G.; Morilla, I.

2026-06-09 health informatics 10.64898/2026.06.03.26354242 medRxiv

Top 0.8%

0.8%

Whole-genome sequencing comprehensively captures coding, non-coding and structural variation in families with suspected inherited disorders, yet its clinical utility remains constrained by an interpretation bottleneck: selecting a handful of relevant variants from millions of candidates. Current rule-based pipelines, anchored in ACMG/AMP criteria, excel at identifying highly penetrant Mendelian alleles but frequently miss variants of low-to-moderate penetrance, non-coding alterations and germline-somatic interactions. Here we introduce PolyCLIP-T, a topology-guided multimodal framework that transforms variant selection from a classification problem into a geometric discovery task. By contrastively aligning DNA-sequence embeddings with functional annotations, PolyCLIP-T constructs a unified latent space in which the displacement between reference and alternate embeddings quantifies the molecular perturbation induced by each variant. Persistent homology then identifies stable topological components - coherent variant groups shared among affected relatives - that transcend single-variant scoring logic. Applied to six families with multi-morbid cancer, autoimmune and cardiovascular disease, PolyCLIP-T recovered non-coding and structural candidates overlooked by conventional pipelines and revealed pleiotropic networks spanning disease categories. This approach provides an interpretable, scalable solution for genome-first investigations of disorders driven by polygenic architectures that evade single-variant analysis. The framework was developed and benchmarked on deeply characterised familial cohorts selected for transgenerational multimorbidity; validation in larger, independent populations will be essential to establish its generalisability. An interactive web tool is freely available at https://www.polyclip-t.uma.es/.

14

Multiplexed temporal SWCNT biosensor combined with convolutional autoencoding identifies ALS-specific serum protein corona signatures

Sirtori, R.; Lopez, R. M.; Li, H.; Liu, C.; Fisk, N.; Roxbury, D. E.; Fallini, C.

2026-06-08 neurology 10.64898/2026.06.08.26354966 medRxiv

Top 0.8%

0.7%

Amyotrophic lateral sclerosis (ALS) lacks a validated blood-based diagnostic, and the field is increasingly moving from single-molecule markers toward integrative, multi-component signatures. Here we present a liquid-biopsy strategy that transduces disease-dependent serum-nanoparticle interactions into a learnable near-infrared spectral phenotype. A sensor array of twelve single-walled carbon nanotube (SWCNT) chiralities functionalized with (GT)6 ssDNA, coupled with a deep learning model, was tested on serum from 20 ALS patients and 19 age- and sex-matched controls (n = 39, TargetALS). Our multiplexed sensor design (12 SWCNT chiralities) and data acquisition strategy, based on excitation-emission matrices acquired at three timepoints (0, 6, 24 h), were conceived to maximize the information carried by the sensor. Indeed, we show that the array generates partially independent temporal dynamics across chiralities governed primarily by tube diameter. To decode this multiplexed, time-resolved signal, we trained a dual-objective convolutional autoencoder that jointly optimizes reconstruction and classification, achieving 84.6% cross-validated accuracy (AUC = 0.87). Selected latent features were reproducible across an independent same-subject experimental batch and correlated with serum neurofilament light chain, linking the spectral phenotype to a clinically relevant neurodegeneration marker. Mass spectrometry supported a molecular basis for discrimination, revealing an ALS-biased protein corona enriched in adaptive-immune and inflammatory proteins. Together, these results establish proof of principle that time-resolved, multi-chirality SWCNT spectral sensing can compress complex serum composition into a reproducible near-infrared biomarker signature for ALS.

15

Spermidine suppresses glial inflammation and parkinsonian abnormalities in ATP13A2 deficiency

Cascalho, A.; Sati, A.; Dhondt, H.; Schoonvliet, N.; Kaempf, N.; Coccia, E.; Mamalaki, A.; Behrens, M. I.; Brüggemann, N.; Glatzel, M.; Baekelandt, V.; Klein, C.; Eggermont, J.; Verstreken, P.; Blanchard, J.; Vangheluwe, P.

2026-06-04 neurology 10.64898/2026.05.23.26353575 medRxiv

Top 0.9%

0.7%

Pathogenic variants in ATP13A2, which encodes an endolysosomal polyamine exporter, cause Kufor-Rakeb syndrome and are associated with early-onset parkinsonism and related neurodegenerative disorders; however, the mechanisms by which ATP13A2 dysfunction drives disease remain incompletely defined. In Atp13a2 knockout mice, we identified an early, transient reduction in brain polyamines that precedes overt gliosis and behavioural abnormalities. Pharmacological polyamine depletion exacerbates phenotypes, whereas oral supplementation of spermidine, but not spermine, rescues parkinsonian symptoms, establishing metabolic polyamine deficiency as a pathogenic driver. Mechanistically, spermidine counteracts microglial lysosomal dysfunction in the brain and exerts mitochondrial antioxidant and anti-inflammatory effects in primary mouse microglia, thereby improving neuronal integrity. In the absence of Atp13a2, microglial spermidine import relies on the related polyamine transporter Atp13a3. Importantly, these findings translate to human systems: spermidine attenuates inflammation in ATP13A2-deficient human differentiated microglia, while postmortem ATP13A2-deficient brain analysis confirms increased microglial reactivity. Spermidine also rescues motor deficits and dopaminergic neuron loss in ATP13A2-deficient Drosophila and other fly parkinsonism models. Together, these findings identify early polyamine dysregulation as a mechanistic contributor to ATP13A2-associated parkinsonism and nominate spermidine supplementation as a potential therapeutic strategy for ATP13A2-driven pathology and possibly a broader range of parkinsonian sub-types.

16

Exploratory dried blood spot metabolomics identifies pathway-level convergence with ME/CFS biology in a self-reported PEM-like fatigue phenotype

Hauguel, P.; Anctil, N.; Noel, L.-P.

2026-06-10 rheumatology 10.64898/2026.06.08.26355197 medRxiv

Top 0.9%

0.7%

Background. Plasma and serum metabolomic studies of myalgic encephalomyelitis / chronic fatigue syndrome (ME/CFS) have repeatedly implicated hypometabolic, lipid, mitochondrial, redox and tryptophan-kynurenine pathways, but prior cohorts have been modest in size and have used heterogeneous case definitions. Whether similar pathway-level signals are detectable at scale in dried blood spots (DBS), across questionnaire-derived fatigue constructs and across orthogonal LC gradients in the same individuals remains unresolved. Methods. We profiled DBS extracts from 1,784 community-cohort adults by reverse-phase LC-MS using paired 5 min and 15 min gradients. Six questionnaire-derived endpoints captured a pragmatic self-reported PEM-like phenotype, a DSQ-derived PEM-like construct, high or review clinical status, temporal fatigue state, comorbid fatigue and self-reported chronic fatigue. The locked primary endpoint for Phase 1 was pragmatic_fatigue_pem with 226 cases and 914 controls after excluding major metabolic comorbidity. We tested a biology-first panel comprising 22 literature-curated metabolites represented by four participant-level descriptors each, and evaluated three discovery extensions: a targeted m/z search of additional literature candidates, a hypothesis-free univariate screen across 4,553 5 min and 5,625 15 min consensus features, and pairwise z-difference ratios. Endpoint-specific Ridge classifiers were evaluated by five-fold out-of-fold AUC with bootstrap stability filtering. Cross-gradient agreement was assessed by per-metabolite AUC concordance between paired 5 min and 15 min profiles. Severity was modelled as an ordinal grade derived from the number of fatigue criteria met and chronic-fatigue-form status. Results. The biology-first DBS panel achieved out-of-fold AUC 0.81 for the pragmatic self-reported PEM-like endpoint (226 cases / 914 controls). 
The DSQ-derived PEM-like construct reached AUC 0.60 (57 cases / 201 controls) on the un-filtered set and AUC 0.778 (SD 0.013, twenty seeds) in a post-hoc signature-decomposition follow-up restricted to participants without a self-declared major-metabolic-history tag (29 cases / 230 controls); both are treated as construct-validity anchors rather than as provoked or clinically adjudicated PEM. An optimised operationalisation of the same construct (panel-self normalisation, restriction to non-comorbid participants and demographic covariates) reached AUC 0.71 (95 % CI 0.55 to 0.76), and an exploratory age-stratified signature decomposition suggested age-dependent pathway composition that requires confirmation given small per-stratum case counts. Stable contributors mapped to carnitine-shuttle, TCA-cycle, redox-thiol and tryptophan-kynurenine pathways. Cross-gradient analysis of 22 matched metabolites yielded Pearson r = 0.62 for signed univariate effects (p = 0.002; 68 % directional agreement). The metabolomic score increased with severity grade (Spearman rho = 0.45, p = 4 x 10^-91; median scores 0.24, 0.51 and 0.75 across grades 0, 1 and 2). Sensitivity analyses on the covariate-complete subset (n = 565; 138 cases / 427 controls) showed that the DBS signal was robust to adjustment for age, sex, BMI and medication burden (DBS-only AUC 0.76, DBS plus covariates 0.78, covariates only 0.64), and produced a metabolomic-specific lift of approximately 0.13 AUC over the strongest anti-leak declarative cross-form questionnaire baseline (AUC 0.63). DBS-only AUC was stable across sex, age and BMI subgroups, and a 1:4 nearest-neighbour matched analysis on age, sex and BMI yielded AUC 0.72 (95 % CI 0.67 to 0.77). The observed pattern supported pathway-level convergence with prior ME/CFS metabolomics literature, including carnitine shuttle, fatty-acid beta-oxidation, TCA cycle, redox-thiol, urea cycle, glycerophospholipid and tryptophan-kynurenine axes. 
In contrast, the hypothesis-free 15 min screen produced high-AUC features that mapped predominantly to environmental or technical signals, including pesticide, industrial-amine and mobile-phase artifact annotations; only one of eight top leads, a truncated oxidised phospholipid, was biologically plausible, and none had tandem-MS support. Conclusions. In this large community cohort, a literature-curated DBS metabolomic panel captured pathway-level biology associated with a questionnaire-derived PEM-like fatigue phenotype, showed directional concordance across LC gradients, scaled with symptom severity and remained robust to key demographic, anthropometric and anti-leak questionnaire baselines. The findings converge with several metabolic axes previously reported in ME/CFS plasma and serum studies, including carnitine-shuttle, TCA-cycle, redox-thiol, urea-cycle, glycerophospholipid and tryptophan-kynurenine pathways. They should not be interpreted as clinical validation of a diagnostic test, screening tool or objective provoked-PEM biomarker. Rather, they support at-home-compatible DBS metabolomics as a biologically grounded platform for future clinically adjudicated validation, decision-support development and longitudinal monitoring in fatigue and PEM-like syndromes. Because DBS contains cellular and plasma-derived components, matrix effects must be considered when comparing individual metabolites with venous plasma or serum studies, and hypothesis-free screening at this scale can preferentially surface exposome or technical variance unless molecular identification is enforced before biological interpretation.

17

Polypore Mushroom Mycelia for Treatment of Active COVID-19 Infection: A Randomized Clinical Trial

Saxe, G.; Shubov, A.; Smith, C. N.; Golshan, S.; Shekhtman, T.; Wilson, S.; Slater, D.; Bair, Z. J.; Beathard, C.; Davis, R. A.; MacElhern, L.; Kao, L. K.; Senowitz, P.; Gosnell, N.; Buchholz, D.; Aguilar-Carreno, H.

2026-06-09 infectious diseases 10.64898/2026.06.01.26354267 medRxiv

Top 1%

0.5%

Use of fungal mycelia, which have antiviral properties, constitutes a novel strategy for addressing existing and newly emerging viral diseases. We evaluated the safety and feasibility of fungal mycelia (Fomitopsis officinalis and Trametes versicolor, FoTv) for treatment of COVID-19 and assessed their antiviral effects and potential to reduce symptoms. In a randomized, double-blind, placebo-controlled, dual-site (UCSD/UCLA medical centers) clinical trial, we examined non-hospitalized patients who contracted mild-to-moderate COVID-19 ≤ 96 hours, and experienced symptom onset ≤ nine days, before enrollment. FoTv was safe, well-tolerated, and feasible for COVID-19 treatment. Minor differences in biochemical markers were observed between groups (26 FoTv, 24 Placebo). FoTv significantly reduced the number and severity of symptoms, particularly sore throat/cough, and in vitro SARS-CoV-2 (pseudovirus) cellular infection. In conclusion, FoTv was safe and reduced COVID-19 symptoms and cellular viral infection. Future studies should investigate therapeutic benefits of fungal mycelia for SARS-CoV-2 and other viruses. ClinicalTrials.gov registration: NCT04667247.

18

Liver biopsy confirms precise and efficient correction of SERPINA1 after in vivo Base Editing in a Patient with Alpha-1 Antitrypsin Deficiency

Krooss, S. A.; Yang, T.; Yuan, Q.; Drick, N.; Sgodda, M.; Held, J.; Behrendt, P.; Hartleben, B.; Koczulla, R.; Ma, X.; Liu, Y.; Wedemeyer, H.; Janciauskiene, S.; Di Donato, N.; Cantz, T.; Wang, E.; Wu, Y.; Hoeper, M.; Xia, Q.; Ott, M.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.01.26354551 medRxiv

Top 1%

0.5%

Background: Alpha-1 antitrypsin deficiency (AATD) caused by the PI*ZZ mutation (Glu342Lys) results in hepatic accumulation of misfolded AAT-Z protein and reduced circulating AAT levels, leading to progressive liver disease and emphysema. Gene correction therapy represents a potentially curative approach by directly correcting the underlying genetic defect. We report the first case of successful hepatic gene correction with early histological and functional assessment. Methods/Case presentation: We report the case of a 66-year-old male patient with PI*ZZ AATD who underwent gene correction therapy within the YOLT-202 phase I/Ia clinical trial (ClinicalTrials.gov ID NCT07193615). Ten weeks post-treatment, a liver biopsy was performed to re-evaluate pre-existing F2 liver fibrosis, which had been measured by elastography before study entry. Serum samples allowed functional assessment of AAT-mediated elastase inhibition. Results: Liver biopsy showed no signs of hepatic inflammation and demonstrated a 54% (Sanger) and 57% (Illumina) gene correction rate of the PI*ZZ variant at the DNA level, with no bystander edits or off-target effects. Following a transient elevation of transaminases during the early post-treatment period, liver enzymes normalized. Monthly serum AAT measurements demonstrated biologically active and stable therapeutic levels throughout follow-up. Conclusions: This case demonstrates efficient and precise hepatic gene correction without concerning histological alterations and with substantial improvement of functional parameters, supporting the feasibility and safety of gene editing approaches for AATD.

19

White Matter Hyperintensity Burden Modifies the Association Between Atrial Fibrillation and Cerebral Microbleeds

Ryu, W.-S.; Sunwoo, L.; Lee, M.; Kang, K.; Kim, J. G.; Lee, S. J.; Cha, J.-K.; Park, T. H.; Lee, J.-Y.; Lee, K.; Kwon, D. H.; Lee, J.; Park, H.-K.; Cho, Y.-J.; Hong, K.-S.; Lee, M.; Oh, M. S.; Yu, K.-H.; Gwak, D.-S.; Kim, D.-E.; Kim, H.; Kim, J.-T.; Kim, J.-G.; Choi, J. C.; Kim, W.-J.; Kwon, J.-H.; Yum, K. S.; Shin, D.-I.; Hong, J.-H.; Sohn, S.-I.; Lee, S.-H.; Kim, C.; Jeong, H.-B.; Park, K.-Y.; Lee, K.-J.; Kim, C. K.; Kang, J.; Kim, J. Y.; Bae, H.-J.; Kim, B. J.

2026-06-08 neurology 10.64898/2026.06.03.26354875 medRxiv

Top 1%

0.5%

Background: In atrial fibrillation (AF), cerebral microbleed (CMB) burden guides anticoagulation decisions, yet AF is itself inconsistently associated with CMBs, a paradox unexplained by frameworks that treat CMBs as a unitary marker of small vessel disease. We hypothesized that the white matter hyperintensity (WMH) context in which CMBs arise modifies their vascular meaning, and that this context-dependence underlies the inconsistent AF-CMB association. Methods: From a multicenter Korean stroke registry, we analyzed 5,735 first-ever ischemic stroke patients imaged at nine centers using susceptibility-weighted MRI. WMH volume and CMB count were extracted by validated deep learning pipelines. Patients were cross-classified by age-adjusted WMH residual (median split) and CMB count (2) into four groups. The AF-CMB association was estimated by multivariable logistic regression within each WMH stratum with formal interaction testing. Spatial CMB distribution was analyzed against the Automated Anatomical Labeling atlas. Results: In the full cohort (mean age 69.5 years; 57.7% male), AF was not associated with CMBs (OR 1.04; 95% CI 0.87-1.25). Stratification yielded divergent estimates: the adjusted AF OR was 1.46 (1.11-1.93; P = 0.007) in the WMH-low stratum and 0.95 (0.73-1.22; P = 0.665) in the WMH-high stratum, with significant interaction (OR 0.56; P < 0.001). The discordant phenotype (low WMH, high CMB; 8.9%) was enriched for AF (28.0%) and showed fronto-temporal cortical predominance with deep structure sparing. AF independently reduced the proportion of deep CMBs (IRR 0.80; P = 0.040). The interaction was preserved across prespecified sensitivity analyses. Conclusions: The AF-CMB association is confined to patients with low WMH burden relative to age and is accompanied by a topographically distinct CMB distribution. Clinical assessment of small vessel disease based on WMH alone may overlook a CMB phenotype linked to AF.

20

Quantifying Cancer Clinical Trial Eligibility Using Artificial Intelligence-Based Matching

Goel, K. P.; Myall, N. J.; Dickerson, J.; Caswell-Jin, J. L.; Johnson, T.; Worth, J. E.; Gensheimer, M. F.

2026-06-05 oncology 10.64898/2026.06.03.26354859 medRxiv

Top 2%

0.5%

PURPOSE: To develop and validate an artificial intelligence-enabled platform that converts unstructured cancer trial eligibility criteria into structured queries and quantifies trial eligibility across advanced/metastatic cancer trials. METHODS: We downloaded actively recruiting US interventional treatment trials for advanced/metastatic breast cancer, colon cancer, and non-small cell lung cancer from ClinicalTrials.gov. Medical oncologists created 24 synthetic patient vignettes. A large language model converted trial eligibility criteria into Structured Query Language (SQL) code and patient information into structured records, enabling automated matching. Cancer details and treatment history were considered, but not laboratory results or comorbidities. Validation included physician editing of generated eligibility code for 30 trials, and blinded physician eligibility assessment for five trials. We then evaluated how age, ECOG performance status, sex, and ZIP code affected the number of eligible trials. RESULTS: Of 833 candidate trials, 746 met inclusion criteria. In physician review of 30 trials, edits to generated SQL did not change any of 720 trial-patient eligibility determinations for 24 synthetic patients. In blinded validation across 120 trial-patient pairs, automated matching achieved 97% accuracy. Across synthetic patients, eligible trials ranged from 31 to 258 when there were no geographic restrictions. Eligibility decreased markedly with worse performance status and with geographic restriction (both p<0.001). Later-phase, randomized, and molecularly selective trials had fewer eligible patients. CONCLUSION: AI-based structuring of trial eligibility criteria can support accurate, scalable measurement of potential cancer trial eligibility. In this demonstration, performance status, geography, and age were major determinants of eligibility across the active metastatic trial landscape.
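The matching step described in this abstract (eligibility criteria expressed as SQL, patients stored as structured records) can be illustrated with a minimal sketch. This is not the authors' platform: the table schema, the column names, the trial identifiers, the WHERE clauses, and the three patient rows are all hypothetical, invented here for illustration, and the clauses are written by hand rather than generated by a language model.

```python
import sqlite3

# Hypothetical structured patient records (not data from the study).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "patient_id TEXT PRIMARY KEY, cancer_type TEXT, age INTEGER, "
    "ecog INTEGER, prior_lines INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?, ?)",
    [
        ("P1", "NSCLC", 64, 1, 1),
        ("P2", "NSCLC", 71, 3, 2),
        ("P3", "breast", 58, 0, 0),
    ],
)

# Hypothetical trials: each free-text eligibility section is assumed to have
# already been translated into a SQL WHERE clause and reviewed by a physician.
trial_criteria = {
    "TRIAL-A": "cancer_type = 'NSCLC' AND age >= 18 AND ecog <= 1 AND prior_lines >= 1",
    "TRIAL-B": "cancer_type = 'breast' AND ecog <= 2",
    "TRIAL-C": "age >= 18 AND ecog <= 2",
}


def eligible_patients(where_clause):
    """Return the sorted IDs of patients satisfying one trial's WHERE clause.

    The clause is interpolated directly into the query, so this is only
    acceptable for trusted, reviewed clauses, never for raw user input.
    """
    rows = conn.execute(
        f"SELECT patient_id FROM patients WHERE {where_clause} ORDER BY patient_id"
    )
    return [row[0] for row in rows]


def eligible_trial_counts():
    """Count, for every patient, how many trials they match."""
    counts = {
        row[0]: 0
        for row in conn.execute("SELECT patient_id FROM patients ORDER BY patient_id")
    }
    for clause in trial_criteria.values():
        for patient_id in eligible_patients(clause):
            counts[patient_id] += 1
    return counts


if __name__ == "__main__":
    for patient_id, n_trials in eligible_trial_counts().items():
        print(patient_id, n_trials)
```

In this toy setup, raising a patient's ECOG score or adding a location filter to each clause lowers the per-patient count. That is the kind of quantity the abstract reports, at the scale of hundreds of trials.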