
Database

Oxford University Press (OUP)

Preprints posted in the last 7 days, ranked by how well they match Database's content profile, based on 51 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.

1
A priority index-based computational medicine framework (PimRNA) for prioritising personalised mRNA cancer vaccines

Fang, H.; Tan, T.

2026-05-29 oncology 10.64898/2026.05.26.26354114 medRxiv
Top 1.0%
0.7%

Background: The development of personalised mRNA cancer vaccines holds considerable promise for oncology, yet a significant translational gap persists between neoantigen identification and the selection of therapeutically impactful targets. Current approaches predominantly prioritise human leukocyte antigen (HLA) binding affinity and immunogenicity, often overlooking the systems-level biological context of the target. This can inadvertently favour immunogenic but biologically peripheral peptides that exert limited influence on tumour signalling networks, thereby constraining vaccine efficacy. Furthermore, mRNA therapeutics must satisfy additional design requirements, including favourable codon usage and favourable secondary-structure stability, which directly affect in vivo translation and half-life. A unified computational framework that integrates neoantigen discovery with network biology is therefore critically needed. Results: Here, we present PimRNA, a Priority index (Pi)-centric computational medicine framework that bridges this gap by unifying neoantigen identification, mRNA sequence optimisation, and gene interaction network analysis. First, high-confidence tumour-specific HLA class I and II neoantigenic peptides are identified from paired tumour-normal genomic and tumour transcriptomic data using NeoDisc. Second, the coding sequences of these peptides are optimised for stability and translational efficiency with LinearDesign, yielding a core set of neoantigen-encoding mRNAs. Third, a random walk with restart algorithm is applied to a knowledgebase of gene interactions to identify peripheral genes exhibiting significant network connectivity to core genes, generating a gene-predictor matrix in which each gene is assigned an affinity score reflecting its network proximity to immunogenic neoantigens. 
These scores are consolidated into a single, unified priority rating (0-5) for each gene, followed by subnetwork analysis that reveals therapeutically relevant gene modules. Application of PimRNA to breast cancer and melanoma datasets demonstrates that it successfully selects high-confidence immunogenic neoantigen candidates embedded within biologically meaningful tumour-specific networks. Conclusion: PimRNA provides a systems biology foundation for mRNA vaccine design, moving beyond isolated immunogenicity to prioritise targets that are both highly presented and central to tumour-relevant biological networks. This framework offers a generalisable strategy for the rational discovery and prioritisation of mRNA therapeutics, significantly advancing the field of computational medicine towards personalised cancer vaccines.
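The random walk with restart step described in this abstract can be illustrated with a minimal sketch. It is not the PimRNA implementation: the toy network, the gene labels (`CORE`, `A`, `B`, `C`) and the restart probability of 0.5 are all assumptions made for illustration, and the 0-5 priority rating is not reproduced here.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping each node to a list of its neighbours (undirected).
    seeds:     set of core nodes the walker restarts from.
    restart:   probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            moving = (1.0 - restart) * p[u]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += moving * p0[n]
                continue
            share = moving / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical four-gene chain: CORE - A - B - C, with CORE as the only seed.
toy_network = {
    "CORE": ["A"],
    "A": ["CORE", "B"],
    "B": ["A", "C"],
    "C": ["B"],
}
affinity = random_walk_with_restart(toy_network, {"CORE"})
for gene in sorted(affinity, key=affinity.get, reverse=True):
    print(gene, round(affinity[gene], 4))
```

In this toy chain the affinity score falls with network distance from the seed, which is the property the abstract relies on when it scores peripheral genes by their proximity to core genes.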

2
Beyond Identifier Matching: An Empirical Characterization of Failure Modes in Biomedical Knowledge Graph Integration

Hu, S.; Cheng, H.; Gillenwater, L.; Manpearl, K.; Mandava, A.; Wang, Y.; Pividori, M.; Stranger, B.; Krishnan, A.; Greene, C.; Gao, Y.

2026-05-28 health informatics 10.64898/2026.05.26.26354182 medRxiv
Top 1%
0.7%

Objective. Biomedical knowledge graphs (KGs) such as PrimeKG, Hetionet, UMLS, and PharmGKB are increasingly used as the substrate for downstream machine-learning, retrieval-augmented generation, drug-repurposing, and electronic health record (EHR) augmentation pipelines. The dominant assumption in published work is that integrating two or more such KGs is a tractable engineering step solved by identifier (ID) matching. This paper interrogates that assumption empirically. We quantify how much concept overlap survives realistic alignment, and we characterize the new failure modes introduced by the methods that practitioners reach for when ID matching is insufficient. Materials and Methods. We compared four widely used biomedical KGs (PrimeKG, Hetionet v1.0, the full UMLS Metathesaurus, and PharmGKB) across eleven node types using a tiered alignment pipeline: (1) direct ID matching for nodes sharing a primary vocabulary; (2) cross-ontology bridging using standard mappings (e.g., MONDO-DOID, HPO-UMLS, HPO-UMLS-MeSH for side effects, NCBI Gene-HGNC-UMLS, UBERON-FMA/SNOMEDCT_US/NCI/MeSH for anatomy); (3) ClinicalBERT cosine-similarity grouping at threshold >= 0.98 for over-segmented disease nodes, with a deterministic suffix-stripping canonicalizer; (4) exact name matching for ontology-poor types (anatomy, REACTOME pathways); and (5) embedding-based fuzzy matching with UMLS lookup (SapBERT and ClinicalBERT) for free-text microbiome concepts. We applied the pipeline to a 698-concept gut-microbiome benchmark spanning taxa, pathways, and disease labels, validated grouping decisions against the curated SSSOM mappings released by the MONDO project, and audited the ClinicalBERT consolidation against five clinical-genetics case studies drawn from the literature. Results. Per-type pairwise coverage was strikingly asymmetric. 
Genes/proteins and the three Gene Ontology categories aligned cleanly across PrimeKG and Hetionet (mutual coverage 94-99%), but disease overlap was sparse: only 0.7% of PrimeKG individual disease nodes mapped to Hetionet, rising to 2.0% after MONDO grouping (versus 78.7% and 18.4% from the Hetionet side). PrimeKG-to-UMLS coverage spanned 100% (effect/phenotype via HPO) down to 20.8% (REACTOME pathways), with drugs at 73.7% and anatomy at 58.8%. PrimeKG-to-PharmGKB drug coverage required up to two bridging hops (DrugBank -> UMLS -> RxNorm/ATC/MeSH). Bigger was not uniformly more complete: on a 698-concept microbiome drug benchmark, Hetionet missed 0 concepts while PrimeKG missed 16. ClinicalBERT-based grouping consolidated 22,205 raw MONDO disease nodes into 17,080 groups but introduced three reproducible failure modes documented in case studies: (i) peer over-merging: for example, all 22 osteogenesis imperfecta subtypes collapsed into a single node despite distinct severity classes; (ii) parent-child collapse: e.g. acute myeloid leukemia merged with myeloid leukemia, erasing the acute/chronic distinction that drives clinical management; and (iii) lexical false positives: neurofibromatosis and schwannomatosis grouped together despite cellular-pathology differences. Discussion. Identifier matching alone is a weak baseline for biomedical KG integration. Cross-ontology bridges and embedding-based consolidation expand coverage but do so at the cost of clinically meaningful resolution, and the resulting failures are systematic rather than random. Reporting only aggregate coverage statistics obscures these losses, which propagate silently into downstream tasks. Conclusion. We provide reusable per-type coverage tables, a taxonomy of three integration failure modes, and concrete recommendations for downstream studies that depend on a unified biomedical KG. 
We argue that future KG integration work should report per-type coverage and per-cluster confidence rather than aggregate match rates.
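The asymmetric per-type coverage reported in this abstract can be made concrete with a small sketch. It assumes coverage means the share of one graph's nodes that find a counterpart in the other graph, either by direct ID match or through a one-hop bridge mapping. The identifiers and the `pairwise_coverage` helper are hypothetical and are not taken from the paper.

```python
def pairwise_coverage(source_ids, target_ids, bridge=None):
    """Fraction of source nodes that map to some target node.

    bridge: optional dict translating a source ID into the target vocabulary
            (a stand-in for a cross-ontology mapping table).
    """
    bridge = bridge or {}
    source = set(source_ids)
    target = set(target_ids)
    if not source:
        return 0.0
    matched = {s for s in source if s in target or bridge.get(s) in target}
    return len(matched) / len(source)


# Hypothetical disease nodes: graph A is finely segmented, graph B is coarse.
graph_a = {"a:d1", "a:d2", "a:d3", "a:d4"}
graph_b = {"b:x1", "b:x2"}
a_to_b = {"a:d1": "b:x1"}                      # only one bridge mapping exists
b_to_a = {v: k for k, v in a_to_b.items()}     # the same mapping, reversed

print("A covered by B:", pairwise_coverage(graph_a, graph_b, a_to_b))
print("B covered by A:", pairwise_coverage(graph_b, graph_a, b_to_a))
```

The same mapping table yields different coverage depending on which graph is the denominator, which is why a single aggregate match rate can hide the direction in which overlap is sparse.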

3
Compatibility of National Food Composition Databases with USDA FoodData Central: A Seven-Country LLM-Based Analysis

Nakagawa, S.; Yamamoto, A.

2026-06-01 nutrition 10.64898/2026.05.23.26353942 medRxiv
Top 2%
0.4%

To evaluate the international interoperability of food composition databases, we assessed the compatibility of seven national food composition tables with USDA FoodData Central (FDC) using the LLM-based matching method reported previously (Nakagawa and Yamamoto, 2026). Databases from four English-speaking countries (Canada, United Kingdom, Australia, and New Zealand), South Korea, and Japan were compared with 8,158 USDA FDC entries (SR Legacy and Foundation Foods, excluding Survey/FNDDS). Match rates varied by country (62.0-89.7%) and food category. After excluding six USDA categories unsuitable for cross-national comparison, 45.2% of the remaining 6,290 entries were not matched by any country. Canada showed the highest concordance, reflecting the shared North American food supply. Japan and South Korea showed similarly low coverage for vegetables and spices. These findings suggest that while USDA FDC represents a practical foundation for a globally comprehensive food composition database given its breadth, systematic incorporation of country-specific foods and classification schemes will be necessary to achieve true international interoperability.

4
Impact of AI-Assisted Mammography Reading on Quality Indicators in the Czech Breast Cancer Screening Programme: A Retrospective Study

Veverkova, L.; Dolezalova, Z.; Marackova, V.; Mathew, E.; Urbankova, M.; Ambrozova, M.; Piskovsky, T.; Ngo, O.; Majek, O.

2026-05-26 oncology 10.64898/2026.05.25.26353869 medRxiv
Top 2%
0.4%

Objectives: The aim of mammographic screening is the early detection of invasive cancers. In the era of artificial intelligence (AI), this tool may improve diagnosis of earlier stages. The purpose of this study was to assess the impact on selected quality indicators retrospectively. Method: The data source was the Breast Cancer Screening Registry using data from one Screening Unit that currently uses AI routinely. The indicators of the cancer detection rate (CDR), further assessment rate (FAR), and recall rate (RR) in the year 2023, when AI was used, and the year 2022, without AI, in women aged 45-69 were compared. The statistical evaluation used the chi-square test and logistic regression adjusting for the effects of age, a woman's risk level, and the screening round at a 5% significance level. Results: In 2022, without AI, 4,034 women aged 45-69 were included, compared with 4,049 women in 2023 when AI was used. This study showed a non-significant increase in CDR from 5.0 breast cancers detected per 1,000 women (non-AI assessment) to 5.2 (AI-assisted assessment), p = 0.919; OR (95% CI): 1.034 (0.542-1.974), a significant decrease in the FAR from 5.2% to 3.9%, p < 0.001; OR (95% CI): 0.665 (0.529-0.836), and a decrease in RR from 2.4% to 1.9%, p = 0.083; OR (95% CI): 0.754 (0.548-1.037). Conclusion: AI has the potential to be a useful tool in the early detection of breast cancer by improving quality through a decrease in FAR and RR, while probably maintaining CDR.

5
Ranked (In)direct Citation Searching in Systematic Reviews: A methodological case study

Woelfle, T.; Fucile, G.; Hirt, J.; Pena, R. C. G.; Vogt, M.; Nordhausen, T.; Ewald, H.; Appenzeller-Herzog, C.

2026-05-27 medical education 10.64898/2026.05.26.26354093 medRxiv
Top 3%
0.2%

Systematic reviews (SRs) are a thriving study type in modern medicine and beyond. Many SR authors complement their primary database searches with supplementary techniques. Among these, citation-based techniques known as citation searching (CS) are widespread. Unranked Direct CS (UDCS), which identifies the literature directly cited by and citing the seed references, is currently most prevalent. Ranked (In)direct CS (RICS) additionally collects co-cited and co-citing literature and combines it with a ranking and cut-off procedure. However, RICS workflows remain non-standardized and tedious, and their benefits are unclear. This work aims to create a framework for the prospective international comparison of supplementary UDCS and RICS. To prime RICS research, we developed the open-source Co*Citation Network application and assessed parallel supplementary UDCS and RICS retrospectively in three completed SRs and prospectively in one case study. Automated RICS collected and ranked cited, citing, co-cited, and co-citing literature of seed references from the OpenAlex database and applied an empirical rank cut-off to approximate the volume of UDCS results. With RICS, we consistently noted higher overlap with primary database search results than with UDCS. Title/abstract screening in the case study showed a precision (number needed to read) of 1.8% (57) for UDCS and 2.1% (48) for RICS results. After full-text screening, two additional articles were included for review, one of which was identified by both UDCS and RICS, and one exclusively by UDCS. The present study indicates potential benefits of RICS for SR authors and will enable the formation of a research consortium to compare supplementary UDCS and RICS on a larger scale.
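The precision and number-needed-to-read figures in this abstract are two views of the same ratio. The sketch below assumes the usual definitions (precision = relevant records / screened records, number needed to read = 1 / precision). The function names are illustrative, and because the abstract does not give raw screening counts, the check works backwards from the reported values.

```python
def precision(relevant, screened):
    """Share of screened records that turned out to be relevant."""
    return relevant / screened


def number_needed_to_read(relevant, screened):
    """Records a reviewer screens, on average, per relevant record found."""
    return screened / relevant


def precision_percent_from_nnr(nnr):
    """Precision (in percent) implied by a reported number needed to read."""
    return 100.0 / nnr


# Values reported in the abstract: 1.8% (57) for UDCS, 2.1% (48) for RICS.
for label, nnr in (("UDCS", 57), ("RICS", 48)):
    print(label, "implied precision:", round(precision_percent_from_nnr(nnr), 1), "%")
```

Both reported pairs are consistent after rounding to one decimal place, which suggests the number needed to read was computed from unrounded counts.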

6
Generation and Evaluation of Realistic Synthetic Clinical Progress Notes for Prostate Cancer using Large Language Models.

Rey-Blanes, A.; Veredas-Morente, J.; Vivas-Vargas, E.; Gil-Garcia, F.; Moreno-Barea, F. J.; Veredas, F. J.

2026-05-28 health informatics 10.64898/2026.05.25.26354027 medRxiv
Top 4%
0.2%

Background and Objective: Access to real-world electronic health records (EHRs) remains limited by privacy, governance and annotation constraints, hindering the development of clinical natural language processing models. Realistic synthetic progress notes may provide EHR-like corpora that preserve clinically rigorous information on diagnoses, treatments, symptoms, imaging, laboratory findings and therapeutic trajectories without relying directly on sensitive patient records. This study evaluates whether large language models (LLMs) can generate realistic Spanish prostate cancer progress notes from published case reports, preserving clinical content, temporality and hospital-style conventions.

7
Nationwide Trends and Outcomes in Major Gastrointestinal Cancer Surgery

Espinoza, R. E. D. A.; Bastos, L. S. L.; Hamacher, S.; Salluh, J. I. F.; Bozza, F. A.

2026-05-27 oncology 10.64898/2026.05.26.26354087 medRxiv
Top 4%
0.2%

Background Complex gastrointestinal (GI) oncologic surgeries carry substantial perioperative risk, and nationwide outcomes in low- and middle-income countries (LMICs) are underreported. This study aimed to evaluate national trends in surgical volume, in-hospital mortality, and intensive care unit (ICU) utilization for major GI cancer surgery in Brazil's Unified Health System (SUS) over a 14-year period. Methods A population-based analysis was performed using national administrative databases to identify all adult patients undergoing colectomy, gastrectomy, pancreatic resection or esophagectomy for cancer in the SUS from 2010-2023. Annual rates were age-standardized according to the WHO standard population. Temporal trends were assessed using Poisson regression to estimate average annual percent change (AAPC) with 95% confidence intervals (CIs). Results A total of 179,337 hospital admissions were analyzed (median age 63 years; 48% female). Colectomies accounted for 72% of cases, followed by gastrectomies (19%), pancreatic resections (5%), and esophagectomies (3%). Although crude surgical volume increased, population-adjusted rates declined overall (AAPC -2.09%; 95% CI -2.58 to -1.59), mainly due to reductions in gastrectomies and esophagectomies. Median hospital stay decreased from 9 to 7 days (AAPC -1.93%; 95% CI -2.79 to -1.06). Overall in-hospital mortality declined from 8.1% to 5.7% (AAPC -2.88%; 95% CI -4.15 to -1.59). ICU utilization rose from 37% to 43% of admissions (AAPC +1.31%; 95% CI 0.91 to 1.71). Conclusion Over 14 years, in-hospital mortality and length of stay for major gastrointestinal cancer surgery declined within Brazil's universal public health system. These temporal trends occurred alongside expansion of accredited oncology services and increased ICU utilization, although causal relationships cannot be established from administrative data. 
These findings should be interpreted as hypothesis-generating and highlight the need for more granular hospital-level data in LMIC settings.

8
Dynamic Topic Alignment and Sentiment between Official Health Communication and General Public Discourse during COVID-19: A Comprehensive Infoveillance Framework

Yin, S.; Xin, W.; Chen, S.; Ge, Y.

2026-05-27 public and global health 10.64898/2026.05.23.26353966 medRxiv
Top 4%
0.1%

Social media has become a critical channel for public health communication during the COVID-19 pandemic, yet how official health messaging aligns with broader public discourse remains insufficiently understood. This study develops an end-to-end info-veillance framework to examine the dynamic relationship between Centers for Disease Control and Prevention (CDC) communications and general public discourse on social media. We analyzed 17,524 CDC tweets and 67,895 public discourse tweets. Biterm Topic Model (BTM) was used to extract topics from each corpus, and a novel topic consistency scoring system integrating cosine similarity with daily public topic prominence was developed to quantify temporal alignment between official health communication and public discourse. Two complementary sentiment measures were incorporated: expected sentiment (average emotional tone) and net sentiment (overall emotional intensity). Temporal relationships were examined using autoregressive integrated moving average with exogenous variables (ARIMAX) models. Results show that topic alignment increased over time across CDC topics, while expected sentiment remained consistently negative. Higher alignment was associated with immediate and delayed changes in expected sentiment and stronger emotional intensity in net sentiment based on ARIMAX results. These findings suggest that topic alignment reflects public attention rather than agreement with official communications, and is associated with more negative emotional responses. This framework provides a scalable, generalizable approach to investigate and evaluate public engagement with official health communication.

9
Distinguishing Age-specific Patterns in Comorbidities of Obstructive Sleep Apnea Using Real-World Data

Goodman, M. O.; Alex, R. M.; Sands, S. A.; Azarbarzin, A.; Batool-anwar, S.; Pavlova, M. K.; Epstein, L. J.; Redline, S.; Cade, B. E.

2026-05-28 epidemiology 10.64898/2026.05.20.26352336 medRxiv
Top 5%
0.1%

Obstructive sleep apnea (OSA) is associated with a wide range of comorbidities, but the extent to which these follow predictable, age-dependent patterns is not well understood. Identifying such patterns could provide insight into OSA heterogeneity and its links to physiological measures of OSA. We trained age-dependent topic models (ATM) on longitudinal electronic health records from 36,426 patients with OSA in the Mass General Brigham Biobank. ATM organizes incident diagnoses into distinct comorbidity "topics," whose age-specific disease loadings represent predictive patterns linking related diagnoses across the life course. We applied the trained model to compute individual-level topic scores in independent data: a cohort of 11,689 OSA cases and 22,695 matched controls, and a cohort of 6,220 patients with polysomnography (PSG)-derived physiological measures. We identified 19 distinct age-dependent comorbidity profiles, all significantly associated with OSA case status (FDR-adjusted p<0.05). Topics reflected recognizable clusters including metabolic, neuropsychiatric, and immune-mediated conditions, and several were distinguished by age-of-onset of key comorbidities, such as early- vs late-onset asthma. Seventeen of the 19 topics were significantly associated with at least one of 13 PSG-derived physiological measures, including associations between cardiometabolic topics and the apnea-hypopnea index, sleep apnea specific hypoxic burden, and respiratory event-specific heart rate burden. These findings indicate that age-dependent comorbidity patterns distinguish meaningful OSA subtypes with differing prognoses and endophenotype associations. ATM offers insight into complex OSA comorbidity and suggests that age-informed, topic-based stratification may improve individualized risk assessment, interpretation of PSG findings, and targeting of clinical interventions.

10
Influenza vaccine effectiveness against pneumonia and COPD exacerbations among patients with chronic obstructive pulmonary disease in Thailand: A national test-negative design study, 2013-2024

Chawalchitiporn, S.; Tantiyavarong, P.; Kittiwatanachod, J.; Naosri, S.; Prasert, K.; Praphasiri, P.

2026-05-27 epidemiology 10.64898/2026.05.26.26354178 medRxiv
Top 5%
0.1%

Background/Objectives: Influenza infection is a major trigger of pneumonia and acute exacerbations among patients with chronic obstructive pulmonary disease (COPD). However, national laboratory-confirmed evidence on influenza vaccine effectiveness (VE) in this high-risk population remains limited. This study aimed to estimate the effectiveness of seasonal influenza vaccination against influenza-associated pneumonia and COPD exacerbations among patients with COPD in Thailand. Methods: We conducted a nationwide retrospective test-negative design study using administrative healthcare data from the National Health Security Office linked with laboratory-confirmed influenza surveillance data between June 1, 2013, and May 31, 2025, covering twelve influenza seasons (2013-2024). COPD-related clinical episodes among patients aged ≥40 years who presented with pneumonia or acute exacerbation of COPD and underwent RT-PCR testing for influenza were included. Multilevel Poisson regression models were used to estimate adjusted risk ratios (RRs), and VE was calculated as (1 - adjusted RR) x 100. Results: A total of 606,072 COPD-related clinical episodes were included, of which 192,224 (31.7%) were influenza-positive. The overall adjusted VE against influenza-associated pneumonia was 63.2% (95% CI: 62.5-64.0), while VE against influenza-associated COPD exacerbations was 67.0% (95% CI: 48.8-78.8). VE estimates were broadly similar across age groups and remained substantial across COPD severity strata. Although point estimates were numerically higher in severe and very severe COPD, subgroup differences should be interpreted cautiously. Conclusions: Seasonal influenza vaccination was associated with substantial protection against influenza-associated pneumonia and COPD exacerbations among patients with COPD in Thailand.
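The VE formula quoted in the Methods, (1 - adjusted RR) x 100, is simple enough to sketch directly. The risk ratios used below are not reported in the abstract. They are back-calculated from the published VE of 63.2% (95% CI: 62.5-64.0) purely for illustration, and the function names are assumptions.

```python
def vaccine_effectiveness(adjusted_rr):
    """VE in percent from an adjusted risk ratio: (1 - RR) x 100."""
    return (1.0 - adjusted_rr) * 100.0


def ve_interval(rr_lower, rr_upper):
    """Convert an RR confidence interval into a VE interval.

    The bounds swap: the upper RR bound gives the lower VE bound.
    """
    return vaccine_effectiveness(rr_upper), vaccine_effectiveness(rr_lower)


# Back-calculated, illustrative risk ratio and interval (not from the paper).
rr, rr_low, rr_high = 0.368, 0.360, 0.375
ve = vaccine_effectiveness(rr)
ve_low, ve_high = ve_interval(rr_low, rr_high)
print(f"VE = {ve:.1f}% (95% CI: {ve_low:.1f}-{ve_high:.1f})")
```

Because VE is a decreasing function of the risk ratio, the interval bounds must be swapped when converting, which is an easy step to get wrong when reporting.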

11
Biological Age and Complication Prediction in Hypertension: A 13-Year Cohort

Kim, B.-s.; Bae, C.-y.; Kim, I.-h.; Choi, Y.-j.; Jeon, M.-h.

2026-05-29 epidemiology 10.64898/2026.05.27.26354288 medRxiv
Top 6%
0.1%

1. Background: With the rising prevalence of hypertension, especially among younger populations, there is a critical need to better assess health status and predict associated complications. This study developed a biological age model ("hypertension age") for hypertensive patients to predict the risk and timing of major complications. 2. Methods: Using South Korea's NHIS-NHID data, researchers analyzed 4,535,041 hypertensive patients who underwent health examinations between 2009 and 2010. Patients were followed for an average of 12.40 years (until 2022). Principal Component Analysis (PCA) was used to develop the biological age (cBA) model. The risk and onset timing of complications were analyzed using Cox proportional hazards and multiple regression models, adjusting for variables like medication use and baseline diseases. 3. Results: A 1-standard deviation (SD) increase in the age gap, where biological age exceeds chronological age (cBA - CA), was significantly associated with an elevated risk for all major complications in both sexes (p < 0.001). Furthermore, a 1-SD increase in this gap significantly accelerated the time to complication onset for nearly all conditions (p < 0.001), with the exception of dementia in women. The impacts of medication use, hypertension duration, and baseline comorbidities varied by specific complication. 4. Conclusions: Lowering "hypertension age" relative to chronological age can significantly reduce the risk and delay the onset of major cardiovascular and related complications. Quantifying this biological age gap serves as a powerful motivational tool for personalized health management and complication prevention in hypertensive patients.

12
Segmental Lung Sound Analysis in Obstructive Lung Diseases Using Electronic Stethoscope; a protocol to establish an acoustic repository

Anuradha, H.; Yasaratne, D.; GMRI, G.; Parakrama, E.; Severin, R.

2026-05-28 respiratory medicine 10.64898/2026.05.27.26354263 medRxiv
Top 6%
0.1%

Introduction Obstructive lung diseases (OLDs) are responsible for high rates of illness and death worldwide. Inflammation, chronic airflow limitation, and bronchial remodeling occur in OLDs and eventually produce characteristic respiratory sounds. Although it is subjective and has low reproducibility, traditional auscultation with a manual stethoscope remains the main method of identifying lung sounds. Recent advances in digital stethoscopes and artificial intelligence (AI) now permit objective measurement of lung sounds. However, standardized, region-specific databases for AI training and validation are lacking. Although lung sound classification is an emerging area in research and telerehabilitation, lobe-wise acoustic patterns remain largely unexplored because no existing database is available to train AI models. To address this gap, this study aims to develop an acoustic repository and to analyze segmental lung sounds recorded with an electronic stethoscope from patients with OLDs and healthy controls. Methods and analysis This is a cross-sectional observational study involving 120 participants (60 patients with OLDs and 60 healthy controls). Lobe-wise acoustic signals will be captured with an electronic stethoscope in the healthy and diseased populations. The recordings will be annotated in Audacity and then used for feature extraction and statistical analysis. The acoustic features extracted through Audacity will include frequency, intensity, pitch, and root mean square (RMS) energy. Repeated-measures ANOVA will be applied to compare mean sound intensities across lung segments, while Pearson correlation will be used to assess associations with body composition parameters. The data will then be standardized for AI-based diagnostic applications. 
Ethics and dissemination Ethical approval is being sought from the Ethics Review Committee, Faculty of Medicine, University of Peradeniya (2025/EC/87). Informed consent will be obtained in writing. Results will be disseminated through peer-reviewed publications and a public database of lung sounds from the region.
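Of the acoustic features listed in this protocol, RMS energy is the most direct to compute. The sketch below shows the calculation on a synthetic tone. The protocol itself names Audacity as the analysis tool, so this Python version is only an illustration, and the sampling rate, tone frequency and amplitude are assumptions rather than study parameters.

```python
import math


def rms_energy(samples):
    """Root mean square of a sequence of audio samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine_wave(frequency_hz, amplitude, sample_rate_hz, duration_s):
    """Synthetic test tone standing in for a recorded lung sound."""
    n = int(sample_rate_hz * duration_s)
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(n)
    ]


# One second of a 200 Hz tone at amplitude 0.5, sampled at 8 kHz (assumed values).
tone = sine_wave(frequency_hz=200, amplitude=0.5, sample_rate_hz=8000, duration_s=1.0)
print("RMS energy:", rms_energy(tone))
print("Expected for a sine (amplitude / sqrt(2)):", 0.5 / math.sqrt(2))
```

For a pure sine wave the RMS value equals the amplitude divided by the square root of two, which gives a quick sanity check before the same function is applied to real recordings from each lung segment.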

13
Immune Checkpoint Response Profiles and Resistance Mechanisms in NSCLC Revealed by Circulating Extracellular Vesicle Proteomics

Taylor, C.; Davey, M.; Allain, E. P.; Cheema, A. S.; Crapoulet, N.; Finn, N.; Abd, M.; Ouellette, R.

2026-05-26 oncology 10.64898/2026.05.25.26354042 medRxiv
Top 6%
0.1%

Background: Immune-oncology has revolutionized cancer treatment, but some patients fail to benefit due to primary resistance and tumour-immune evasion. Extracellular vesicles (EVs) are secreted by both tumour and immune cells and mediate communication between cancer cells and the immune system. Our study used proteomic profiling of circulating EVs collected from NSCLC patients treated with immune checkpoint inhibitors (ICI) to identify predictive biomarkers of response as well as immune evasion mechanisms related to treatment resistance. Methods: EVs were isolated from plasma collected prior to ICI treatment using peptide-affinity purification, and high-throughput proteomics was performed using the Proximity Extension Assay. Differentially expressed EV proteins between durable (DR) and non-durable responders (NDR) were identified and evaluated using Cox proportional hazards regression, survival analysis, sex-stratified analysis, as well as pathway and network analysis. Results: Proteomics analysis identified 116 differentially expressed EV proteins between DR and NDR. NDR was characterized by enrichment of inflammatory, angiogenic, and immune-suppressive EV proteins, such as IL1RL1, TFRC, IL6ST, galectins, TNF superfamily death receptors, chemokines, and PCSK9. Pathway analysis revealed enrichment of angiogenesis, chemotaxis, ECM remodeling, and neutrophil degranulation associated with poor progression-free survival (PFS). In contrast, DR to ICI treatment was associated with EV proteins related to T- and B-cell activation and adaptive immunity. Sex-related differences in abundance and association with PFS were observed for certain EV proteins, including IL1RL1 and TFRC. A six-protein EV model (IL1RL1, TFRC, ERI1, CCN5, IGFBPL1, and TNFRSF13C) demonstrated good prognostic performance for identifying NDR (AUC = 0.907) and stratified patients into three discrete risk groups. 
Conclusions: High-plex EV proteomics revealed biologically coherent tumour-immune signaling programs that are associated with ICI treatment resistance. Profiling circulating EVs may improve our understanding of EV-mediated immune evasion mechanisms and identify protein signatures that reflect the tumour immune microenvironment and predict response to immune checkpoint blockade.

14
Decomposing growth in a national HL7 CDA clinical document repository

Talvik, H.-A.; Laur, S.; Vilo, J.; Reisberg, S.

2026-05-26 health informatics 10.64898/2026.05.24.26353991 medRxiv
Top 7%
0.1%

Longitudinal evaluations of national electronic health record repositories often track document counts alone, obscuring changes in content size, structure and standards implementation. We decomposed growth in the Estonian Health Information System across document counts, per-document size, section-level structure and version uptake in a 10% random population sample of 4.97 million HL7 Clinical Document Architecture Release 2 documents from 147,819 patients, spanning 2012-2019 and four prespecified document types. Growth patterns differed by document type. Inpatient summaries increased 48.5% in total content volume despite a 2.4% decline in document counts. Section presence and within-section content were highly skewed; 44.6% of 892 data locations carried one fixed value. Code-system diversity increased from 45 to 79, and version uptake took years: inpatient summaries reached 80% organisational uptake after a median 44 months (95% CI 11-78). This decomposition can guide extraction pipelines, secondary use and standards governance in CDA- and FHIR-based repositories.
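The inpatient-summary figures above illustrate why counts alone mislead. Under the simplifying assumption that total content volume equals document count times mean per-document size, the two reported changes imply a per-document size change that the abstract does not state. The sketch below derives it. The identity and the resulting figure are assumptions of this illustration, not results from the paper.

```python
import math


def implied_size_growth(volume_growth, count_growth):
    """Per-document size growth implied by volume = count x mean size."""
    return (1.0 + volume_growth) / (1.0 + count_growth) - 1.0


def log_contributions(volume_growth, count_growth):
    """Split log volume growth into additive count and size components."""
    total = math.log1p(volume_growth)
    count_part = math.log1p(count_growth)
    return {"count": count_part, "size": total - count_part, "total": total}


# Reported in the abstract: +48.5% total volume, -2.4% document count.
size_growth = implied_size_growth(0.485, -0.024)
parts = log_contributions(0.485, -0.024)
print(f"Implied per-document size growth: {size_growth:.1%}")
print({name: round(value, 4) for name, value in parts.items()})
```

On a log scale the components add up exactly, so the negative count contribution and the larger positive size contribution can be read side by side.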

15
The Global Pediatric Diarrhea Surveillance network: Rationale and methods

Soeters, H. M.; Antoni, S.; Iyer, S. S.; Weldegebriel, G.; Biey, J.; Mwenda, J. M.; Rey-Benito, G.; Ortiz, C.; Pastore, R.; Videbaek, D.; Singh, S.; Njambe, E.; Sangal, L.; Dhongde, D.; Grabovac, V.; Logronio, J.; Fahmy, K.; Ghoniem, A.; Armah, G.; Dennis, F. E.; Seheri, M. L.; Magagula, N.; Rakau-Nondela, K.; Fumian, T. M.; Maciel, I. T. A.; Samoilovich, E.; Semeiko, G.; Varghese, T.; Thomas, S.; Bines, J.; Li, D.; Kabir, F.; Liu, J.; Houpt, E. R.; Gautam, R.; Mirza, S. A.; Vinje, J.; Mulders, M. N.; Tate, J. E.; Parashar, U. D.; Platts-Mills, J. A.; Global Pediatric Diarrhea Surveillance net

2026-05-27 public and global health 10.64898/2026.05.21.26352576 medRxiv
Top 7%
0.1%

Background Diarrhea remains a leading cause of child morbidity and mortality worldwide. Improved and ongoing estimates of the etiologies of severe diarrhea, particularly in low- and middle-income countries (LMICs), are crucial to inform the use of current vaccines and other interventions and to help prioritize the development of new vaccines. Producing rigorous longitudinal data on the global burden and etiology of pediatric diarrhea requires a geographically broad surveillance network with standardized epidemiologic, laboratory, and analytic protocols. Methods We describe the rationale and methods of the Global Pediatric Diarrhea Surveillance (GPDS) network, a World Health Organization (WHO)-coordinated public health surveillance network investigating the etiology of hospitalized diarrhea among children aged <5 years in LMICs. The GPDS network enrolls children hospitalized with diarrhea at 38 sentinel surveillance sites in 31 LMICs across all 6 WHO Regions. Randomly selected stool specimens were tested by TaqMan Array Card quantitative polymerase chain reaction for 16 enteric pathogens previously associated with pediatric diarrhea. GPDS produces estimates of pathogen-specific attributable fractions and incidence of diarrheal hospitalizations at the global, regional, and country levels. Conclusions As a WHO-coordinated global surveillance network, GPDS evaluates pathogens associated with hospitalized pediatric diarrhea. The network monitors the changing burden of pathogens over time, monitors circulating strains, and generates data to inform decision-making around public health interventions. GPDS also improves global, regional, and country diarrheal disease burden estimates, informs new enteric vaccine development, and potentially provides a platform for future enteric vaccine evaluation.

16
Cancer Prevalence and Patterns in Kilifi County: A 10-year Retrospective Descriptive Study

Masha, M.; Mbugua, R. W.; Abdullahi, M.; Sheikh, N. A.; Omar, A.; Abdihamid, O.

2026-06-01 oncology 10.64898/2026.05.20.26353643 medRxiv
Top 8%
0.0%

Background Cancer is an increasing public health challenge in Kenya, particularly in rural and underserved regions where surveillance systems and diagnostic capacity remain limited. Kilifi County, located along the Kenyan coast, lacks a population-based cancer registry, and data on the local cancer burden are not available. This study aimed to characterize the demographic distribution of patients, cancer burden in the county, and management of cancer cases diagnosed at Kilifi County Referral Hospital (KCRH) over ten years. Methods This retrospective study analyzed the patterns of cancer in Kilifi County using patient records from KCRH during the study period (January 1, 2014, to January 1, 2024). Results A total of 101 patients with cancer were identified, 58% female, with a mean age of 54 years. Most patients were from Kilifi North (47%), with a high proportion reporting no formal occupation (41%) or farming (26%). Esophageal and cervical cancers were the most common (18% each), followed by breast and prostate cancers (5% each), with other malignancies occurring infrequently. Histopathology was the primary diagnostic modality (88%). Staging data were incomplete in 70% of cases; among documented cases, the majority presented with advanced disease (21% stage IV). Due to limited local treatment capacity, approximately half of the patients were referred to tertiary centers for chemotherapy, radiotherapy, or surgery. At data cut-off, 43% had died, 25% were on treatment, and 29% were lost to follow-up, with only 2% completing treatment or under follow-up. Conclusions This study demonstrates a substantial cancer burden in Kilifi County and highlights critical gaps in diagnostic capacity, staging, and continuity of care. Strengthening cancer surveillance systems, expanding diagnostic and treatment infrastructure, and establishing a population-based cancer registry are essential to improving cancer outcomes and advancing equitable care in rural Kenya.

17
Prevalence of nutritional, behavioral and anthropometric cancer-related risk factors among adults in Nouakchott, Mauritania: a cross-sectional study

Tolba, N.; Najdi, A.; El Hfid, M.; Hmeied Maham, M.; Brahim, S. M.; Tolba, A.; Sellal, N.

2026-05-26 epidemiology 10.64898/2026.05.23.26353924 medRxiv
Top 8%
0.0%

Background Cancer is a growing public health challenge in low- and middle-income countries, where urbanization, nutritional transition and lifestyle changes contribute to modifiable risk factors. In Mauritania, population-based data on cancer-related nutritional, behavioral and anthropometric risk factors remain limited. Objective To describe the frequency of the main nutritional, behavioral and anthropometric cancer-related risk factors among adults living in the three wilayas of Nouakchott. Methods A cross-sectional study was conducted among 1,000 adults aged 18 years and older in Nouakchott. Data were collected using a standardized questionnaire covering sociodemographic characteristics, dietary habits, physical activity and selected health behaviors. Anthropometric measurements were performed to assess body mass index and abdominal adiposity. Abdominal obesity was defined using sex-specific waist circumference cut-off points recommended by the World Health Organization: ≥ 88 cm in women and ≥ 102 cm in men. Results were presented as frequencies and proportions, with comparisons by sex, age group and wilaya of residence. Results Women represented 52.0% of participants, and 53.5% were aged 18-34 years. Excess body weight was frequent, with 38.6% overweight and 28.0% obese. Abdominal adiposity was also common, with 58.0% having increased or substantially increased waist circumference and 48.3% having an elevated waist-to-hip ratio. Physical inactivity was reported by 64.7% of participants, and 15.7% were current smokers. Dietary exposures included high red meat consumption in 66.8%, daily refined cereal intake in 67.5%, daily sugar-sweetened beverage consumption in 14.9%, and limited daily fresh fruit consumption in 13.8%. Significant differences were observed by sex for anthropometric indicators, by age for selected dietary habits, and by wilaya for physical activity, smoking and selected dietary behaviors.
Conclusion This study shows a high frequency of modifiable cancer-related risk factors among adults in Nouakchott, particularly excess body weight, abdominal adiposity, physical inactivity and unfavorable dietary habits. These findings support the need to strengthen primary prevention strategies targeting nutrition, physical activity and tobacco control in Mauritania.

18
Keeping human in the loop: A three-phase generative AI workflow for research integrity in data-intensive science. A methodological case study using elite Ethiopian distance-running data

Galko, P.; Yisamaw, A.; Haugen, T.; Seiler, S.

2026-05-29 sports medicine 10.64898/2026.05.29.26354013 medRxiv
Top 8%
0.0%

Background: Generative AI tools can support data-intensive research by writing code, drafting prose, searching analytical possibilities, and stress-testing claims. They can also produce false citations, drift between statistical specifications, and lose continuity across long investigations. This paper describes a practical workflow for using AI systems in empirical research while keeping discovery, verification, and accountability inspectable. Methods: We developed and applied a three-phase human-AI workflow to a case study of 14 elite Ethiopian distance runners. The dataset contained 22,605 GPS-segments collected across 97 consecutive days in late 2025, supplemented by venue and athlete metadata collected in the field. Phase 1 used an autonomous data-exploration tool to pre-filter the hypothesis space across five seeded research questions. Phase 2 used an AI system under direct human guidance to construct candidate findings into numerical claims, verification scripts, and draft text. Phase 3 used an independent AI system in an adversarial role to stress-test methods, statistics, prose, figures, and citations. The workflow was informed by Pearl's distinction between association, intervention, and counterfactual reasoning, with human judgement retained for research direction, interpretation, and final claims. Results: The workflow produced three empirical analyses and a documented correction process. The analyses estimated an altitude-to-sea-level pace correction of +0.10 min/km per 1,000 m at matched heart rate, showed why pooled altitude-surface regression was not identifiable within this venue system, documented method-dependence in heart-rate-based intensity classification, characterised within-venue route variation as a 64/36 path-fixed-to-trail-variable split with the Sululta label resolving into two functionally distinct sub-venues, and reframed the cohort's training through a 3x3x3 prescription lattice grounded in Ethiopian coaching practice. 
The adversarial phase identified several hallucinated citations, a terminology error between HC1 and cluster-robust standard errors, and several inconsistencies between prose, figures, and computed results. Verification scripts re-derived nearly all numerical claims from the cleaned lap-level data. Conclusions: The case study shows how researchers can organise AI-assisted empirical work so that candidate discovery, claim construction, independent stress-testing, and final accountability remain separated. The workflow did not remove the need for domain expertise or human judgement. Its value was in making the route from candidate finding to manuscript claim explicit, reproducible, and open to challenge. Trial registration: Not applicable.

19
Tricuspid Valve Remodeling in a New Grading Scheme for Functional Tricuspid Regurgitation: A Three-Dimensional Echocardiography Study

Xie, M.; Zhou, Y.; Li, H.; Xie, Y.; Yan, X.

2026-05-29 radiology and imaging 10.64898/2026.05.27.26354283 medRxiv
Top 8%
0.0%

Background: The specific 3D morphological substrates distinguishing the newly defined massive and torrential functional tricuspid regurgitation (FTR) phenotypes from standard severe disease remain under-characterized. Objectives: This study investigates the 3D geometric changes of the tricuspid valve (TV) apparatus across the spectrum of FTR, specifically focusing on the structural definition of massive and torrential grades. Methods: Three-dimensional (3D) transesophageal echocardiography (TEE) was performed in 322 patients with FTR secondary to left-sided heart disease. Patients were stratified into mild-moderate (n=166), severe (n=82), and massive-torrential (n=74) groups. TV geometry, including annular dimensions, leaflet tethering, and subvalvular apparatus, was quantified using 3D modeling software. Results: Patients with massive-torrential TR were characterized by advanced age, female predominance, and atrial fibrillation (75%). 3D analysis demonstrated that massive-torrential TR represents a distinct phenotype defined by extreme annular circularization (ellipticity index 1.0) and planar flattening (P < 0.001). Furthermore, these patients exhibited a critical leaflet-annulus uncoupling, where compensatory leaflet growth (relative length < 80%) failed to match the massive annular dilation. Consequently, the regurgitant orifice in massive-torrential grades appeared highly complex, frequently manifesting as multiple irregular orifices. Conclusions: Massive and torrential FTR are characterized by a unique geometric profile involving extreme annular circularization, severe leaflet tethering, and leaflet-annulus uncoupling. These morphological insights suggest that conventional repair strategies may be insufficient for these advanced phenotypes, highlighting the necessity for pre-procedural 3D TEE to guide device selection.

20
Nutritional status, clinical burden, and healthcare utilization among pediatric outpatients with congenital heart disease: A retrospective cross-sectional study from Indonesia

Amelia, P.; Sahertian, L. C. D.; Adriansyah, R.; Kannady, J.

2026-05-26 cardiovascular medicine 10.64898/2026.05.23.26353925 medRxiv
Top 8%
0.0%

Congenital heart disease contributes substantially to chronic morbidity, growth impairment, and repeated healthcare utilization among children. Evidence regarding nutritional burden and outpatient healthcare patterns among pediatric patients with congenital heart disease in Indonesia remains limited. This study aimed to evaluate clinical characteristics, nutritional status, healthcare utilization, and factors associated with malnutrition among pediatric outpatients with congenital heart disease at Adam Malik General Hospital, Indonesia. A retrospective observational study was conducted using medical records of pediatric outpatients treated between January and December 2024. Demographic characteristics, cardiac diagnoses, nutritional status, complications, and outpatient visit history were analyzed. Logistic regression analysis was performed to identify factors associated with malnutrition. A total of 606 pediatric outpatients were included. Non-cyanotic congenital heart disease predominated in the cohort, with ventricular septal defect representing the most common diagnosis, followed by patent ductus arteriosus and atrial septal defect. Nearly half of all patients demonstrated underweight or severe underweight nutritional status, while pulmonary hypertension emerged as the most frequent complication. Younger pediatric age groups and higher cumulative clinical burden independently increased the odds of malnutrition. Children with congenital heart disease at this tertiary referral center carried a substantial nutritional and clinical burden. Early nutritional surveillance and integrated long-term outpatient management may improve growth outcomes and reduce chronic disease burden in resource-limited settings.