Database

Oxford University Press (OUP)

Preprints posted in the last 7 days, ranked by how well they match Database's content profile, based on 51 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.

1
Patient-Centred Communication in Lung Cancer Screening: A Clinically Focussed Evaluation of a Fine-Tuned Open-Source Model Against a Larger Frontier System

Khanna, S.; Chaudhary, R.; Narula, N.; Lee, R.

2026-04-11 oncology 10.64898/2026.04.10.26350595 medRxiv
Top 0.9%
0.8%

Lung cancer screening saves lives, yet uptake remains suboptimal and inequitable. Personalised communication can improve attendance and reduce anxiety, but scaling such support is a workforce challenge. We fine-tuned Google's Gemma 2 9B using QLoRA on 5,086 synthetic screening conversations and compared it against Google's Gemini 2.5 Flash (a larger frontier model) and an unmodified baseline across 300 multi-turn conversations with 100 patient personas spanning ten clinical categories. Evaluation combined automated natural language processing metrics with independent language model judgement in two complementary modes: structured clinical rubric and simulated patient persona. The fine-tuned model achieved the highest simulated patient experience score (3.71/5 vs 3.65 for the frontier model), recorded zero boundary violations after clinician review of all flagged instances, and led on the four most safety-critical categories. A composite Patient Adaptation Index showed that the fine-tuned model led overall (0.37 vs 0.35 vs 0.35), with its clearest advantage on the two clinically specific components: empathy calibration to patient distress and selective smoking cessation signposting. These findings suggest that targeted fine-tuning of open-source models can yield clinical communication quality comparable to larger proprietary systems, with advantages in safety-critical scenarios and suitability for NHS data governance constraints. Human clinician review of these conversations is ongoing.

2
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 1%
0.6%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its best scores, a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4-5 percentage points in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
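
The two imbalance remedies named in this abstract, inverse-frequency weighting and per-label threshold calibration, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the weight is taken as the number of samples divided by the number of positives per label, the calibration criterion is taken as validation F1, and the function names are hypothetical rather than part of HCC-Coder.

```python
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[Sequence[int]]) -> List[float]:
    """One weight per label: n_samples / n_positives (assumed weighting scheme)."""
    n_samples = len(labels)
    n_labels = len(labels[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in labels)
        # Guard against labels with no positive example in this split.
        weights.append(n_samples / max(positives, 1))
    return weights


def f1_score(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Binary F1 for one label; returns 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def calibrate_thresholds(
    labels: Sequence[Sequence[int]],
    scores: Sequence[Sequence[float]],
    grid: Sequence[float],
) -> List[float]:
    """For each label, pick the grid threshold with the best validation F1."""
    thresholds = []
    for j in range(len(labels[0])):
        truth = [row[j] for row in labels]
        label_scores = [row[j] for row in scores]
        best = max(
            grid,
            key=lambda t: f1_score(truth, [int(s >= t) for s in label_scores]),
        )
        thresholds.append(best)
    return thresholds


# Toy validation set: 4 notes, 2 labels (the second label is rarer).
val_labels = [[1, 0], [1, 0], [0, 1], [0, 0]]
val_scores = [[0.9, 0.2], [0.6, 0.1], [0.4, 0.3], [0.2, 0.05]]
label_weights = inverse_frequency_weights(val_labels)
label_thresholds = calibrate_thresholds(val_labels, val_scores, [0.1, 0.3, 0.5, 0.7])
```

In this toy data the rarer label receives the larger weight and a lower decision threshold than the common one, which is the intended effect of both techniques.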

3
Aakhyan: An AI-Powered Vernacular Patient Communication Platform for Oncology in Resource-Limited Settings - System Architecture and Pilot Randomised Trial Protocol

Purkayastha, D. S.

2026-04-17 health informatics 10.64898/2026.04.15.26350965 medRxiv
Top 2%
0.4%

Inadequate discharge communication is a well-documented contributor to medication non-adherence, missed follow-ups, and preventable readmissions across healthcare systems worldwide. In resource-limited oncology settings, where patients are often low-literate, speak non-dominant languages, and manage complex multi-drug regimens, this problem is acute and largely unaddressed. We present Aakhyan, a vernacular patient communication platform that addresses the full post-discharge arc: from converting English-language discharge summaries into structured, voice-based vernacular explanations, through medication adherence support, to proactive follow-up management - all delivered via WhatsApp. The architecture is novel in its strict separation of concerns: a vision-language model performs structured JSON extraction from discharge images; all patient-facing content is generated deterministically from clinician-approved templates with community-sensitive vocabulary registers. This design eliminates the hallucination risk inherent in generative AI patient communication (documented at 18-82% in prior studies) while preserving the extraction capability of large language models. The platform supports four language registers, Bengali, Hindi, simplified English for tribal populations, and Assamese, with text-to-speech synthesis across all registers, including a custom grapheme-to-phoneme engine developed for Assamese phonology. Beyond discharge communication, the platform includes scheduled medication adherence nudges, interactive follow-up reminders, and a Daily Availability and Patient Notification System (DAPNS) that notifies patients the evening before their follow-up whether their doctor and required investigations are available, preventing wasted trips by rural patients who travel 2-6 hours to reach the centre. 
A 100-patient stratified randomised controlled study is planned at Silchar Cancer Centre, with structured teach-back assessment at 48-72 hours post-discharge as the primary comprehension outcome and preliminary clinical efficacy as a secondary objective. This paper describes the clinical rationale, technical architecture, safety framework, and positioning of Aakhyan within the existing literature on mHealth patient communication interventions.

4
Easily Scalable, Rapidly Deployable Mechanical Ventilator For Pandemic Health Crises In Resource-Limited Areas

Farre, R.; Salama, R.; Rodriguez-Lazaro, M. A.; Kiarostami, K.; Fernandez-Barat, L.; Oliveira, V. D. C.; Torres, A.; Farre, N.; Dinh-Xuan, A. T.; Gozal, D.; Otero, J.

2026-04-11 emergency medicine 10.64898/2026.04.08.26350386 medRxiv
Top 2%
0.3%

Background: The COVID-19 pandemic exposed critical shortages of mechanical ventilators, particularly in low-resource settings. Disruptions in global supply chains and dependence on specialized components highlighted the need for scalable, locally manufacturable alternatives for emergency respiratory support. Aim: To describe and evaluate a simplified, supply-chain-independent mechanical ventilator assembled from widely available automotive and simple hardware components, and intended as a last-resort solution. Methods: The ventilator is based on a reciprocating air pump driven by an automotive windshield wiper motor coupled to parallel shaft bellows and readily assembled passive membrane valves, requiring only materials available from standard hardware retailers, minimal tools, and basic manual skills. Ventilator performance was assessed through bench testing using a patient model simulating severe lung disease in adult (R=20 cmH2O·s/L, C=15 mL/cmH2O) and pediatric (R=50 cmH2O·s/L, C=10 mL/cmH2O) patients. Realistic proof of concept was performed in four mechanically ventilated 50-kg pigs. Results: The device delivered tidal volumes up to 600 mL and respiratory rates up to 45 breaths/min with PEEP up to 10 cmH2O, covering pediatric and adult ventilation ranges. In vivo testing showed that the ventilator maintained arterial blood gases within the targeted range. Technical details for ventilator construction are provided in an open-source video tutorial. Discussion: This low-cost ventilator demonstrated adequate performance under demanding conditions. Although not a substitute for commercial intensive care ventilators, its simplicity, autonomy, and independence from fragile supply chains provide a potentially life-saving option in resource-constrained emergency scenarios.

5
Leveraging State-of-the-Art LLMs for the De-identification of Sensitive Health Information in Clinical Speech

Dai, H.-J.; Mir, T. H.; Fang, L.-C.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.

2026-04-17 health informatics 10.64898/2026.04.13.26349911 medRxiv
Top 4%
0.2%

Accurate recognition and de-identification of sensitive health information (SHI) in spoken dialogues requires multimodal algorithms that can understand medical language and contextual nuance. However, the recognition and de-identification process itself risks exposing SHI. Additionally, the variability and complexity of medical terminology, along with the inherent biases in medical datasets, further complicate this task. This study introduces the SREDH/AI-Cup 2025 Medical Speech Sensitive Information Recognition Challenge, which focuses on two tasks: Task 1, in which speech transcription systems must accurately transcribe speech into text; and Task 2, medical speech de-identification to detect and appropriately classify mentions of SHI. The competition attracted 246 teams; top-performing systems achieved a mixed error rate (MER) of 0.1147 and a macro F1-score of 0.7103, with average MER and macro F1-score of 0.3539 and 0.2696, respectively. Results were presented at the IW-DMRN workshop in 2025. Notably, the results reveal that LLMs were prevalent across both tasks: 97.5% of teams adopted LLMs for Task 1 and 100% for Task 2, highlighting their growing role in healthcare. Furthermore, we fine-tuned six models, demonstrating strong precision (~0.885-0.889) with slightly lower recall (~0.830-0.847), resulting in F1-scores of 0.857-0.867.

6
Cross-cultural adaptation and psychometric validation of the ISBAR Structured Handover Observation Tool in ICU-to-ward patient transfer

Ni, N.; Zhao, B.; Wang, Y.; Wang, Q.; Ding, J.; Liu, T.

2026-04-14 nursing 10.64898/2026.04.10.26350669 medRxiv
Top 5%
0.1%

The ISBAR framework is used to standardize clinical handovers and enhance patient safety. Observational tools based on ISBAR have been developed to assess the completeness of information transfer. However, these instruments have primarily been developed in non-Chinese contexts, and validated Chinese-language observational tools suitable for clinical practice remain limited. In this study, a cross-cultural adaptation and psychometric validation of the ISBAR Structured Handover Observation Tool was conducted, examining its reliability and discriminant validity in Chinese clinical settings. The study was conducted in two phases: cross-cultural adaptation and psychometric evaluation in real-world clinical settings. Content validity was assessed using the Content Validity Index (CVI), and inter-rater reliability was evaluated using the Intraclass Correlation Coefficient (ICC) based on a two-way mixed-effects model with absolute agreement. Discriminant validity was examined using the Mann-Whitney U test to compare scores across nurses with varying levels of clinical experience. A total of 233 handover cases involving patient transfers from the intensive care unit (ICU) to general wards were collected, involving 84 nurses. The scale demonstrated good content validity, with item-level content validity indices (CVI) ranging from 0.88 to 1.00 and a scale-level CVI/Ave of 0.98. The inter-rater reliability, assessed using fifty randomly selected cases, was high, with an intraclass correlation coefficient (ICC) of 0.885 for single-rater assessments and 0.939 for average-rater assessments. Discriminant validity analysis showed that nurses with more clinical experience had significantly higher total scores than those with less experience (Z = -4.772, p < 0.001). The Chinese version of the ISBAR Structured Handover Observation Tool demonstrates good content validity, high inter-rater reliability, and acceptable discriminant validity.
This tool provides a standardized and practical method for assessing the completeness of information transfer and is expected to support quality improvement in patient handover from the ICU to general wards in Chinese clinical settings.

7
Breaking the seasonal barrier: feasibility of cuffless fingertip-based continuous blood pressure monitoring in older adults during winter exercise

Mizutani, N.; Nishizawa, S.; Enomoto, Y.; OKAMOTO, H.; Baba, R.; Misawa, A.; Takahashi, K.; Tada, Y.; LIN, Y.-C.; Shih, W.-P.

2026-04-16 health systems and quality improvement 10.64898/2026.04.14.26350440 medRxiv
Top 5%
0.1%

While the need for continuous blood pressure (BP) monitoring in Japan is high, there are no commercially available cuffless devices for personal daily monitoring use. Fingertip-based sensors are a promising alternative as they eliminate the discomfort of repeated cuff inflation. However, their reliability during winter has been a major technical limitation due to cold-induced peripheral vasoconstriction. This study aimed to address this issue by validating a novel fingertip-based continuous BP monitor used by exercising adults during summer and winter. Eleven community-dwelling older adults (mean age, 73.1 ± 8.8 years) were included in this seasonal comparative study. During exercise, we compared a personal fingertip-based continuous monitor (ArteVu) with a standard oscillometric cuff device (Omron) in summer (mean, 26.5°C) and winter (mean, 7.4°C). The study also evaluated the device's accuracy during exercise-induced BP fluctuations and seasonal environmental changes. Awareness of the participants regarding BP management was also assessed using questionnaires. Systolic BP (SBP) readings from the two devices were strongly correlated in both seasons (r = 0.93 in summer; r = 0.88 in winter). Although the mean difference for the SBP was higher in winter than in summer (3.1 ± 11.2 mmHg vs. 0.2 ± 9.4 mmHg), the values remained within a clinically acceptable range for personal monitoring. Notably, 72.7% of participants reported that the ease of using the fingertip-based device significantly increased their awareness and motivation for daily BP management. This study confirms the feasibility of cuffless fingertip-based continuous BP monitoring across different seasons, including in winter. By overcoming the seasonal limitations, this device fills a critical gap in the Japanese health-monitoring market.
Our findings support the development of smaller and more portable models, representing a shift from traditional "snapshot" cuff measurements to continuous and integrated lifestyle monitoring for older adults.

8
Distinct Metabolic Signatures Distinguish Lung, Colorectal and Ovarian Cancer

Tsiara, I.; Vouzaxaki, E.; Ekström, J.; Rameika, N.; Yang, F.; Jain, A.; Iglesias Alonso, A.; Sjöblom, T.; Globisch, D.

2026-04-13 oncology 10.64898/2026.04.08.26350309 medRxiv
Top 5%
0.1%

Cancer-related casualties are the most common cause of death worldwide. The discovery of biomarkers is of utmost importance for diagnosis and disease monitoring. Herein, we performed a comprehensive metabolomics biomarker discovery effort in plasma from 615 lung, ovarian and colorectal cancer patients at diagnosis and 95 non-cancerous control subjects. This pan-cancer investigation identified specific panels of metabolites in the entire sample cohort with high discriminating power, demonstrated by combined ROC AUC values of up to 0.95. The identified metabolites are mainly associated with lipid and amino acid metabolism as well as xenobiotic transformation. These metabolite panels of high predictive power provide new metabolic insights into these cancers and demonstrate the potential of metabolomics for improved diagnosis and monitoring of disease progression.

9
Cross-cultural adaptation and validation of the Japanese Charite Alarm Fatigue Questionnaire (CAFQa) among ICU nurses and physicians: a multicenter study

Sato, T.; Ishiseki, M.; Kataoka, Y.; Someko, H.; Sato, H.; Minami, K.; Kaneko, T.; Takeda, H.; Crosby, A.

2026-04-11 intensive care and critical care medicine 10.64898/2026.04.07.26350292 medRxiv
Top 5%
0.1%

Objectives: Alarm fatigue is a patient safety concern in ICUs, yet no validated instrument exists to assess alarm fatigue among healthcare professionals in non-Western settings. This study aimed to cross-culturally adapt the Charite Alarm Fatigue Questionnaire (CAFQa) into Japanese and evaluate its reliability and validity among ICU nurses and physicians. Methods: The Japanese CAFQa was cross-culturally adapted following the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines, including forward translation, back-translation, expert panel review, and cognitive interviews. A multicenter cross-sectional validation study was performed across eight ICUs at five hospitals in Japan. A total of 129 participants (103 nurses and 26 physicians) completed the Japanese CAFQa, the NIOSH Brief Job Stress Questionnaire, and the Insomnia Severity Index (ISI). Structural validity, internal consistency, test-retest reliability (n = 102), convergent validity, and known-groups validity were assessed. Results: CFA confirmed the two-factor structure with acceptable fit (CFI = 0.922, RMSEA = 0.041, SRMR = 0.076), with standardized factor loadings ranging from 0.33 to 0.82. The two factors were not correlated (r = 0.05). Cronbach's alpha was 0.688 for the overall scale, 0.805 for Alarm Stress, and 0.649 for Alarm Coping. Test-retest ICCs ranged from 0.616 to 0.753. The CAFQa total score correlated with the NIOSH total (r = 0.261) and the ISI total (r = 0.338). Healthcare professionals with ≥4 years of ICU experience had higher Alarm Coping scores than those with 1-3 years (median 7.0 vs 6.5), and physicians scored higher on Alarm Coping than nurses (median 8.0 vs 7.0). Conclusions: The Japanese CAFQa demonstrated acceptable structural validity, reliability, and convergent and known-groups validity, providing the first validated tool for quantitatively measuring alarm fatigue in Japan.
Implications for Clinical Practice: The Japanese CAFQa enables ICU managers to quantify alarm fatigue at individual and unit levels, identify high-risk staff, and evaluate the effectiveness of alarm management interventions.
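
The internal-consistency figures in this abstract are Cronbach's alpha values. As a reminder of what that statistic computes, here is a minimal sketch of the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). The function name and the toy responses are hypothetical and unrelated to the study's data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; responses[r][i] is respondent r's score on item i."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Two items that move in lockstep across three respondents.
toy_alpha = cronbach_alpha([[1, 2], [2, 3], [3, 4]])
```

Perfectly parallel items give an alpha of 1, while items that do not co-vary drive it toward 0 or below.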

10
Performance of open-source large language models on nephrology self-assessment program

Ahangaran, M.; Jia, S.; Chitalia, S.; Athavale, A.; Francis, J. M.; O'Donnell, M. W.; Bavi, S. R.; Gupta, U. D.; Kolachalama, V. B.

2026-04-16 nephrology 10.64898/2026.04.16.26348910 medRxiv
Top 5%
0.1%

Background: Large Language Models (LLMs) have demonstrated strong performance in medical question-answering tasks, highlighting their potential for clinical decision support and medical education. However, their effectiveness in subspecialty areas such as nephrology remains underexplored. In this study, we assess the performance of open-source LLMs in answering multiple-choice questions from the Nephrology Self-Assessment Program (NephSAP) to better understand their capabilities and limitations within this specialized clinical domain. Methods: We evaluated the performance of five open-source large language models (LLMs): PodGPT, a podcast-pretrained model focused on STEMM disciplines, Llama 3.2-11B, Mistral-7B-Instruct-v0.2, Falcon3-10B-Instruct, and Gemma-2-9B-it. Each model was tested on its ability to answer multiple-choice questions derived from the NephSAP. Model performance was quantified using accuracy, defined as the proportion of correctly answered questions. In addition, the quality of the models' explanatory responses was assessed using several natural language processing (NLP) metrics: Bilingual Evaluation Understudy (BLEU), Word Error Rate (WER), cosine similarity, and Flesch-Kincaid Grade Level (FKGL). For qualitative analysis, three board-certified nephrologists reviewed 40 randomly selected model responses to identify factual and clinical reasoning errors, with performance summarized as average error ratios based on the proportion of error-associated words per response. Results: Among the evaluated models, PodGPT achieved the highest accuracy (64.77%), whereas Llama showed the lowest performance with an accuracy of 45.08%. Qualitative analysis showed that PodGPT had the lowest factual error rate (0.017), while Llama and Falcon achieved the lowest reasoning error rates (0.038).
Conclusions: This study highlights the importance of STEMM-based training to enhance the reasoning capabilities and reliability of LLMs in clinical contexts, supporting the development of more effective AI-driven decision-support tools in nephrology and other medical specialties.

11
Prospective Population-Scale Validation of an Electronic Health Record Based Model for Pancreatic Cancer Risk

Lahtinen, E.; Schigiltchoff, N.; Jia, K.; Kundrot, S.; Palchuk, M. B.; Warnick, J.; Chan, L.; Shigiltchoff, N.; Sawhney, M. S.; Rinard, M.; Appelbaum, L.

2026-04-13 oncology 10.64898/2026.04.11.26350318 medRxiv
Top 6%
0.1%

Background and aims: Pancreatic ductal adenocarcinoma (PDAC) surveillance is limited to individuals with familial or genetic risk although most future cases arise outside these groups. In a retrospective study, PRISM, an electronic health record (EHR)-based PDAC risk model, identified individuals in the general population at elevated near-term risk of PDAC. We aimed to prospectively evaluate whether PRISM can identify high-risk individuals beyond current surveillance groups across U.S. health systems. Methods: We performed a prospective multicenter cohort study after deployment of PRISM in April 2023 across 44 U.S. health care organizations. Eligible adults aged ≥40 years without prior PDAC received a single baseline risk score and were assigned to prespecified risk tiers. Patients were followed for incident PDAC for 30 months. We estimated tier-specific 30-month cumulative incidence (positive predictive value, PPV), number needed to screen (NNS), standardized incidence ratios (SIRs), and time from deployment and first high-risk flag to diagnosis. Results: Among 6,282,123 adults assigned a PRISM score, 5,058,067 had follow-up; 3,609 developed PDAC. The highest-risk tier had 30-fold higher PDAC incidence than the study population. At the SIR 5 threshold, 30-month cumulative incidence was 0.35% (NNS, 284.2); at SIR 16, 1.14% (NNS, 87.4); and at SIR 30, 2.19% (NNS, 45.7). Median time from deployment to PDAC diagnosis was 9.5 months, and median time from first high-risk flag to diagnosis at SIR 5 was 3.5 years. Shapley additive explanations (SHAP) analyses supported patient- and tier-level interpretability. Conclusions: Prospective deployment of PRISM across multiple U.S. health care organizations identified individuals at elevated near-term risk for PDAC, with substantial risk enrichment and lead time before diagnosis. These findings support the real-world scalability and generalizability of EHR-based risk stratification for risk-adapted early detection.
ClinicalTrials.gov identifier NCT05973331
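
The abstract pairs each tier's cumulative incidence with a number needed to screen. A minimal sketch of how the two relate, assuming NNS is the reciprocal of the cumulative incidence (the abstract does not state its definition, and the function name is hypothetical):

```python
def number_needed_to_screen(cumulative_incidence: float) -> float:
    """Reciprocal of cumulative incidence, given as a proportion (0 < p <= 1)."""
    if not 0 < cumulative_incidence <= 1:
        raise ValueError("cumulative incidence must be a proportion in (0, 1]")
    return 1.0 / cumulative_incidence


# Rounded 30-month cumulative incidences quoted in the abstract, as proportions.
reported_incidence = {"SIR 5": 0.0035, "SIR 16": 0.0114, "SIR 30": 0.0219}
nns_by_tier = {
    tier: number_needed_to_screen(p) for tier, p in reported_incidence.items()
}
```

Applied to the rounded percentages, this gives roughly 285.7, 87.7 and 45.7, close to the reported 284.2, 87.4 and 45.7. The small gaps are presumably due to the reported NNS values being computed from unrounded incidences.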

12
Mapping the Dynamic Interplay of Mental Health and Weight Across Childhood: Data-Driven Explorations Using Causal Discovery

Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.

2026-04-17 epidemiology 10.64898/2026.04.16.26350943 medRxiv
Top 6%
0.1%

Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.

13
Placental fetal vascularization in neonates with congenital heart disease: a pilot retrospective case control study

Kozai, A. C.; Yoshimasu, T.; Chase, M.; Ray Chaudhuri, N.; Udassi, J. P.; Barone Gibbs, B.; Hedjazi Moghari, M.

2026-04-17 obstetrics and gynecology 10.64898/2026.04.15.26350950 medRxiv
Top 7%
0.0%

Background: Placental function is associated with congenital heart defects (CHD), frequently presenting with malperfusion lesions and small-for-gestational-age size. However, placental villous vasculature in the setting of CHD is understudied. This study evaluated differences in placental, neonatal, and maternal outcomes among maternal/infant dyads with versus without CHD. Methods: We conducted a gestational age- and fetal sex-matched retrospective case control study using specimens prospectively collected by a local biobank. Neonatal outcomes included birthweight, placental weight, and their ratio (placental efficiency). We estimated the proportion of placental villous tissue comprised of fetal vascular endothelial cells (%FVE) using anti-CD34 immunohistochemistry and a pixel count algorithm. Placental weight multiplied by %FVE estimated the grams of placental tissue comprised of villous vasculature (placental vascular index). Maternal outcomes included hypertensive disorders of pregnancy and gestational diabetes. We compared cases and controls using linear and logistic regression adjusted for maternal smoking and cold ischemia time. Stratified analyses examined associations by preterm birth status. Results: Dyads (n=34 with CHD, n=34 without CHD) had maternal age of 29.4 +/- 4.9 years and were 35.6 +/- 4.0 gestational weeks at delivery. Groups had similar placental, neonatal, and maternal parameters. Among preterm neonates, we observed small-to-moderate effect sizes indicating lower placental weight, %FVE, and placental vascular index, and higher placental efficiency, in CHD cases. Among term neonates, moderate effect sizes suggested lower birthweight, placental weight, and placental vascular index in CHD cases. Conclusions: Though differences between groups were not significant, moderate effect sizes suggested that placental vascularization was lower among preterm neonates with CHD.
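
The two derived measures in this abstract are simple arithmetic, sketched below. The abstract defines the placental vascular index as placental weight multiplied by %FVE, and placental efficiency as the ratio of birthweight and placental weight. The direction of that ratio (birthweight divided by placental weight) is an assumption here, as are the function names and the example values.

```python
def placental_efficiency(birthweight_g: float, placental_weight_g: float) -> float:
    """Birthweight-to-placental-weight ratio (direction of the ratio assumed)."""
    if placental_weight_g <= 0:
        raise ValueError("placental weight must be positive")
    return birthweight_g / placental_weight_g


def placental_vascular_index(placental_weight_g: float, fve_percent: float) -> float:
    """Grams of placental tissue estimated to be villous vasculature."""
    if not 0 <= fve_percent <= 100:
        raise ValueError("%FVE must lie between 0 and 100")
    return placental_weight_g * fve_percent / 100.0


# Hypothetical values, not taken from the study.
example_efficiency = placental_efficiency(2800.0, 400.0)
example_index = placental_vascular_index(400.0, 10.0)
```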

14
Automated Detection of Dental Caries and Bone Loss on Periapical and Bitewing Radiographs using a YOLO Based Deep Learning Model

Alqaderi, H.; Kapadia, U.; Brahmbhatt, Y.; Papathanasiou, A.; Rodgers, D.; Arsenault, P.; Cardarelli, J.; Zavras, A.; Li, H.

2026-04-17 dentistry and oral medicine 10.64898/2026.04.12.26350726 medRxiv
Top 7%
0.0%

Background: Dental caries and periodontal disease represent the most prevalent global oral health conditions, collectively affecting several billion people. The diagnostic interpretation of dental radiographs, a cornerstone of modern dentistry, is associated with considerable inter-observer variability. In routine clinical practice, clinicians are required to evaluate a high volume of radiographic images daily, a cognitively demanding task in which diagnostic fatigue, time constraints, and the inherent complexity of overlapping anatomical structures can lead to the inadvertent oversight of early-stage pathologies. Artificial intelligence (AI) offers a transformative opportunity to augment clinical decision-making by providing rapid, objective, and consistent radiographic analysis, thereby serving as a tireless adjunct capable of flagging findings that may be missed during routine human inspection. Methods: This study developed and validated a deep learning system for the automated detection of dental caries and alveolar bone loss using a dataset of 1,063 periapical and bitewing radiographs. Two separate YOLOv8s object detection models were trained and evaluated using a rigorous 5-fold cross-validation methodology. To align with the clinical use-case of a screening tool where high sensitivity is paramount, a custom image-level evaluation criterion was employed: a true positive was recorded if any predicted bounding box had a Jaccard Index (IoU) > 0 with any ground truth annotation. Model performance was systematically evaluated at confidence thresholds of 0.10 and 0.05. Results: At a confidence threshold of 0.05, the caries detection model achieved a mean precision of 84.41% (±0.72%), recall of 85.97% (±4.72%), and an F1-score of 85.13% (±2.61%). The alveolar bone loss model demonstrated exceptionally high performance, with a mean precision of 95.47% (±0.94%), recall of 98.60% (±0.49%), and an F1-score of 97.00% (±0.46%).
Conclusion: The YOLOv8-based models demonstrated high accuracy and high sensitivity for detecting dental caries and alveolar bone loss on periapical radiographs. The system shows significant potential as a reliable automated assistant for dental practitioners, helping to improve diagnostic consistency, reduce the risk of missed pathology, and ultimately enhance the standard of patient care.
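
The image-level criterion described in the methods, where an image counts as a true positive if any predicted box has IoU > 0 with any ground-truth box, can be sketched as follows. The box format (x1, y1, x2, y2) and the function names are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Jaccard index (intersection over union) of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def image_is_true_positive(
    predicted: Sequence[Box], ground_truth: Sequence[Box]
) -> bool:
    """True if any predicted box overlaps any ground-truth box at all (IoU > 0)."""
    return any(iou(p, g) > 0 for p in predicted for g in ground_truth)


overlapping = image_is_true_positive([(0, 0, 2, 2)], [(1, 1, 3, 3)])
disjoint = image_is_true_positive([(0, 0, 2, 2)], [(5, 5, 6, 6)])
```

Because any overlap counts, this criterion is far more lenient than the IoU ≥ 0.5 matching commonly used in object detection, so the precision and recall it produces describe image-level screening rather than lesion localisation.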

15
Medicalbench: Evaluating Large Language Models Towards Improved Medical Concept Extraction

Yang, Z.; Lyng, G. D.; Batra, S. S.; Tillman, R. E.

2026-04-16 health informatics 10.64898/2026.04.12.26350704 medRxiv
Top 8%
0.0%

Medical concept extraction from electronic health records underpins many downstream applications, yet remains challenging because medically meaningful concepts, such as diagnoses, are frequently implied rather than explicitly stated in medical narratives. Existing benchmarks with human-annotated evidence spans underscore the importance of grounding extracted concepts in medical text. However, they predominantly focus on explicitly stated concepts and provide limited coverage of cases in which medically relevant concepts must be inferred. We present MedicalBench, a new benchmark for medical concept extraction with evidence grounding that evaluates implicit medical reasoning. MedicalBench formulates medical concept extraction as a verification task over medical note-concept pairs, coupled with sentence-level evidence identification. Built from MIMIC-IV discharge summaries and human-verified ICD-10 codes, the dataset is curated through a multi-stage large language model (LLM) triage pipeline followed by medical annotation and expert review. It deliberately includes implicit positives, semantically confusable negatives, and cases where LLM judgments disagree with medical expert assessments. Annotators provide sentence-level evidence spans and concise medical rationales. The final dataset contains 823 high-quality examples. We define two complementary evaluation tasks: (1) medical concept extraction and (2) sentence-level evidence retrieval, enabling assessment of both correctness and interpretability. Benchmarking state-of-the-art LLMs and a supervised baseline reveals that performance remains modest, highlighting the difficulty of extracting implicitly expressed concepts.
We further show that explicitly incorporating reasoning cues and prompting to extract implicit evidence substantially improves medical concept extractions, while performance is largely invariant to note length, indicating that MedicalBench isolates reasoning difficulty rather than superficial confounders. MedicalBench provides the first systematic benchmark for implicit, evidence-grounded medical concept extraction, offering a foundation for developing medical language models that can both identify medically relevant concepts and justify their predictions in a transparent and medically faithful manner.

16
Are Nutritional Aspects And Body Composition Associated With The Can Do, Do Do Concept In People With COPD In Latin America? An Observational Study

Borges, P.; Freire, A. P. F.; Pedroso, M. A.; Spolador de Alencar Silva, B.; Lima, F. F.; Uzeloto, J. S.; Gobbo, L. A.; Grigoletto, I.; Cipulo Ramos, E. M.

2026-04-15 rehabilitation medicine and physical therapy 10.64898/2026.04.13.26350788 medRxiv
Top 8%
0.0%

Introduction: Individuals with COPD can be classified according to their levels of physical activity (PA) and physical capacity (PC). The relationship between nutrition and body composition within these classifications remains unclear. Objectives: To compare the body composition and food intake of people with COPD and verify the associations. Methods: Cross-sectional exploratory study in which body composition and food intake were assessed in individuals with COPD. Classification was based on the six-minute walk test (PC) and accelerometry (PA): quadrant I, "can do, don't do" (preserved PC, low PA); quadrant II, "can do, do do" (preserved PC, preserved PA). Results: 72 individuals with COPD were included, 39 in quadrant I and 33 in quadrant II, with mean ages of 69 ± 6 and 67 ± 7 years, respectively. Group I had a higher proportion of males, whereas group II had a higher proportion of females. Group I showed a positive trend in skeletal muscle mass (B = 2.883, p = 0.011) and a negative trend in basal metabolic rate (B = -0.092, p = 0.010). Conclusion: Brazilians with COPD classified in quadrants I and II showed similar results in terms of body composition and food intake. A positive trend in skeletal muscle mass was observed for group I. These findings align with the pathophysiological model of COPD, in which the preservation of muscle mass and adequate protein intake support functional capacity and the maintenance of higher physical activity levels.

17
Caregiver knowledge, its determinants and its association with infant and young child feeding and water, sanitation, and hygiene practices among children with severe acute malnutrition in agrarian and pastoral settings of Ethiopia

Areb, M.; Huybregts, L.; Tamiru, D.; Toure, M.; Biru, B.; Fall, T.; Haddis, A.; Belachew, T.

2026-04-13 public and global health 10.64898/2026.04.09.26350480 medRxiv
Top 8%
0.0%

Background: This study aimed to assess caregiver knowledge of Infant and Young Child Feeding (IYCF), child health, severe acute malnutrition (SAM) screening, and Community-Based Management of Acute Malnutrition (CMAM), its determinants, and its associations with IYCF and water, sanitation, and hygiene (WaSH) practices among caregivers of children aged 6-59 months with SAM in Ethiopian agrarian and pastoralist settings. Method: Data were from the baseline survey of the R-SWITCH Ethiopia cluster-randomized controlled trial (cRCT), which screened approximately 28,000 children aged 6-59 months and identified 686 SAM cases. Caregiver knowledge was evaluated using a validated 32-item questionnaire (Cronbach's α for internal reliability) and analyzed via linear mixed-effects and Poisson regression models in Stata 17. Results: Caregiver knowledge was positively associated with improved IYCF/WaSH practices among children aged 6-23 months with SAM, including higher minimum dietary diversity (MDD: IRR = 1.50), minimum acceptable diet (MAD: IRR = 1.63), and reduced zero vegetable/fruit intake (IRR = 0.77), as well as MDD in children aged 24-59 months, improved water access (IRR = 1.19), water treatment (IRR = 2.02), and handwashing stations (IRR = 1.41). Being literate (β = 4.1; 95% CI: 1.5-6.6, p = 0.016), being pregnant (β = 4.4; 95% CI: 0.9-7.8, p = 0.018), having the child weighed at a health post or health center (β = 8.9; 95% CI: 3.5-14.2, p ≤ 0.001), and a higher household wealth index (β = 11.8; 95% CI: 3.6-20.1, p = 0.005) were associated with higher knowledge, while possible depression (β = -0.3; 95% CI: -0.5 to 0.0, p = 0.015) was associated with lower knowledge. Conclusion: Caregiver knowledge determines better IYCF/WaSH practices among children aged 6-59 months with SAM. Literacy, pregnancy, having the child weighed at a health post or health center, and greater household wealth were associated with caregivers' knowledge, whereas possible depression was associated with lower knowledge. Integrating context-specific caregiver education and mental health support into CMAM, growth monitoring and promotion (GMP), and primary care services could enhance feeding and WaSH practices in Ethiopia.

18
The impact of non-invasive prehabilitation before surgery on emotional well-being in neuro-oncology patients: Insights from the Prehabilita project

Brault-Boixader, N.; Roca-Ventura, A.; Delgado-Gallen, S.; Buloz-Osorio, E.; Perellon-Alfonso, R.; Hung Au, C.; Bartres-Faz, D.; Pascual-Leone, A.; Tormos Munoz, J. M.; Abellaneda-Perez, K.; Prehabilita Working Group

2026-04-12 oncology 10.64898/2026.04.08.26350382 medRxiv
Top 8%
0.0%

Prehabilitation (PRH) is a preoperative process aimed at optimizing patients' functional capacity to improve surgical outcomes and overall well-being. While its physical and cognitive benefits are increasingly documented, its emotional impact, particularly in neuro-oncology patients, remains less explored. This study assessed the psychological effects of a PRH program on 29 brain tumor patients. The primary outcome, emotional well-being, was measured using quality of life and emotional distress metrics. Secondary outcomes included perceived stress levels and control attitudes. Additionally, qualitative data from structured interviews provided further insights into the psychological effects of the intervention. The results indicated significant improvements in quality of life and reductions in emotional distress, particularly among women. While perceived stress levels remained stable, control attitudes showed an increase. Qualitative analysis further highlighted the positive changes in the sense of control and identified additional factors, such as the importance of social support sources during the PRH process. Overall, these findings suggest that PRH interventions play a significant role in enhancing emotional well-being among neuro-oncological patients in the preoperative phase. These results underscore the importance of implementing comprehensive and personalized PRH approaches to optimize clinical status both before and after surgery, thereby promoting sustained psychological benefits in this population. This study is based on data collected at Institut Guttmann in Barcelona in the context of the Prehabilita project (ClinicalTrials.gov identifier: NCT05844605; registration date: 06/05/2023).

19
Prevalence and Factors Associated with Family-Based HIV Index Case Testing in Wolaita Zone, Southern Ethiopia, 2023: A Cross-Sectional Study

Koyra, A. B.; Mohammed, F.; Eshete, T.

2026-04-11 epidemiology 10.64898/2026.04.08.26350444 medRxiv
Top 8%
0.0%

Background: Family-based HIV index case testing identifies family members with unknown HIV status and links them to care. Data are limited in southern Ethiopia. Methods: A facility-based cross-sectional study was conducted among 377 adults on antiretroviral therapy (ART) in Wolaita Zone, Southern Ethiopia, from November 2022 to May 2023. Participants were selected using systematic random sampling. Data were collected via an interviewer-administered semi-structured questionnaire. Multivariable logistic regression identified factors associated with index case family testing. Adjusted odds ratios (AOR) with 95% confidence intervals (CI) were calculated, and statistical significance was declared at p < 0.05. Results: The proportion of index case family testing for HIV was 84.9% (95% CI: 81.2-88.6). In multivariable analysis, urban residence (AOR = 2.8; 95% CI: 1.16-6.75), duration on ART greater than 12 months (AOR = 13.0; 95% CI: 4.6-36.9), disclosure of HIV status to family members (AOR = 5.6; 95% CI: 1.9-16.5), discussion of HIV status with family members (AOR = 6.6; 95% CI: 1.9-23.2), and being counselled by health professionals to bring families for testing (AOR = 6.3; 95% CI: 2.1-19.0) were significantly associated with index case family testing. Conclusion: The prevalence of family-based HIV index case testing in Wolaita Zone was 84.9%, below the national 95% target. Health professionals should strengthen counselling on ART adherence, status disclosure, family discussion, and active referral to improve testing uptake among family members of people living with HIV.

20
Comparative LUSZ Therapeutic Study (LUSZ_AVIST) of Antiviral, Antiretroviral, and Immunosuppressive Treatments in Hospitalized COVID-19 Patients with High-Risk Factors, Biomarkers, and Disease Progression.

Makdissy, N.; Makdessi, E. W.; Fenianos, F.; Nasreddine, N.; Daher, W.; El Hamoui, S.

2026-04-13 respiratory medicine 10.64898/2026.04.10.26350587 medRxiv
Top 8%
0.0%

COVID-19 spread rapidly and caused a global pandemic, making it one of the deadliest in history. Early identification of patients with coronavirus disease 2019 who may develop critical illness is of immense importance. Novel biomarkers were therefore needed to identify patients who would suffer rapid disease progression to severe complications and death. Many treatments were adopted, including the antiviral Remdesivir, the antiretroviral Lopinavir/Ritonavir, and Tocilizumab. Our study aimed not only to specify high-risk factors and biomarkers of fatal outcome in hospitalized subjects with coronavirus but also to compare the efficacy of the three treatments, to help clinicians better choose a therapeutic strategy and reduce mortality. We divided the population (n=711) into four main groups according to the WHO ordinal severity scale. We measured the percentage of mortality in and out of hospital, the length of hospital stay, the pulmonary inflammatory lesion and its distribution, the SARS-CoV-2 IgM and IgG variations at admission, the inflammatory markers, the complete blood count, the coagulation factors and enzymes, the protein and electrolyte profile, the glucose and lipid profile, and other relevant markers. The significance of the observed variation was assessed by multivariate and ANOVA analyses. We established a novel predictive scoring model of disease progression, based on a cohort of Lebanese hospitalized patients, that relies on pulmonary inflammatory lesions, inflammation biomarkers such as LDH, D-dimer, CRP, IL-6, and the lymphocyte count, the number of comorbidities, and the age of the patient, all of which were significantly correlated with illness severity. The best outcomes were observed with immunomodulatory and anticoagulant treatments. Tocilizumab was more effective than the other two treatments in non-severe cases, but none of the treatments used was highly effective on its own in reducing mortality in severe cases.