Back

Mathematics

MDPI AG

Preprints posted in the last 7 days, ranked by how well they match Mathematics's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Identification of a Fractional Model for an Outbreak of the Dengue Fever

Cresson, J.; Pere, M.; Szafranska, A.

2026-05-27 epidemiology 10.64898/2026.05.26.26354120 medRxiv
Top 0.1%
2.0%
Show abstract

This work focuses on the global and partial identification problem for fractional differential equations. We provide a general numerical procedure based on global and local optimization algorithms with two refinements for biological systems that ensure solution positivity and homogeneous parameter units. The method is applied to a new fractional model of Dengue outbreak called the Fractional Homogeneous Nishiura (FHN) model, calibrated using data of newly infected people in Cape Verde. We show that our identification method yields a better fit between data and model solutions than previous approaches and that our FHN model captures the dynamics of Dengue more closely than existing systems.

2
A Consensus-Driven Stacking Ensemble Framework for Interpretable Cardiovascular Risk Prediction and Clinical Deployment

Sozol, S. S.; Dev Nath, B. C.; Fahim, F. M. S.; Suzana, N. N.; Mirza, J. F.; Ahmmed, S.; Zohra, F.-T.; Zafr, A. H. A.; Uddin, M. N.; Mondal, M. R. H.; Hoque, A. S. M. L.

2026-05-26 health informatics 10.64898/2026.05.18.26352989 medRxiv
Top 0.9%
0.3%
Show abstract

Machine learning (ML) is being considered to help diagnose cardiovascular diseases (CVD). Still, challenges like inconsistent and limited datasets, limited infrastructure, and global inequalities lead to the need for a reliable and practicable ML solution. This paper presents an ML-driven framework for predicting CVD risk scores and classifying status. Several data preprocessing techniques, including multiple imputation by chained equations (MICE), outlier removal, are considered. In addition, hyperparameter tuning is performed with the GridSearchCV tuning technique. Moreover, a consensus-driven five-feature selection method is applied to identify optimal predictors. The dataset used in this study contains healthcare records related to future CVD risk scores, comprising 1,529 patient records with 22 features. The optimized stacked ensemble model is applied to the dataset and achieves a cross-validated coefficient of determination value of 98.13% for CVD risk score regression. Comparative evaluation with other ML models confirmed improved accuracy, efficiency, and interpretability. The explainable AI technique SHAP is applied to interpret predictions and highlight key risk factors. Moreover, a deployment-ready web platform with multi-role access has been developed that demonstrates clinical applicability. The proposed framework offers a reliable and interpretable tool for early detection of CVD and personalized risk assessment. In the future, this work can be extended to integrate longitudinal data, medical imaging, and deep learning to improve generalizability and strengthen real-world impact.

3
Bridging Acoustic and Semantic Spaces for Interpretable Voice Scoring via Zero-Shot Semantic Expansion

Hsiao, C.; Cheng, Y.-R.; Yang, C.-Y.; Hsu, F.-S.

2026-06-01 health informatics 10.64898/2026.05.29.26354442 medRxiv
Top 1%
0.2%
Show abstract

Subjective auditory-perceptual evaluation and uninterpretable deep learning models limit the clinical assessment of voice disorders. This study proposes a two-phase zero-shot framework to evaluate voice pathology. First, an Audio Spectrogram Transformer is fine-tuned on the Perceptual Voice Quality Database to generate an acoustic latent space. Second, Orthogonal Procrustes analysis maps these acoustic embeddings directly onto the semantic space of a pre-trained Sentence Transformer. The geometric alignment produced continuous semantic axes that outperformed a supervised machine learning baseline in regressing clinician-rated GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) severity scales. Furthermore, these axes correlate with traditional acoustic measures, including Harmonics-to-Noise Ratio and local jitter, while remaining robust when applied to aperiodic signals by not requiring fundamental frequency extraction. Most importantly, the model achieved zero-shot semantic expansion, successfully evaluating voices using an untrained, natural clinical vocabulary beyond the GRBAS scale. External validation on the Voice ICarus Database confirmed cross-corpus stability and demonstrated the capacity for zero-shot differential phenotyping of specific etiologies, such as hypokinetic dysphonia and reflux laryngitis. By bridging acoustic and semantic latent spaces, this framework offers an objective, continuous, and transparent metric for evaluating voice quality using voice descriptive vocabulary.

4
The Verification Gap: Artificial Intelligence Adoption, Hallucination Awareness, and Verification Practices Among Early Career Medical Researchers in Pakistan

Sajjad, M.

2026-05-30 health informatics 10.64898/2026.05.28.26354373 medRxiv
Top 1%
0.2%
Show abstract

Artificial intelligence (AI) tools have been rapidly adopted by medical researchers, yet whether early career researchers in low and middle income countries possess the awareness and habits needed to use these tools safely remains poorly documented. This study characterized AI adoption patterns, hallucination awareness, and verification and disclosure practices among early career medical researchers in Pakistan. A cross sectional anonymous online survey was conducted among medical students, house officers, residents, physicians, and faculty involved in research or academic work across Pakistan (May 2026). Descriptive statistics and chi square tests were applied to 373 eligible responses. AI use was near universal (99.7%), with 60.3% using AI tools daily. The most commonly reported tool in this sample was Claude (40.5%), followed by ChatGPT (29.2%) and Perplexity (26.0%), though this ranking likely reflects sampling characteristics. Despite high adoption, 59.2% typically did not verify AI outputs before use, and 40.2% had never heard that AI can generate fabricated scientific references. In behavioral vignettes, 36.5% assumed convincing AI generated references were authentic, and 54.2% would continue using remaining AI content after discovering one fabricated reference. Formal research training was strongly associated with consistent disclosure (51.7% vs. 17.1%; chi square=48.43, p less than 0.001). Role, daily use frequency, and research training were not significantly associated with verification behavior. Early career medical researchers in Pakistan demonstrate high AI adoption alongside incomplete hallucination awareness and infrequent verification, a pattern that may carry implications for research integrity. Formal training was the only factor significantly associated with consistent disclosure. Integration of AI literacy into medical curricula and institutional governance frameworks merits consideration.

5
Multi-Agent AI for Chest Radiography: A Sequential Segmentation and LLM-Driven Consultative Tool for Medical Training

Kurt, F.; Subasi, A.

2026-06-01 health informatics 10.64898/2026.05.29.26354432 medRxiv
Top 2%
0.2%
Show abstract

Background: Traditional diagnostic models lack explainability, while multimodal language models prone to hallucination remain unsafe for medical education. An interactive, risk-free artificial intelligence framework is required to serve as a reliable clinical mentor for radiology trainees. Methods: We propose a multi-agent architecture decoupling deterministic image analysis from generative consultation. Specialized computer vision models perform anatomical localization and pathological segmentation. These quantitative outputs are synthesized into a structured payload, which grounds a locally hosted large language model (LLaVA 7B) using strict prompt guardrails and prerequisite protocols. Results: The system effectively eliminates visual hallucinations by intercepting unanchored queries. The artificial intelligence tutor successfully contextualizes spatial anomalies and baseline metrics, generating accurate conversational explanations and formally structured radiology reports while strictly enforcing medical safety disclaimers. Discussion and Conclusion: By anchoring language generation exclusively to verified algorithmic realities, this framework transforms opaque diagnostic models into safe, interactive educational simulators. This establishes a highly reliable paradigm for integrating explainable artificial intelligence into medical training.

6
SeGA-GNN: Semantically Gated Augmented Graph Neural Networks for Wearable-Based Emotion Detection

Kurt, F.; Subasi, S. N.; Yakisan, E. S.; Subasi, A.

2026-06-01 health informatics 10.64898/2026.05.29.26354434 medRxiv
Top 2%
0.1%
Show abstract

Background: Wearable technologies enable scalable and continuous monitoring of emotional states through passive sensing of physiological and behavioral signals. However, conventional learning approaches often struggle to model the complex temporal, contextual, and relational dependencies underlying human emotions. To address these limitations, we propose a graph-based framework that represents multimodal wearable observations as heterogeneous knowledge graphs enriched with semantic information derived from Large Language Models (LLMs), enabling richer contextual understanding beyond raw sensor measurements. Methods: We constructed a heterogeneous knowledge graph using multimodal Fitbit physiological signals and affective self-report data collected from 45 users. Framing mood prediction and emotion detection was formulated as both binary and ternary node classification tasks. We evaluated five baseline heterogeneous Graph Neural Network (GNN) architectures and compared them with the proposed Semantically Gated Augmented Graph Neural Network (SeGA-GNN) framework, which dynamically integrates LLM-generated semantic embeddings into graph representations through a gated cross-modal fusion mechanism. Results: The baseline GNN models achieved strong performance, with classification accuracies ranging from 0.7525 to 0.9739 for binary classification and 0.6249 to 0.9699 for ternary classification. The proposed SeGA framework consistently improved predictive performance across most architectures. In particular, semantic augmentation transformed the HAN model from moderate baseline performance into near-perfect emotion recognition capability, achieving SeGA-HAN Accuracy = 0.9988 and AUC = 1.0000 for binary classification and Accuracy = 0.9979 and AUC = 1.0000 for ternary classification. Discussion and Conclusion: Integrating LLM-derived semantic contextualization into heterogeneous graph learning enables effective modeling of contextual information that is not directly captured by wearable physiological signals alone. The proposed SeGA-GNN framework demonstrates that adaptive semantic fusion substantially improves the accuracy, robustness, and interpretability of wearable-based emotion detection. These findings establish a promising direction for next-generation wearable affective computing systems and intelligent emotion-aware applications.

7
Future Pandemics: AI-Designed Diagnostic Assays for Detection of Andes Orthohantavirus (ANDV) Associated with the 2026 MV Hondius Outbreak

MacSharry, J.; Tonda, A.; Lopez-Rincon, A.

2026-05-27 health informatics 10.64898/2026.05.26.26354101 medRxiv
Top 2%
0.1%
Show abstract

Andes orthohantavirus (ANDV), the primary etiological agent of hantavirus pulmonary syndrome (HPS) in South America, is uniquely capable of limited human-to-human transmission, posing a significant challenge for outbreak control. Recent events, including the 2018-2019 Epuyen outbreak and the 2026 MV Hondius incident, underscore the need for rapid, lineage-specific molecular diagnostics. In this study, we present an artificial intelligence (AI)-driven framework for the design of diagnostic primers targeting the S genomic segment of the Epuyen lineage. Using an evolutionary algorithm integrated with thermodynamic evaluation via Primer3Plus, candidate primers were optimized to maximize classification accuracy while satisfying stringent biochemical constraints. The resulting primer set enables amplification of lineage-specific regions suitable for molecular characterization and surveillance. In silico validation demonstrates that the proposed primers achieve perfect discrimination between 2026 outbreak sequences and other ANDV variants. Furthermore, in silico comparison with standard protocol-based primers reveals substantially reduced sensitivity and specificity in the latter, highlighting the limitations of static diagnostic designs when applied to evolving viral populations. Overall, this work demonstrates that AI-assisted primer design provides a robust and adaptable strategy to improve viral detection, enhance outbreak tracking, and support timely public health interventions. Integrating computational optimization into diagnostic development is essential for strengthening preparedness against emerging zoonotic threats.

8
VOGeo-Gaze: Calibration-Free, Geometry-Aware Deep Learning for Real-Time Gaze Tracking in Clinical Video-Oculography

Zhao, J.; Ahmadi, S.-A.; Decker, J.; Zwergal, A.; Eulenburg, P. z.; Flanagin, V. L.; Wuehr, M.

2026-05-29 health informatics 10.64898/2026.05.27.26354254 medRxiv
Top 4%
0.0%
Show abstract

Quantitative eye movement analysis is important for neuro- logical diagnostics, yet existing video-oculography (VOG) systems typ- ically require calibration, device-specific settings, or accurate gaze la- bels. We present VOGeo-Gaze, a real-time, calibration-free, geometry- aware neural network that estimates gaze by reconstructing anatomi- cally meaningful eyeball parameters from image features. The method combines segmentation-driven projection geometry, a refraction-aware pupil correction module, and temporal anatomical stabilization, so gaze is derived from interpretable eye geometry rather than direct angular regression. Trained only on the public TEyeD dataset with weak gaze supervision, VOGeo-Gaze was evaluated on 116 clinical recordings from 17 patients and 19 healthy subjects using EyeSeeCam, a clinical gold- standard VOG system. It achieved median absolute angular errors of 0.33{whitebullet} horizontally and 0.35{whitebullet} vertically, with nearly 92% of recordings below 1{whitebullet} error while operating at >300 FPS. These results demonstrate sub-degree clinical gaze estimation without subject-specific calibration, camera intrinsics, or accurate gaze labels, providing a scalable and inter- pretable alternative to conventional VOG pipelines. Code is available at https://github.com/DSGZ-MotionLab/VOGeo-Gaze.

9
Towards A Foundation Model for Clinical Voice Biomarkers

Elemento, O.; Sigaras, A.; Colonel, J.; Hajirasouliha, I.; Ghosh, S.; Bensoussan, Y.; Bridge2AI-Voice Consortium, ; Rameau, A.

2026-05-30 health informatics 10.64898/2026.05.28.26354346 medRxiv
Top 4%
0.0%
Show abstract

Vocal biomarkers, encompassing voice and speech, have largely been developed for individual conditions in isolation, limiting their generalizability across diseases and recording settings. To address this, we introduce VoiceFM, a contrastive model that learns general-purpose clinical voice representations by aligning audio embeddings with rich clinical metadata. Using the Bridge2AI-Voice dataset (984 primarily English-speaking adult participants, 846 used for training and 138 held out as a temporally separated validation cohort, 40,056 recordings totaling 176 hours across 5 academic medical centers), VoiceFM pairs a fine-tuned Whisper large-v2 encoder with a tabular transformer over 44 clinical features via symmetric InfoNCE loss. Linear probes on frozen VoiceFM embeddings achieve mean AUROC 0.952 +/- 0.005 across five evaluation tasks (control vs disease screening plus four disease categories), significantly outperforming Frozen Whisper (0.926 +/- 0.013, p = 0.013), Frozen HuBERT (0.885 +/- 0.017, p = 0.0009), and the contrastively trained VoiceFM-HuBERT (0.938 +/- 0.006, p = 0.012). On the 138-participant held-out cohort, VoiceFM-Whisper achieves AUROCs of 0.99 for Alzheimer's/dementia/MCI and 0.89 for airway stenosis, demonstrating that the learned representations generalize to participants the model has never seen. VoiceFM representations transfer to three external datasets without retraining and improve few-shot classification. Recording task attribution identifies a small set of speech tasks that match or exceed the full battery's performance, suggesting shorter screening protocols are feasible. Trained predominantly on English audio, VoiceFM transfers without fine-tuning to Spanish-language Parkinson's disease (PD) detection (NeuroVoz, 107 participants, AUROC 0.93 +/- 0.02), with the signal dominated by articulatory rather than phonatory features. A fine-tuned classifier achieves participant-level AUROC 0.87 (sustained 0.85, countdown 0.80) on the mPower smartphone study (585 held-out participants). Together, these results show that contrastive alignment between voice and rich clinical metadata can serve as the basis for a clinical voice foundation model, producing a single set of transferable representations that generalize across diseases, languages, recording conditions, and patients enrolled after model freeze.

10
Generation and Evaluation of Realistic Synthetic Clinical Progress Notes for Prostate Cancer using Large Language Models.

Rey-Blanes, A.; Veredas-Morente, J.; Vivas-Vargas, E.; Gil-Garcia, F.; Moreno-Barea, F. J.; Veredas, F. J.

2026-05-28 health informatics 10.64898/2026.05.25.26354027 medRxiv
Top 4%
0.0%
Show abstract

Background and Objective: Access to real-world electronic health records (EHRs) remains limited by privacy, governance and annotation constraints, hindering the development of clinical natural language processing models. Realistic synthetic progress notes may provide EHR-like corpora that preserve clinically rigorous information on diagnoses, treatments, symptoms, imaging, laboratory findings and therapeutic trajectories without relying directly on sensitive patient records. This study evaluates whether large language models (LLMs) can generate realistic Spanish prostate cancer progress notes from published case reports, preserving clinical content, temporality and hospital-style conventions.

11
Changes in the profile of adults diagnosed as autistic since 2010: population based studies in England and Sweden

Sadik, A.; Lundberg, M.; Khandaker, G. M.; Pardinas, A. F.; Lee, B. K.; Madley-Dowd, P.; Magnusson, C.; Rai, D.

2026-05-28 epidemiology 10.64898/2026.05.20.26353486 medRxiv
Top 5%
0.0%
Show abstract

Objective: To understand if sociodemographic and neuropsychiatric characteristics of people diagnosed with autism in the United Kingdom (UK) and Sweden have changed since 2010. Design: Cross-context population-based cohort studies. Setting: UK primary care records from 2010-2023 and Swedish population-wide register linkages from 2010-2021 Participants: 24,537,039 individuals age 16 or over, registered with general practices in the UK, including 141,119 with an autism diagnosis. 9,096,874 people age 16 or over in the Swedish Total Population Register, including over 100,817 with an autism diagnosis. Main outcome measures: Annual age-standardised incidence and prevalence of adult autism diagnoses within different sociodemographic groups. Annual age-standardised proportion of adults with new autism diagnoses, lifetime autism diagnoses, and no autism diagnoses, with prior records of other neuropsychiatric conditions or medications. Results: Incident adult autism diagnoses were consistently higher in Sweden than the UK, however incidence increased rapidly in the UK after 2020. Incident diagnoses increased fastest for 16-25-year-olds and females in both nations, as well as people in White ethnic groups in the UK and people with Swedish-born parents in Sweden. For example, in the UK in 2023 the age-standardised incidence of autism diagnoses among 16-65 years olds was 11 diagnoses per 10,000 person-years (95%CI: 10.7, 11.3) in the White ethnic group and 2.2 diagnoses per 10,000 person-years (95%CI: 1.9, 2.5) in the South Asian ethnic group. Over time there has been a consistent decline in the proportion of autistic adults with a prior diagnosis of epilepsy, psychosis and intellectual disability and an increase in the proportion with a prior diagnosis of ADHD, anxiety, depression and several other mental illnesses. For example, in the UK between 2010 and 2023 the age-standardised proportions of newly diagnosed autistic adults with prior records of epilepsy decreased from 10% (95%CI: 7.6, 13) to 4% (95%CI: 3.6, 4.5), while the proportion with records of anxiety increased from 28.7% (95%CI: 24.4, 33.6) to 58.3% (95%CI: 56.6, 60.1). Mental health conditions were generally more common in females and the reduction over time in intellectual disability was greater in females than males. Conclusions: The socio-demographic and neuro-psychiatric characteristics of individuals diagnosed as autistic have changed dramatically since 2010, a phenomenon observed both in the UK and Sweden. The extent to which these changes indicate nuanced recognition of autism or broadening of diagnostic practice needs investigation.

12
Core Components for Emergency Medical Dispatch Systems: An International Delphi Consensus Study

Weber, K.; Stassen, W.; Jayaraman, S.; Odland, M. L.; Nishimwe, A.; Welgama, I.; Wallis, L.; Ignatowicz, A.; Davies, J. P.

2026-05-28 emergency medicine 10.64898/2026.05.26.26354117 medRxiv
Top 5%
0.0%
Show abstract

Introduction -- Emergency Medical Dispatch Systems (EMDS) can reduce delays in accessing emergency care by providing structured communication, triage, and coordination. However, such systems remain absent or underdeveloped in most low- or middle-income countries (LMICs). This study aimed to establish international consensus on essential EMDS components to inform global guidance. Methods -- We convened a multidisciplinary expert group to draft a preliminary list of essential components for three EMDS levels reflecting resource availability and system maturity. We then conducted a three-round Delphi with international experts to reach consensus on core EMDS components. Components which had [≥]75% agreement were included, those with [≥]75% disagreement were excluded. Components not achieving consensus by Round 3 were removed. Results were analysed overall and stratified by respondents' country income level. A subsequent online expert meeting resolved inconsistencies and finalised the component list. Results -- The expert group generated 111 components for each of three EMDS levels (Foundational, Emerging, and Established) spanning 11 operational domains. Of the 68 experts invited to the Delphi, 43 participated in Round 1 and 30 in Round 3. Across all Delphi rounds, 289 components reached consensus for inclusion. The consensus resulted in a final list of 227 components (63 Foundational, 84 Emerging, and 80 Established). Consensus agreement clustered around core EMDS domains including communication, structured call-taking and prioritisation, advice-giving, resource dispatch and tracking, and foundational governance and data functions, whereas items showing either non-consensus or consensus disagreement were typically technology-dependent or context-specific. Conclusions -- This international consensus offers guidance for EMDS development across diverse resource settings and provides a scalable roadmap to strengthen emergency care systems.

13
Optical coherence tomography as a biomarker for frontotemporal dementia: a systematic review & meta-analysis

Wang, E.; Kohli, A.; Taha, H. B.

2026-05-27 neurology 10.64898/2026.05.19.26353366 medRxiv
Top 5%
0.0%
Show abstract

Background: Frontotemporal dementia (FTD) lacks widely accessible disease-specific biomarkers. Optical coherence tomography (OCT) and OCT angiography (OCTA) may provide non-invasive measures of retinal changes associated with neurodegeneration. We conducted a systematic review and meta-analysis evaluating retinal biomarkers in FTD compared with Alzheimer disease (AD) and controls. Methods: A systematic search of PubMed and Embase was conducted through April 25, 2026 according to PRISMA guidelines. Studies evaluating OCT/OCTA biomarkers in FTD with comparator groups were included. Inverse weighted random-effects models, publication bias assessments, and meta-regressions were performed. Results: Ten studies involving 139 individuals with FTD, 87 with AD, 29 with mild cognitive impairment, 14 with TDP-43 proteinopathy, 5 with tauopathy, and 255 controls were included in the systematic review; five studies were eligible for meta-analysis. Compared with AD, individuals with FTD demonstrated significantly thinner retinal nerve fiber layer (RNFL) thickness (SMD = -0.61, 95% CI -0.98, -0.24). Compared with controls, individuals with FTD exhibited significantly thinner ganglion cell layer-inner plexiform layer (GCL-IPL) thickness (SMD = -0.55, 95% CI -1.02, -0.08), whereas pooled analyses across multiple retinal biomarkers were non-significant (SMD = -0.19, 95% CI -0.52, 0.14). RNFL thickness correlated negatively with female % in FTD and positively with age in both AD and controls. Conclusions: Individuals with FTD exhibit lower RNFL thickness than AD and lower GCL-IPL thickness than controls, suggesting retinal alterations may reflect neurodegeneration. However, larger longitudinal studies with standardized OCT/OCTA protocols are needed to determine the diagnostic and prognostic utility of retinal biomarkers in FTD

14
Vaginal Antisepsis for Major Gynecologic Surgeries Using Chlorhexidine Gluconate versus Povidone Iodine: A Systematic Review and Meta-Analysis

Dias, Y.; Gebrekidan, F.; Lowder, J.; Sutcliffe, S.; Yaeger, L.

2026-05-27 obstetrics and gynecology 10.64898/2026.05.26.26353429 medRxiv
Top 5%
0.0%
Show abstract

ABSTRACT OBJECTIVE: We performed a systematic review and meta-analysis (SRMA) of post-surgical outcomes, comparing chlorhexidine gluconate (CHG) versus povidone iodine (PI) for vaginal antisepsis of major gynecologic procedures. DATA SOURCES: Ovid Medline, Embase, Scopus, Embase, Cochrane, and Clinicaltrials.gov were searched between 1986 and December 2023, for studies comparing CHG with PI for vaginal antisepsis of major gynecologic operations. STUDY ELIGIBILITY CRITERIA: We included Randomized Controlled Trials (RCTs) and non-RCTs comparing CHG to PI for vaginal antisepsis of major gynecologic operations. The primary outcome was surgical site infections (SSIs) and the secondary outcome was urinary tract infections (UTIs) and vaginal irritation. METHODS: Summary estimates were calculated by fixed effects models when I2 [≤] 25% and by random effects models when I2 > 25%. Statistical analysis was performed using RevMan 5.4.1. The protocol for this systematic review was registered on PROSPERO (ID CRD42022378101). RESULTS: Nine studies met the inclusion criteria, four of which were randomized controlled trials (RCTs). 9538 patients were included, 4300 (45%) of whom were allocated to CHG and 5238 (55%) to PI. No statistically significant difference in SSI incidence was found for vaginal antisepsis with CHG versus PI in pooled analyses (n= 9538 patients; RR 1.20; 95% CI 0.92-1.57; I2 =0%). In contrast, a significantly higher risk of UTIs was observed for vaginal antisepsis with CHG than with PI (n=6061 patients; RR 1.48 95% CI 1.03-2.14; I2 = 0%). CONCLUSION: In our SRMA, there were no significant differences in SSI risk when either CHG or PI was utilized for antiseptic vaginal preparation. Interestingly, vaginal antisepsis with PI was associated with a lower incidence of post-operative UTIs following major gynecologic surgery. Our findings support current guidelines that form of vaginal antisepsis can be used for SSI prevention. They also suggest that PI may result in fewer postoperative UTIs but further randomized studies are needed to support these findings. Key words: surgical site infection, surgical wound infection, urinary tract infection, urogynecologic surgery, Chlorhexidine, Povidone Iodine, surgical antiseptic,

15
An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv
Top 5%
0.0%
Show abstract

Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF [&le;] 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF [&le;] 40%. After fine-tuning on less than 10% of external data, LVEF [&le;] 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

16
Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv
Top 5%
0.0%
Show abstract

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded into the electronic medical record used to warn clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient-level (where a maximum prediction prior to the onset of sepsis is used to measure performance) vs novel prediction-level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis 3 criteria occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination AUC 0.86; [IQR 0.85, 0.87], whereas prediction-level analyses demonstrated lower performance AUC 0.62; [IQR 0.57, 0.65]. Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.

17
Personalized Brain-Based Analgesia Detection with Portable fNIRS and AI

Minoccheri, C.; Joo, P.; Hu, X.-S.; Affendi, H.; Elayyan, F.; Harville, A.; McDonald, N. J.; Botero, T.; DaSilva, A. F.

2026-05-28 dentistry and oral medicine 10.64898/2026.05.20.26353377 medRxiv
Top 5%
0.0%
Show abstract

Neuroimaging based pain decoding faces two underappreciated challenges: between subject variability that prevents classifiers from generalizing across patients, and within session cross validation designs that inflate reported accuracy by conflating within person and between person variance. Here we address both using portable functional near infrared spectroscopy (fNIRS) during pharmacologically verified local nerve anesthesia. Twentyfive patients with clinically painful teeth underwent 36 channel bilateral fNIRS during percussion before ("Pre") and after ("Post") local nerve anesthesia. In 13 block-success patients, a paired Pre versus Post comparison with healthy tooth control identified three temporal hemodynamic response function (HRF) features (late slope, mean first derivative, and baseline normalized amplitude) whose analgesia interaction effects (d = 0.63 to 0.79) exceeded that of raw general linear model (GLM) amplitude (d = 0.56), with a significant difference-in-differences interaction (p = 0.011). Per-patient calibration with these features yielded leave one subject out (LOSO) AUC = 0.68 to 0.76 for nonlinear classifiers (permutation p = 0.002), with HbO-specific feature selection achieving the best performance (RF AUC = 0.760); a healthy tooth negative control was non-significant. End to end deep learning on raw time series (CNN LSTM AUC = 0.719) was competitive with feature based classifiers, while linear models did not reach significance. Critically, head to head comparison of within-session CV and LOSO on the same data revealed mean inflation of +0.13 AUC across all model types, including deep learning, demonstrating that high within session accuracy alone does not establish subject-independent validity. Exploratory analyses suggested complementary roles for oxyhemoglobin (HbO; within patient analgesia detection) and deoxyhemoglobin (HbR; cross patient information), and that trial to trial response variability may complement amplitude for cross patient pain detection. These results show that per patient calibration with temporal HRF features supports subject independent analgesic-state detection under strict LOSO evaluation, and that within-session validation (standard in the fNIRS pain- decoding literature) can substantially overestimate performance.

18
Morphological feature remodeling of intracranial arteries in the context of inflammation and HIV-associated cognitive impairment

Hoang, N.; Yang, H.; Uddin, M. N.; Zhong, J.; Faiyaz, A.; Singh, M. V.; Boodoo, Z. D.; Sutton, K. R.; Wang, H. Z.; Sahin, B.; Khan, M. W.; Weber, M. T.; Yuan, C.; Chen, L.; Schifitto, G.

2026-05-27 hiv aids 10.64898/2026.05.19.26353071 medRxiv
Top 5%
0.0%
Show abstract

Background: Despite the success of combination antiretroviral therapy (cART), vascular comorbidities, including cerebrovascular disease, are more prominent in people living with HIV (PLWH) compared to people without HIV (PWOH). However, quantitative assessments of cerebrovascular morphometry and their associations with cognitive outcomes in the context of HIV are still limited. In this study, we explore this missing link. Methods: Magnetic Resonance Angiography (MRA) data, blood markers, and neurocognitive assessments were collected from 73 PWOH subjects (male: 57, female: 16; age: 53 {+/-} 16) and 99 PLWH subjects (male: 66, female: 30, age: 53 {+/-} 11). Vessel morphometric features were quantified using intraCranial Artery Feature Extraction (iCafe) to investigate associations between vessel morphometry, markers of monocytes, endothelial cell activation, and cognitive performance. Results: HIV status predicted a lower total number of branches ({beta} = -0.224, p = 0.001, d = -0.517) and shorter total distal length ({beta} = -0.173, p = 0.021, d = -0.370) with a moderate effect size. Total branch number was found to be negatively associated with plasma levels of monocyte markers (sCD14: r = -0.167, p = 0.033; sCD163: r = -0.157, p = 0.045) and positively correlated with white matter cerebral blood flow (r = 0.550; p [&le;] 0.05). HIV status was the strongest predictor of overall cognitive performance in ANCOVA model ({beta} = -0.219, p = 0.006, d = -0.453). Conclusions: Our results suggest that cognitive impairment in PLWH is associated with vessel morphology metrics. Monocyte immune activation may contribute to changes in vessel morphology.

19
Can Large Language Models Diagnose Primary Immunodeficiency from Patient-Described Symptoms?

Reteig, L. C.; Woloshin, S.; Maglione, P. J.; Farmer, J. R.; Ong, M.-S.

2026-05-27 allergy and immunology 10.64898/2026.05.26.26353818 medRxiv
Top 5%
0.0%
Show abstract

Patients with primary immunodeficiency (PID) often face prolonged diagnostic delays and may increasingly turn to large language models (LLMs) to interpret their symptoms during this period. We evaluated whether an LLM could recognize PID from symptom descriptions derived from interviews with 21 PID patients. In a prior study, we showed that GPT-4o identified PID in 96% of cases when prompted with physician-written patient histories (Rider et al., JACI, 2024). Here, when prompted with symptom descriptions in patients' own words, GPT-5 identified PID in only 7 cases (33%), although it more broadly suggested immune system issues in 18 cases (81%). The gap between these findings indicates that LLMs are sensitive to the language and framing of symptom descriptions, performing substantially worse when patients describe their own symptoms in everyday language than when clinicians summarize patient histories in structured medical terms. This study underscores the need to carefully evaluate how LLMs are used in patient-facing applications.

20
Grounding Language Models in Behavioral Science to Scale Physical Activity Interventions for Hispanic/Latinx Populations

Mantena, S. D.; Johnson, A.; Schuetz, N.; Tolas, A.; Montalvo, S.; Delgado-SanMartin, J.; Ramirez Posada, M.; Du, L.; Zhang, S.; Huynh, A. D.; Oppezzo, M.; King, A. C.; Schmiedmayer, P.; Lawrie, A.; Rodriguez, F.; Ashley, E.; Kim, D. S.

2026-05-28 cardiovascular medicine 10.64898/2026.05.26.26354165 medRxiv
Top 5%
0.0%
Show abstract

Objective: Hispanic/Latinx populations in the U.S. experience higher rates of chronic disease linked to physical inactivity, yet digital health interventions remain largely inaccessible to more than 16 million Hispanic/Latinx adults with limited English proficiency. While large language models (LLMs) offer scalable personalization, their use in non-English behavioral coaching is unexplored. This study introduces MHC-Coach-ES, a Spanish-language LLM fine-tuned on the Transtheoretical Model (TTM) of behavior change. Materials and Methods: We fine-tuned Llama 3-70B-Instruct using a two-stage pipeline. First, the model was adapted to Spanish health and motivational language using a 2.21-million-token corpus. Second, it was instruction-tuned on 3,268 translated human written messages to align the model with the Transtheoretical Model (TTM) of Behavioral Change. We compared MHC-Coach-ES with Llama 3-70B-Instruct and translated human-expert messages using a forced-choice preference survey (N = 77) and blinded expert review (N = 2). Results: Spanish-speaking participants significantly preferred MHC-Coach-ES messages over translated human-expert messages (81% preference, P<0.001). Linguistic analysis showed that MHC-Coach-ES produced more temporally anchored messages than the base model (65% vs. 20%), while maintaining readability. In blinded evaluation, clinical experts rated MHC-Coach-ES higher for alignment with Transtheoretical Model stages than human-expert messages (4.83 vs. 4.38 out of 5). The base model also outperformed translated expert messages across preference and expert ratings. Conclusions: Generative AI can operationalize behavioral science frameworks in Spanish, offering a scalable approach to reducing health disparities. The strong performance of both MHC-Coach-ES and the base model highlights the promise of generative and personalized approaches over translation-based localization for theory-driven behavioral interventions.