
Patterns

15 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Leveraging Generative Artificial Intelligence for Enhanced Data Augmentation in Emotion Intensity Classification: A Comprehensive Framework for Cross-Dataset Transfer Learning
2026-03-03 health informatics 10.64898/2026.02.23.26346928
#1 (2.1%)

Data scarcity and stylistic heterogeneity pose major challenges for emotion intensity classification. This paper presents a cross-dataset augmentation framework that leverages prompt-conditioned generative models alongside deterministic and heuristic transformations to synthesize target-style examples for improved transfer learning. We introduce a unified taxonomy of augmentation strategies--Heuristic Lexical Perturbation (HLA), Prompt-Conditioned Generative Augmentation (CGA), Sequential Hybrid...

2
Enhancing Prediabetes Diagnosis from Continuous Glucose Monitoring Data via Iterative Label Cleaning and Deep Learning
2026-03-05 health informatics 10.64898/2026.03.04.26347604
#1 (2.0%)

As of early 2026, over 115 million US adults (more than 1 in 3) have prediabetes, a condition with an annual conversion rate of 5%-10% to type 2 diabetes. Total diabetes (diagnosed and undiagnosed) affects approximately 40.1 million Americans, or 12% of the population, with roughly 1.5 million new cases diagnosed annually. Continuous Glucose Monitoring (CGM) provides real-time, 24/7 insights into glycemic variability, detecting dangerous highs, lows, and trends that HbA1c (a 3-month average) mis...

3
Data-Driven Hybrid Model of SARIMA-CNNAR For Tuberculosis Incidence Time Series Analysis in Nepal
2026-02-24 health informatics 10.64898/2026.02.22.26346853
Top 0.2% (1.8%)

Background: Tuberculosis (TB) remains a major public health challenge in Nepal, with incidence rates substantially higher than global estimates. Accurate forecasting of TB incidence is essential for early warning systems, resource allocation, and targeted interventions. This study aimed to develop and validate a hybrid Seasonal Autoregressive Integrated Moving Average (SARIMA) and Convolutional Neural Network Auto-Regressive (CNNAR) model for TB incidence forecasting in Nepal. Methods: Monthly TB i...

4
Understanding Clinician Edits to Ambient AI Draft Notes: A Feasibility Analysis Using Large Language Models
2026-03-02 health informatics 10.64898/2026.02.27.26347290
Top 0.3% (1.6%)

Ambient AI documentation tools generate draft notes that clinicians can review and edit before signing off in electronic health records. Scalable computational approaches to characterize how clinicians modify drafts remain limited, yet are essential for evaluating and improving AI effectiveness. We examined the feasibility of a few-shot prompted large language model (LLM) for categorizing sentence-level edits between AI drafts and final documentation. We developed five label-specific binary mode...

5
A deterministic safety pipeline for therapeutic AI in elderly assisted living
2026-02-18 health informatics 10.64898/2026.02.17.26346507
Top 0.3% (1.5%)

Over 54 million Americans are aged 65+, with depression affecting 25-49% and anxiety exceeding 30% of assisted living residents. AI systems employing agentic orchestration exhibit 0.5-2% failure rates--unacceptable where a single missed crisis can be fatal. We designed and bench-evaluated Lilo Engine, a 5-layer deterministic therapeutic pipeline replacing a prior multi-agent orchestrator. Safety is enforced through structural invariants: a Guardian layer with 4-gate OR crisis detection runs unco...

6
An LLM-assisted framework for accelerated and verifiable clinical hypothesis testing from electronic health records
2026-02-12 health informatics 10.64898/2026.02.10.26346008
Top 0.3% (1.5%)

Acquiring insights from electronic health records (EHRs) is slowed by manual analytical workflows that limit scalability and reproducibility. We present LATCH (LLM-Assisted Testing of Clinical Hypotheses), an agentic framework that converts natural language clinical hypotheses into fully auditable analyses on structured EHR data. LATCH integrates LLM-assisted semantic layers with deterministic execution pipelines to automate cohort construction, statistical analysis, and result reporting, while ...

7
Sino-US-DrugQA: A Benchmark for Evaluating Large Language Models in Cross-Jurisdictional Pharmaceutical Regulation
2026-02-17 health informatics 10.64898/2026.02.13.26346236
Top 0.4% (1.5%)

Cross-jurisdictional pharmaceutical compliance requires comparative analysis of regulatory requirements across jurisdictions such as the US FDA and China's NMPA. Although large language models (LLMs) are increasingly explored for healthcare-related applications, their performance in cross-jurisdictional regulatory comparison has not been systematically characterized using dedicated benchmarks. This study introduces Sino-US-DrugQA, a bilingual benchmark dataset designed to evaluate LLM performance...

8
Patient-Centric Markov-Chain Framework for Predicting Medication Adherence Using De-Identified Data
2026-02-10 health informatics 10.64898/2026.02.08.26345856
Top 0.4% (1.5%)

Long-term adherence to prescribed therapies remains a persistent challenge in chronic and ultra-rare conditions where clinical outcomes depend on continuous medication use. Even brief gaps in therapy can compromise disease control, yet patients frequently encounter structural barriers including high out-of-pocket costs, prior-authorization (PA) delays, annual re-verification cycles, and refill logistics that disrupt persistence. This study evaluates a patient-centric Markov-chain framework for a...

9
Handling onset age inconsistencies in longitudinal healthcare survey data
2026-02-23 health informatics 10.64898/2026.02.20.26346741
Top 0.5% (1.4%)

Longitudinal healthcare surveys frequently contain inconsistencies in self-reported onset ages, where participants report different ages for the same condition between enrollment and follow-up surveys. We propose two methods to handle this challenge. First, we introduce a procedure that aggregates inconsistency patterns to construct participant-level reliability scores, enabling researchers to stratify participants and prioritize analysis on high-reliability cohorts. Seco...

10
Personalized Insights Derived from Wearable Device Data and Large Language Models to Improve Well-Being
2026-03-04 health informatics 10.64898/2026.03.03.26347299
Top 0.5% (1.3%)

Health behaviors such as physical activity and sleep affect mental health, but the effect of each health behavior varies substantially across individuals, limiting the usefulness of generic behavioral recommendations. We collected one year of continuous wearable and ecological momentary assessment data from 3,139 participants in the Intern Health Study (2018-2023), and examined individual-level associations between wearable-derived features and mood across the internship year. The behaviors asso...

11
Anatomically and Biochemically Guided Deep Image Prior for Sodium MRI Denoising
2026-03-02 health informatics 10.64898/2026.02.27.26347249
Top 0.8% (1.3%)

Sodium (23Na) magnetic resonance imaging (MRI) provides valuable metabolic information, but it is limited by a low signal-to-noise ratio (SNR) and long acquisition times. To overcome these challenges, we present a Deep Image Prior (DIP)-based framework that combines anatomically guided proton (1H) MRI and metabolically guided 23Na MRI denoising via a fused proton-sodium prior within a directional total variation (dTV) regularization scheme. The DIP-Fusion approach minimizes a variational loss fu...

12
CardioPulmoNet: Modeling Cardiopulmonary Dynamics for Histopathological Diagnosis
2026-02-20 health informatics 10.64898/2026.02.19.26346620
Top 1.0% (1.2%)

Objective: This study investigates whether incorporating physiological coupling concepts into neural network design can support stable and interpretable feature learning for histopathological image classification under limited data conditions. Methods: A physiologically inspired architecture, termed CardioPulmoNet, is introduced to model interacting feature streams analogous to pulmonary ventilation and cardiac perfusion. Local and global tissue features are integrated through bidirectional multi-h...

13
Federated penalized piecewise exponential model for horizontally distributed survival data: FedPPEM
2026-02-12 health informatics 10.64898/2026.02.11.26346054
Top 1.0% (1.2%)

Cox proportional hazard regressions are frequently employed to develop prognostic models for time-to-event data, considering both patient-specific and disease-specific characteristics. In high-dimensional clinical modeling, these biological features can exhibit high collinearity due to inter-feature relationships, potentially causing instability and numerical issues during estimation without regularization. For rare diseases such as acute myeloid leukemia (AML), the sparsity and scarcity of data...

14
Making sleep behaviors interpretable: adapting the two-process model of sleep regulation to longitudinal Fitbit sleep and activity behaviors for health insights
2026-03-03 health informatics 10.64898/2026.03.01.26347356
Top 1.0% (1.2%)

Background: As sleep data from wearable devices are increasingly available in health research, there are new opportunities to understand sleep regulation behaviors as modifiable risk factors for disease. At such a large scale (tens of thousands of people over millions of day-level observations), prioritizing and interpreting sleep behaviors is challenging while maintaining biological relevance and modifiability. In this work, we aim to address this challenge by proposing a framework to interpret F...

15
Thyroid Cancer Risk Prediction from Multimodal Datasets Using Large Language Model
2026-03-06 health informatics 10.64898/2026.03.05.26347766
Top 1% (1.1%)

Thyroid carcinoma is one of the most prevalent endocrine malignancies worldwide, and accurate preoperative differentiation between benign and malignant thyroid nodules remains clinically challenging. Diagnostic methods that medical practitioners use at present depend on their personal judgment to evaluate both imaging results and separate clinical tests, which creates inconsistency that leads to incorrect medical evaluations. The combination of radiological imaging with clinical information syst...

16
MedOS: AI-XR-Cobot World Model for Clinical Perception and Action
2026-02-23 health informatics 10.64898/2026.02.18.26345936
Top 1% (1.0%)

Medicine historically separates abstract clinical reasoning from physical intervention. We bridge this divide with MedOS, a general-purpose embodied world model. Mimicking human cognition via a dual-system architecture, MedOS demonstrates superior reasoning on biomedical benchmarks and autonomously executes complex clinical research. To extend this intelligence physically, the system simulates medical procedures as a physics-aware model to foresee adverse events. Generating and validating on the...

17
Red-Teaming Medical AI: Systematic Adversarial Evaluation of LLM Safety Guardrails in Clinical Contexts
2026-03-05 health informatics 10.64898/2026.02.26.26347212
Top 1% (1.0%)

Background: Large language models (LLMs) are increasingly deployed in medical contexts as patient-facing assistants, providing medication information, symptom triage, and health guidance. Understanding their robustness to adversarial inputs is critical for patient safety, as even a single safety failure can lead to adverse outcomes including severe harm or death. Objective: To systematically evaluate the safety guardrails of state-of-the-art LLMs through adversarial red-teaming specifically designe...

18
Population differences in wearable device wear time: Rescuing data to address biases and advance health equity
2026-03-06 health informatics 10.64898/2026.03.06.26347799
Top 1% (1.0%)

Wearable devices present transformative opportunities for personalized healthcare through continuous monitoring of digital biomarkers; however, individual variations in device wear time could mask or otherwise impact signal identification. Despite the widespread adoption of wearable devices in research, no comprehensive framework exists for understanding how wear time varies across populations or for addressing wear time-related biases in analysis. Using Fitbit data from 11,901 participants in t...

19
Augmenting Electronic Health Records for Adverse Event Detection
2026-02-11 health informatics 10.64898/2026.02.10.26345962
Top 2% (1.0%)

Objective: Adverse events (AEs) resulting from medical interventions are significant contributors to patient morbidity, mortality, and healthcare costs. Prediction of these events using electronic health records (EHRs) can facilitate timely clinical interventions. However, effective prediction remains challenging due to severe class imbalance, missing labels, and the complexity of EHR records. Classical machine learning approaches frequently underperform due to insufficient representation of minor...

20
AI-Driven Diagnosis Of Non-Alcoholic Fatty Liver Disease And Associated Comorbidities
2026-02-18 health informatics 10.64898/2026.02.12.26345169
Top 2% (1.0%)

Non-alcoholic fatty liver disease (NAFLD) is a globally prevalent hepatic condition caused by the buildup of fat in the liver. It is frequently associated with metabolic comorbidities such as hypertension, cardiovascular disease (CVD), and prediabetes. However, early detection remains challenging due to the asymptomatic progression, and existing primary diagnostic methods, such as imaging or liver biopsy, are often expensive and inaccessible in rural areas. This study proposes a two-stage, inter...