Healthcare — Latest Matching Preprints

1

Efficacy of Mobile Application Delivered Lifestyle Interventions in Managing Gestational Weight Gain: A Systematic Review and Meta-Analysis with Meta-Regression

Uirianto, G. N.; Nababan, S.

2026-06-01 obstetrics and gynecology 10.64898/2026.05.29.26354025 medRxiv

Top 0.1%

7.0%

Introduction: Managing gestational weight gain (GWG) is crucial for the health of mothers and their children. Mobile applications (apps) designed specifically for pregnancy are emerging as a low-cost, accessible way to deliver lifestyle interventions. However, existing studies report varied results and are heterogeneous. We therefore conducted this systematic review and meta-analysis to summarize the efficacy of mobile apps in managing GWG and to investigate variables that may contribute to heterogeneity. Methodology: Seven databases were systematically searched up to 9 November 2024. Only randomized controlled trials (RCTs) were included. Outcomes were excessive GWG and inadequate GWG according to the 2009 Institute of Medicine (IOM) guideline. Quality appraisal was performed using the Cochrane Risk of Bias 2 (RoB 2) tool. A random-effects meta-analysis was conducted using the odds ratio (OR) as the summary measure, with 95% confidence intervals (CI). Results and Discussion: Fifteen RCTs were included. Mobile apps led to a significant overall decrease in excessive GWG (OR: 0.71; 95% CI: 0.54 to 0.95; p-value: 0.02; I²: 60%). Subgroup analysis showed that social media apps, self-monitoring functionalities, and overweight/obese patients were associated with a significant reduction in excessive GWG. However, there was significant evidence of small-study bias in the analysis. Mobile apps also significantly increased inadequate GWG (OR: 1.51; 95% CI: 1.04 to 2.21; I²: 0%). Meta-regression did not reveal any significant findings. Conclusion: Mobile app interventions were effective in preventing excessive GWG, particularly social media apps and those with self-monitoring functionalities. However, the reduction in excessive GWG may be limited to overweight and obese patients, and more studies are needed to confirm this finding. Lastly, mobile apps are associated with an increased risk of inadequate GWG, and strategies to counter inadequate GWG are needed.

2

Development of an Open-Access Action Observation Video Library for Upper Limb Motor Rehabilitation

Madison, M.; Wheaton, L. A.; Rowe, V.

2026-06-10 rehabilitation medicine and physical therapy 10.64898/2026.06.10.26355108 medRxiv

Top 0.1%

6.9%

Background: Occupational therapists can improve stroke survivors' hand and arm movement and participation in daily activities through action observation (AO). AO involves watching another person's hand or arm complete a movement or task. While research generally supports the use of AO with stroke survivors, few AO videos are available to occupational therapists, which makes applying AO challenging. Objective: The purpose of this work is to develop a structured and widely accessible tool to support access to AO for stroke survivors, occupational therapists, and researchers. Methods: To develop an AO video library for stroke rehabilitation, functional and non-functional upper limb task deficits were first identified through clinical observations and clinician interviews to establish a prioritized list of daily activities. In collaboration with media production specialists, healthy adult volunteers were recruited and filmed performing these tasks from both first- and third-person perspectives. The recorded videos were then systematically edited, enhanced with instructional title slides, and distributed via a public YouTube channel for clinical application and a categorized digital repository for research purposes. Results: Initial assessments revealed a complete lack of familiarity with, awareness of, and use of AO resources among local occupational therapists, despite high perceived clinical utility. To address this gap, a final library of 150 tasks was established, resulting in 419 finalized, standardized videos featuring six healthy volunteers. For clinical application, these videos were hosted on a free, public YouTube channel organized into 18 functional playlists, while a parallel set was organized into distinct movement categories for storage in a research repository. Conclusion: By providing a structured and highly accessible tool, this repository enables clinicians, researchers, and caregivers to readily implement evidence-based action observation interventions in both clinical and home settings.

3

Grounding Language Models in Behavioral Science to Scale Physical Activity Interventions for Hispanic/Latinx Populations

Mantena, S. D.; Johnson, A.; Schuetz, N.; Tolas, A.; Montalvo, S.; Delgado-SanMartin, J.; Ramirez Posada, M.; Du, L.; Zhang, S.; Huynh, A. D.; Oppezzo, M.; King, A. C.; Schmiedmayer, P.; Lawrie, A.; Rodriguez, F.; Ashley, E.; Kim, D. S.

2026-05-28 cardiovascular medicine 10.64898/2026.05.26.26354165 medRxiv

Top 0.1%

6.8%

Objective: Hispanic/Latinx populations in the U.S. experience higher rates of chronic disease linked to physical inactivity, yet digital health interventions remain largely inaccessible to more than 16 million Hispanic/Latinx adults with limited English proficiency. While large language models (LLMs) offer scalable personalization, their use in non-English behavioral coaching remains unexplored. This study introduces MHC-Coach-ES, a Spanish-language LLM fine-tuned on the Transtheoretical Model (TTM) of behavior change. Materials and Methods: We fine-tuned Llama 3-70B-Instruct using a two-stage pipeline. First, the model was adapted to Spanish health and motivational language using a 2.21-million-token corpus. Second, it was instruction-tuned on 3,268 translated human-written messages to align it with the TTM. We compared MHC-Coach-ES with Llama 3-70B-Instruct and translated human-expert messages using a forced-choice preference survey (N = 77) and blinded expert review (N = 2). Results: Spanish-speaking participants significantly preferred MHC-Coach-ES messages over translated human-expert messages (81% preference, P<0.001). Linguistic analysis showed that MHC-Coach-ES produced more temporally anchored messages than the base model (65% vs. 20%), while maintaining readability. In blinded evaluation, clinical experts rated MHC-Coach-ES higher than human-expert messages for alignment with TTM stages (4.83 vs. 4.38 out of 5). The base model also outperformed translated expert messages across preference and expert ratings. Conclusions: Generative AI can operationalize behavioral science frameworks in Spanish, offering a scalable approach to reducing health disparities. The strong performance of both MHC-Coach-ES and the base model highlights the promise of generative and personalized approaches over translation-based localization for theory-driven behavioral interventions.

4

Dissemination of dementia supporters and residents' attitudes and recognition related to dementia in Japan: a municipal-level ecological study

Noguchi, T.; Ide, K.; Fujihara, S.; Kawagome, A.; Saito, M.; Kondo, K.; Ojima, T.

2026-05-20 epidemiology 10.64898/2026.05.17.26353355 medRxiv

Top 0.1%

3.7%

Background: The Dementia Supporter Initiative is a national public education program in Japan that aims to foster positive attitudes and appropriate understanding of dementia to support people with Alzheimer's disease and related dementias in the community. However, its influence on the community as a whole remains unclear. Objective: This study examined the relationship between dementia supporter training and residents' attitudes and recognition related to dementia at the municipal level. Methods: This ecological cross-sectional study linked municipal-level data from the Japan Gerontological Evaluation Study 2022 wave with publicly available information on the number of dementia supporters. Residents' beliefs and attitudes toward dementia and their recognition of dementia consultation services were assessed by mail questionnaires and aggregated at the municipal level. The proportion of dementia supporters in each municipality was calculated as of September 2022. Results: Data from 69 municipalities were analyzed. The mean proportion of dementia supporters was 13.47% (2.62-44.85). A higher proportion of dementia supporters was positively correlated with community support-seeking for a family member with dementia (r = 0.328) and with recognition of dementia consultation services (r = 0.501). Regression analysis adjusted for municipal covariates also showed positive associations with both outcomes (per 10-percentage-point increase: coef. = 1.44, p = 0.047; coef. = 3.12, p < 0.001, respectively). No associations were observed with residents' positive attitudes toward, or appropriate understanding of, dementia. Conclusions: Wider dissemination of dementia supporters may contribute to better recognition of community support resources, but may be insufficient to influence broader public attitudes and understanding of dementia at the community level.

5

Language-Related Disparities in History Documentation in Patients Admitted for Heart Failure

Gottlieb, E. R.; Mullan, I. D.; Celi, L. A. A.

2026-05-22 cardiovascular medicine 10.64898/2026.05.19.26353593 medRxiv

Top 0.2%

3.5%

Introduction: Patients hospitalized with heart failure who do not speak English as their primary language face communication barriers; however, the impact on documented History of Present Illness (HPI) and Review of Systems (ROS) has not been reported. Methods: This retrospective cohort study was based on MIMIC-IV, an anonymized clinical database. Adult patients admitted to general medicine or cardiology services with heart failure (by DRG) were identified. Multivariable linear regression was used to assess the association between language (English vs. non-English) and word counts for HPI+ROS and for HPI alone. Qualitative differences in texts were also analyzed using Claude Opus 4.6. Results: In a cohort of 552 patients, non-English language (N = 81) was associated with a shorter HPI+ROS (coef. -33.387, 95% CI [-62.076, -4.697], p = 0.023), controlling for age (coef. -1.023, 95% CI [-1.817, -0.230], p = 0.012) and Elixhauser score (coef. 10.391, 95% CI [7.078, 13.705], p<0.001). Similar associations were found for HPI alone. Qualitative differences included less discussion of symptoms and timing of onset. Discussion: HPI+ROS and HPI were more abbreviated when the primary documented language was not English. This has important implications for equitable care and for the development of emerging translation and documentation technologies.

6

Ambient AI Documentation in Mixed-Language Encounters: A Heuristic Evaluation of Spanish-English and Mandarin-English Conversations

Hu, D.; Flores, D.; Flores, L.; Chien, R.; Lam, K.; Chow, E.; Guo, Y.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.

2026-05-22 health informatics 10.64898/2026.05.19.26353603 medRxiv

Top 0.2%

3.5%

Ambient AI documentation systems rely on automatic speech recognition to transcribe patient-provider conversations before generating clinical notes. However, little empirical evidence exists on how these systems perform in mixed-language clinical encounters. We conducted a mixed-methods heuristic evaluation of an ambient AI documentation tool using 24 reenacted primary care conversations involving Spanish-English and Mandarin-English code-switching. Quantitative analyses measured mixed error rate (MER) and code-switching detection. Overall MER was low: the median was 4% in Spanish-English conversations, which also showed less variation, and 9% in Mandarin-English conversations, although outliers reached 67%. The system generally detected language switches reliably, but deletions occurred frequently at switch points in Mandarin-English transcripts. Qualitative analysis revealed transcription errors related to phonetic similarity, automatic language translation, clinical terminology recognition, and language-specific challenges. These findings highlight considerations for improving ambient AI clinical documentation systems to support multilingual providers in delivering care for linguistically diverse populations.

7

Use of large language models by academic hospitalists: results of a multicenter survey

Bressman, E.; Auerbach, A.; Keniston, A.; Jens, C.; Ranji, S.

2026-05-29 health systems and quality improvement 10.64898/2026.05.27.26353610 medRxiv

Top 0.2%

3.1%

Introduction: The use of artificial intelligence (AI) by clinicians has increased rapidly in recent years, with large language models (LLMs) emerging as tools that can equal clinician diagnostic performance in simulated settings. However, limited data exist regarding physicians' use of LLMs in real-world clinical practice. This study aimed to evaluate the frequency of LLM use among practicing hospitalists, identify which LLMs are most commonly used, and assess hospitalists' perceptions of the benefits and limitations of LLM use in clinical care. Methods: We conducted a cross-sectional survey study of academic hospital medicine faculty across 8 institutions within the Hospital Medicine Reengineering Network (HOMERuN), a collaborative research consortium. Eligible participants included hospitalists practicing within participating HOMERuN sites during the study period. The survey assessed the frequency of LLM use, types of LLMs used, clinical applications, and physician perceptions regarding usefulness, efficiency, and concerns associated with LLM adoption. Results: Of all respondents, 170 (67.1%) reported ever using an LLM in clinical practice. Among LLM users, OpenEvidence was the most commonly used tool (88.9%), followed by ChatGPT (58.5%), Google Gemini (26.9%), and Microsoft Copilot (20.5%). Only a minority of hospitalists reported using LLMs daily while seeing patients. The most common use cases of LLMs were answering diagnostic (77.1%) and management (77.6%) questions. A majority also reported using LLMs to identify or summarize primary literature (60.0%). Lack of trust in outputs (49.8%), uncertainty around institutional policies (48.6%), and lack of access to secure applications (43.1%) were the most frequently cited barriers to using LLMs in practice. Discussion: The use of LLMs in clinical practice is already widespread, though regular or daily use is not yet typical. Concerns regarding reliability, patient privacy, and safe integration into clinical workflows remain significant barriers to broader adoption. The responsible implementation of LLMs in hospital medicine will require addressing these barriers.

8

Psychosocial outcomes of a multidomain lifestyle and empowerment program for mild cognitive impairment

Vickers, K. L.; De Wit, L.; Goldstein, F. C.; Thelin, J.; Giannotto, E. L.; Saurman, J. L.; Levey, A. I.; Rodriguez, A. D.

2026-05-26 psychiatry and clinical psychology 10.64898/2026.05.21.26353503 medRxiv

Top 0.2%

3.0%

Background: Individuals with mild cognitive impairment (MCI) experience cognitive and functional declines that can negatively affect mood and reduce feelings of self-efficacy. These changes can also lead to elevated distress in care partners (CPs). Therefore, interventions that address quality of life and psychosocial factors in people with MCI and their CPs are needed. Objective: The present study evaluated the impact of a multidomain lifestyle program, the Cognitive Empowerment Program (CEP), on changes in psychosocial functioning, particularly empowerment, in people with MCI and their CPs. Methods: Participants were 94 people with MCI (mean age 75.1 years, 45.7% female, 81.9% white) and their CPs (mean age 69.1 years, 71.3% female, 87.3% white) who completed the 12-month CEP, which comprised physical, cognitive, and psychosocial interventions. Questionnaires were administered pre- and post-program to assess empowerment, self-efficacy, meaning and purpose, depression, and stress in participants with MCI, and empowerment, depression, stress, and caregiving burden in CPs. Results: After completing the CEP, participants with MCI endorsed higher empowerment and self-efficacy as well as fewer symptoms of depression and perceived stress. CPs endorsed feeling more empowered despite elevated caregiver burden. Conclusions: These results suggest that multidomain lifestyle programs can positively affect wellbeing in MCI. Future research should focus on refining delivery models, exploring integration with pharmacological treatments, prioritizing inclusion of diverse populations, and measuring long-term outcomes to strengthen the reach and impact of programs like CEP.

9

Machine Learning Estimation of Gestational Age at Delivery Using Linked Mother-Infant Electronic Health Records Across Two Health Systems

Bejan, C. A.; Yang, X.; Pham, A.; Qassem, L.; Abraham, A. A.; Choi, L.; Rosenbloom, S. T.; Gamire, L. X.; Phillips, E. J.

2026-05-25 obstetrics and gynecology 10.64898/2026.05.23.26353959 medRxiv

Top 0.2%

2.9%

Objective: This study aimed to train and evaluate supervised machine learning algorithms using electronic health record (EHR) data to accurately estimate gestational age at delivery. Materials and Methods: We trained random forest, gradient boosting, and ensemble models on EHR data of mother-infant dyads from Vanderbilt University Medical Center (VUMC) and replicated the analyses at the University of Michigan (UMich). We further analyzed EHR predictors of gestational age, assessed temporal drift in EHR data elements, and evaluated model performance stratified by delivery status. Results: The study included pregnancies corresponding to 54,344 and 34,345 mother-infant dyads at VUMC (2005-2025) and UMich (2012-2024), respectively. The gestational age predictions of the ensemble models achieved the highest agreement with the reference standard on the VUMC dataset (±1 week: 85.2%, ±2 weeks: 94.3%, MAE: 4.4 days) and demonstrated stronger generalization on the UMich dataset (±1 week: 93.1%, ±2 weeks: 97.8%, MAE: 2.8 days). Further, performance was better among pregnancies delivered in more recent years, and among full- and late-term deliveries compared with preterm deliveries. Discussion: The results indicate that supervised machine learning methods leveraging linked mother-infant EHRs can accurately estimate gestational age at delivery, while demonstrating the generalizability of the modeling approach and the portability of the analytic workflow across healthcare sites. Conclusion: This study presents a robust and generalizable machine learning framework to estimate gestational age at delivery. The framework can be reliably used to impute gestational age in large-scale, real-world clinical studies to support maternal and neonatal health research, in which accurate estimation of pregnancy onset is critical.

10

Adult-Learning Newborn Medicine Curriculum Improves Knowledge in a Low-Resource Neonatal Unit in Sierra Leone

Mvula, M.; Amin, A.; Patil, M. S.; Valentine, G.; Mukarwego, B.; Wagner, S.; Dumbuya, I.; Lou, L.; Sanni, U.; Hansen, A.

2026-06-04 pediatrics 10.64898/2026.06.02.26354766 medRxiv

Top 0.2%

2.8%

Background: Sierra Leone's neonatal mortality rate is among the highest in the world. Koidu Government Hospital opened a Special Care Baby Unit (SCBU) in 2020. To increase the knowledge of SCBU health care providers (HCPs), a neonatal curriculum was implemented to support HCP education on the management of neonatal conditions. The aim of this study was to understand the effect of the curriculum on knowledge acquisition and participating HCPs' perceptions of the teaching methodologies. Methods: US-based mentors facilitated a two-phase, flipped-classroom, virtual neonatal medicine curriculum between October 2024 and April 2025, followed by one-week in-person education sessions with SCBU HCPs. In each phase, participants completed pre- and post-test educational assessments. At the end of the curriculum, they completed a subjective assessment capturing perceptions of the quality of the teaching methodologies integrated into the curriculum. The Wilcoxon signed-rank test was used to assess pre- versus post-test change. Descriptive statistics were used to analyse the subjective assessment. Results: Thirty-eight participants completed the educational assessments, and 30 (79%) took all four pre- and post-tests; 25/38 (65.8%) were female and 27 (71.1%) were nurses. Median correct answers for both phases increased from pre-test to post-test for individual learners [Phase 1: pre-test 14/27 (51.9%), post-test 23/27 (85.2%), p<0.001; Phase 2: pre-test 14/25 (56.0%), post-test 23/25 (92.0%), p<0.001]. Thirty-one participants completed the subjective assessment, of whom 96.8% (30/31) rated the curriculum "very effective." All 31 participants indicated that the in-person instruction was "very helpful." In open-text responses, they offered valuable insight into challenges, strengths, and next steps. Conclusion: This neonatal curriculum resulted in significantly increased knowledge and was well regarded. Adapting this curriculum or similar curricula shows promise for improving the quality of care for small and/or sick neonates in low-resource settings.

11

Healthcare professionals' perspectives on a multilevel cardiovascular risk management intervention (PROSPERA programme)

Bongaerts, V. A. M. C.; van Gestel, L. C.; van Peet, P. G.; Vuijk, M.-L. S.; Hageman, S. H. J.; Dorresteijn, J. A. N.; Bonten, T. N.; Numans, M. E.; van Os, H. J. A.; Vos, R. C.

2026-06-09 cardiovascular medicine 10.64898/2026.06.08.26355169 medRxiv

Top 0.2%

2.7%

Background: Two-thirds of Dutch cardiovascular risk management (CVRM) for patients at risk of cardiovascular disease is delivered in primary care practices. While individual risk scores are increasingly used during consultations, no population-level structure for risk-based patient outreach is currently available. We therefore developed the PROSPERA programme, a multilevel intervention comprising population-level risk stratification and individual-level support tools. Aim: To assess anticipated and experienced barriers and facilitators among healthcare professionals (HCPs) to inform implementation in primary care. Methods: We conducted four focus groups and six interviews with nine primary care HCPs to explore anticipated and experienced barriers and facilitators. Inductive codes were thematically analysed and mapped to the corresponding domains of the Theoretical Domains Framework (TDF) and the related Capability, Opportunity, Motivation and Behaviour (COM-B) model. Results: Barriers and facilitators were identified in 11 TDF domains. Population-level barriers included altered professional roles and limitations in technological infrastructure. Individual-level barriers included limited skills in interpreting risk calculations and difficulty integrating tools into clinical routine. Facilitators related to beliefs about the importance of providing proactive care (population level), the use of U-Prevent for risk communication (individual level), and positive patient responses to the Lifestylecheck questionnaire (individual level). Conclusion: Addressing the barriers and facilitators identified at both the population and individual levels can support implementation of the PROSPERA programme. Opportunities exist in educating and training HCPs in risk communication, as well as in supporting the restructuring of the physical and digital environment.

12

Gaze information enhances remote skill transfer of piano performance

Oku, T.; Makimoto, Y.; Shioki, M.; Koike, H.; Furuya, S.

2026-05-29 neuroscience 10.64898/2026.05.27.728118 medRxiv

Top 0.2%

2.6%

Remote instruction is increasingly used to teach complex sensorimotor skills, yet conventional audio-video communication poorly conveys the fine-grained attentional cues that support expert guidance. This study tested whether real-time bidirectional gaze sharing enhances remote transfer of piano performance skill by restoring joint visual attention between teacher and learner. Twenty-seven conservatory-level pianists were randomly assigned either to a gaze-sharing group, in which teacher and learner gaze positions were visualized during online instruction, or to a control group receiving otherwise identical instruction without gaze cues. We recorded eye movements with wearable eye trackers and evaluated piano performance using a high-resolution key-motion sensing system. Real-time gaze sharing increased the similarity of learners' gaze patterns to the teacher's, an effect not evident in the control group. A parallel effect was observed for head-movement similarity. Critically, gaze sharing also reduced variability in key-descending velocity at the moment of finger-key contact for right-hand landings after a leap, a feature associated with unstable key-striking velocity. These findings indicate that gaze information is not merely an auxiliary communication cue but a timing-critical coordination channel for remote motor instruction. By augmenting video-mediated pedagogy with shared attentional dynamics, the proposed system offers a framework for transmitting tacit, high-dexterity skills across distance.

13

Prototyping a Generative AI-powered Person-centered Digital Health Tool to Mitigate Risk of Preventable Adverse Drug Events

Dobbins, D.; Russell, A.; Gunther, M.; Shetty, V.; Shomali, A.; Vawdrey, D.; Waring, S.; Whary, P.; Wong, J.; Wright, E. A.; Olson, A. W.

2026-06-04 health systems and quality improvement 10.64898/2026.06.02.26354712 medRxiv

Top 0.3%

2.6%

Objectives: Older adults with comorbidities and polypharmacy have a disproportionately high risk of hospitalization and readmission from adverse drug events (ADEs), of which 28%-71% are preventable (pADEs). This paper introduces an LLM application, CommunicADE, designed to help mitigate the risk of pADE-related readmission in this population. We aim to evaluate CommunicADE's technical performance with OpenAI's HealthBench criteria: accuracy, completeness, communication quality, context awareness, and instruction following. Materials and Methods: Our technical validation study used an LLM (KimiK2.5) to simulate interviews between CommunicADE and nine high-fidelity synthetic patients who were hospitalized and at increased risk of pADE-related readmission (65+ years, comorbidities, 5+ medications). Some clues to pADE risk mechanisms were visible to CommunicADE in patient H&Ps, but most mechanisms were discoverable only through interviews. Two pharmacists evaluated CommunicADE's interview questions and EHR notes with HealthBench-informed variables. Analyses used descriptive statistics. Results: For 35 mechanisms across 9 patients (avg = 3.89 mechanisms/patient), CommunicADE's precision and recall were 0.92 and 0.63, respectively. No hallucinations were observed. Coherence and person-centeredness scored 4.28 and 4.44 on a 5-point scale (5 = highest). On average, communication was at a 5th-grade reading level, and it was objective for 78% of patients. Most patient-reported quotes included in notes (92%) supported detected mechanisms. CommunicADE followed all instructions regarding interview length and patient approvals. Discussion: CommunicADE's strongest performance was in accuracy (precision, hallucinations), communication quality (coherence, readability), and context awareness (person-centeredness). Completeness (recall) and instruction following (objectivity, pADE mechanism/quote alignment) show room for improvement. Conclusion: Findings suggest technical readiness for a feasibility pilot with real-world patients and identify key areas for performance improvement.

14

When Algorithms Prescribe: A Cross-Sectional Study of Quality, Misinformation, and Engagement in Statin-Related Content on TikTok

Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.

2026-06-08 health informatics 10.64898/2026.06.04.26354962 medRxiv

Top 0.3%

2.5%

Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.

15

Labour Induction in low-risk women at 39 weeks of gestation: a Randomised trial in China (LIRIC) - Protocol of an open label, randomised controlled trial

Gao, H.; Shen, J.; Chen, D.; Mol, B. W.; Hun, W.; Liang, Z.; Bai, X.; Han, X.; Zhu, J.; Wang, H.; Liu, X.; Su, C.; Weng, R.; Liu, Y.; Li, W.; Zhang, D.

2026-05-26 obstetrics and gynecology 10.64898/2026.05.24.26354001 medRxiv

Top 0.3%

2.4%

Introduction: The ARRIVE trial first demonstrated that elective induction of labour (IOL) at 39 weeks in low-risk pregnancies reduced the likelihood of caesarean section (CS) without compromising perinatal safety; however, the generalizability of these findings remains debated, leading to uncertainty in clinical practice. The LIRIC trial aims to evaluate whether elective IOL at 39 weeks reduces CS rates compared with expectant management, while exploring its impact on infant neurodevelopment and multi-omics profiles. Methods and analysis: This is a single-centre, open-label, randomized controlled trial in China. A total of 1,074 low-risk pregnant women (nulliparous or multiparous) will be randomly assigned (1:1) to either IOL at 39 weeks or expectant management. The primary outcome is the CS rate. Secondary outcomes include a composite of severe neonatal morbidity and perinatal mortality, and infant neurodevelopmental scores (Bayley-4 and ASQ-3), among others. Data analysis will follow the intention-to-treat (ITT) principle. Biospecimens will be collected for metagenomic and metabolomic analyses, with results to be reported separately. Ethics and dissemination: The protocol has been approved by the Ethics Committee of Women's Hospital, School of Medicine, Zhejiang University. Informed consent will be obtained from all participants. Results will be disseminated via peer-reviewed journals, and standardized infant developmental reports will be provided to participants to enhance study benefit. Trial registration number: NCT07082530.

16

Reception Of Respectful Maternity Care And Their Determinants Among Postpartum Mothers During Institutional Childbirth In East Wollega Zone Hospitals, West Oromia, Ethiopia, 2026.

Ahmed, T. H.; Abeya, S. G.; Chaka, E. E.

2026-05-21 obstetrics and gynecology 10.64898/2026.05.18.26353527 medRxiv

Top 0.3%

2.2%

Respectful maternity care (RMC) is a primary component of high-quality maternal health services, yet evidence on RMC levels and determinants in Ethiopia remains inadequate. This study aimed to examine the receipt of RMC and its determinants among postnatal women in government hospitals in the East Wollega Zone, West Oromia. An institution-based cross-sectional study was conducted from June to October 2025 among women within seven days of delivery. A structured questionnaire based on the WHO RMC tools was used. The total RMC score showed good reliability (Cronbach's α = 0.808) and was categorized using a 75th-percentile threshold. Factor analysis was used to identify underlying RMC dimensions, and logistic regression to identify predictors of a positive RMC experience. Only 46.8% of postpartum mothers received adequate RMC, with significant gaps in care. The main deficiencies were poor provider self-introduction, failure to call women by name, and infrequent communication and consent practices. Three key RMC dimensions were identified: privacy and consent, explanation and permission, and respectful communication. In multivariable analysis, interpersonal caring practices were strong predictors of positive RMC experiences: explaining procedures and their possible outcomes, maintaining privacy, obtaining consent, prompt responsiveness, provider self-introduction, and calling mothers by name were significantly associated factors. Sociodemographic and maternal reproductive factors were not significantly associated after adjusting for confounders. In conclusion, fewer than half (46.6%) of mothers experienced adequate RMC, indicating major gaps in woman-centered care. Improving respectful interpersonal communication, informed consent, and privacy should be prioritized to improve the quality of maternal healthcare in the study area.
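
The reliability figure and the 75th-percentile cut-off above follow standard formulas. The sketch below shows Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total score), and a dichotomization at the third quartile. The function names and the ratings are hypothetical, and treating scores "at or above" the cut-off as adequate is an assumption; the abstract does not say which side of the threshold counts.

```python
from statistics import quantiles, variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    Sample variances are used for both terms; the ratio is unchanged if
    population variances are used consistently instead.
    """
    k = len(responses[0])  # number of questionnaire items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def dichotomize_at_p75(total_scores: list[float]) -> list[bool]:
    """Flag total scores at or above the 75th percentile (third quartile).

    Assumption: 'at or above the cut-off' is read as adequate RMC.
    """
    q3 = quantiles(total_scores, n=4)[2]  # quantiles() returns [Q1, Q2, Q3]
    return [score >= q3 for score in total_scores]


# Hypothetical 1-5 ratings from four respondents on three items.
ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
print(round(cronbach_alpha(ratings), 3))
print(dichotomize_at_p75([sum(row) for row in ratings]))
```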

17

Design and Usability Evaluation of a Digital Guideline Management Application for a Pediatric Cardiac Center

Heidenreich, B. M.

2026-05-26 health informatics 10.64898/2026.05.24.26353982 medRxiv

Top 0.4%

2.0%

Background. Complex cases in specialized pediatric care require consistent adherence to evidence-based clinical pathways and protocols to ensure safe, high-quality, and equitable care. Currently, clinical pathways and supporting documentation are frequently distributed across multiple platforms, leading to fragmentation. Human-centered design principles can guide the development of healthcare technologies that minimize cognitive load and support rapid, efficient access to relevant information in clinical settings. The purpose of this study is to use human-centered design to build a digital guideline management system for a pediatric cardiac center, embedded within the electronic health record, and to evaluate its perceived usability. Methods. This study used a mixed-methods usability evaluation to assess a prototype of the digital guideline management system embedded into the clinical workflow. Built on human-centered design principles, the prototype provides a centralized digital document library that organizes cardiac-specific clinical pathways, guidelines, procedures, and related resources. A small but diverse sample, spanning a wide variety of roles and clinical areas within the pediatric cardiac center, was recruited to evaluate the perceived usability of the prototype. Stakeholders rated usability using the validated System Usability Scale (SUS), with additional optional questions to capture perceptions of the information architecture and clinical value. Results. Preliminary usability testing showed a mean SUS composite score of 76.5, indicating above-average usability. Questions related to the complexity of the system and user confidence received high scores across participants. Lower scores were observed for questions related to usage frequency and the ability to learn the system very quickly. Conclusion. Leveraging human-centered design to build a digital guideline management system embedded within the clinical workflow was perceived positively by participants. By centralizing access to clinical resources, this prototype can reduce current-state fragmentation. Further evaluation in larger samples is needed to inform future recommendations.
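
The mean SUS composite of 76.5 reported above comes from the standard SUS scoring rule, sketched below: odd-numbered (positively worded) items contribute answer - 1, even-numbered (negatively worded) items contribute 5 - answer, and the 0-40 sum is multiplied by 2.5. The participant responses are hypothetical, not the study's data; a mean of roughly 68 is often cited as the average benchmark against which "above average" is judged.

```python
from statistics import mean


def sus_score(answers: list[int]) -> float:
    """Score one System Usability Scale questionnaire (10 items, 1-5 each)."""
    if len(answers) != 10 or any(a not in range(1, 6) for a in answers):
        raise ValueError("SUS needs ten answers on a 1-5 scale")
    total = 0
    for item_number, answer in enumerate(answers, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item_number % 2 == 1 else (5 - answer)
    return total * 2.5  # rescale the 0-40 sum to 0-100


# Hypothetical responses from three participants.
participants = [
    [4, 2, 4, 2, 5, 1, 4, 2, 4, 3],
    [5, 1, 4, 2, 4, 2, 5, 1, 4, 2],
    [3, 3, 4, 2, 4, 2, 3, 2, 3, 3],
]
print(mean([sus_score(p) for p in participants]))  # mean SUS composite
```

Note that individual SUS scores are not percentages, even though they share a 0-100 range.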

18

Accuracy and Consistency of Frontier LLMs on Orthodontic Diagnostic Tasks: A Repeated-Trial Comparison

Kang, W. J.; Sim, J.; Loh, E. E. M.; Lim, A. C. Y.; Foong, K. W. C.

2026-05-20 health informatics 10.64898/2026.05.17.26353409 medRxiv

Top 0.4%

2.0%

Importance. Large language models are increasingly explored as clinical decision support tools in orthodontics, yet existing evaluations have been confined to knowledge-based question answering, where reported accuracy ranges from 18% to 100%. No study has evaluated performance on the computational and classificatory tasks that define daily diagnostic work. Furthermore, 84.3% of published healthcare large language model studies fail to report the number of repeated queries performed, leaving output stochasticity unexamined. Objective. To compare the diagnostic accuracy and output consistency of three frontier reasoning-enhanced large language models, namely, ChatGPT 5.4 (Thinking), Gemini 3 (Thinking), and Claude Opus 4.6 (Extended Thinking), on Bolton analysis, Index of Orthodontic Treatment Need-Dental Health Component (IOTN DHC) classification, space analysis, and lateral cephalometric interpretation. Methods. In this comparative cross-sectional study with a repeated-measures design, each model, accessed through its consumer-facing web interface under default provider settings rather than through an application programming interface, processed 200 purpose-built items (50 per task) across four independent trials, yielding 2,400 observations. Responses were scored against a pre-established reference standard by two independent raters using strict binary exact-match criteria. Accuracy was reported with exact binomial 95% confidence intervals. Inter-model comparisons used Cochran's Q test with post-hoc McNemar's tests and Bonferroni correction. A supplementary context-rich prompting evaluation was conducted on 40 items (480 observations). Results. Claude Opus 4.6 (Extended Thinking) achieved the highest accuracy (99.0%; 95% CI: 96.4 to 99.9%), followed by Gemini 3 (Thinking) (95.5%; 91.6 to 98.1%) and ChatGPT 5.4 (Thinking) (94.0%; 89.8 to 96.9%) (Cochran's Q=6.87, p=0.032). Each model exhibited distinct, non-overlapping error profiles concentrated at the normal-abnormal classification boundary. An accuracy-consistency paradox emerged: the most accurate model was the least consistent (93.0%), while the least accurate was the second-most consistent (98.0%). Context-rich prompting eliminated all errors across all three models. Interpretation. Frontier reasoning large language models achieved high overall accuracy on orthodontic diagnostic tasks but retained concealed, task-specific vulnerabilities detectable only through repeated-trial evaluation. An accuracy-consistency paradox, in which the most accurate model was the least consistent, demonstrates that single-trial evaluations cannot characterise clinical risk. The reasoning modes were associated with high arithmetic accuracy but did not compensate for imprecise parametric knowledge on classification tasks; however, the absence of a non-thinking baseline means this association cannot be attributed to the thinking mode itself. Context-rich prompting eliminated all errors on synthetic data but should be regarded as a necessary yet insufficient prerequisite for clinical deployment pending prospective validation on real patient data.
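
The inter-model comparison above pairs an omnibus Cochran's Q test with pairwise McNemar tests and a Bonferroni correction. A minimal sketch of those calculations follows, assuming an items x models matrix of 0/1 correctness scores. The exact (binomial) McNemar variant is an assumption, since the abstract does not say whether an exact or chi-square version was used, and the function names are hypothetical. With three models the chi-square reference distribution has df = 2, whose upper tail is exp(-Q/2); for the reported Q = 6.87 this gives about 0.032, matching the abstract.

```python
import math


def cochrans_q(table: list[list[int]]) -> float:
    """Cochran's Q for an items x k matrix of 0/1 outcomes (k related models).

    Q = (k - 1) * (k * sum(C_j^2) - N^2) / (k * N - sum(R_i^2)),
    with C_j column totals, R_i row totals and N the grand total.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n_total = sum(row_totals)
    denominator = k * n_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("every item has identical outcomes across models")
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2)
    return numerator / denominator


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value for m comparisons."""
    return min(1.0, p * m)


print(round(chi2_sf_even_df(6.87, 2), 3))  # omnibus p for 3 models (df = 2)
```

With three models there are three pairwise McNemar comparisons, so each raw pairwise p-value would be passed to `bonferroni(p, 3)`.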

19

Effects of interdisciplinary early developmental intervention programs on behavior, executive functioning and participation in children born preterm: A systematic review with meta-analysis

Schirle, L.; Babel, M.; Briem, J.-S. J.; Gawehn, N.; Janka, H.; Metzendorf, M.-I.; Trunk, E.; Wohlleben, J.; Weibel, S.; Spiegler, J.

2026-06-03 pediatrics 10.64898/2026.06.02.26354617 medRxiv

Top 0.4%

1.9%

Aim: To systematically evaluate evidence on the effects of post-discharge early developmental intervention programs (EI) on behavioral development, quality of life, participation, executive functioning, parent-child interaction, and use of medical services from infancy through adolescence in children born preterm. Method: Four bibliographic databases and one trial registry were systematically searched for randomized controlled trials up to April 23, 2024. Two reviewers independently screened studies and extracted data. Where studies were clinically and methodologically comparable, random-effects meta-analyses were performed. Risk of bias was assessed with the Cochrane RoB 2 tool, and certainty of evidence with the GRADE approach. Results: Twenty-six studies met the inclusion criteria; eleven studies, including 2,315 preterm-born infants, reported relevant outcomes, and seven contributed to meta-analyses. Most reported results were judged to have some concerns or a high risk of bias; certainty of evidence ranged from very low to moderate across outcomes. EI may offer small benefits for selective attention, behavioral problems, and parent-child interaction. Little to no effect was found for special educational needs, language skills, executive functioning, and the use of medical services. No included study evaluated the effect of EI on ADHD, quality of life, or participation related to mobility or leisure activities. Interpretation: EI may improve problems typically seen in preterm children and should be offered especially to those with additional medical or social risk factors. High-quality, contemporary trials are needed to establish reliable clinical recommendations regarding EI strategies and complementary interventions throughout childhood.
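
The review pools comparable studies with random-effects meta-analysis but does not state which between-study variance (tau²) estimator it used. The sketch below uses the classic DerSimonian-Laird estimator, one common choice, shown here only as an illustration: fixed-effect weights give Cochran's Q, Q is converted to tau², and the pooled estimate is re-weighted by 1/(v_i + tau²). Effects are assumed to be on an additive scale such as standardized mean differences or log odds ratios; the function name and example numbers are hypothetical.

```python
import math
from statistics import NormalDist
from typing import NamedTuple


class PooledEstimate(NamedTuple):
    estimate: float
    se: float
    ci_low: float
    ci_high: float
    tau2: float  # between-study variance
    i2: float    # share of variability attributed to heterogeneity


def dersimonian_laird(effects: list[float], variances: list[float]) -> PooledEstimate:
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    sum_ws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_ws
    se = math.sqrt(1 / sum_ws)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile
    return PooledEstimate(pooled, se, pooled - z * se, pooled + z * se, tau2, i2)


# Hypothetical standardized mean differences and their variances.
print(dersimonian_laird([0.10, 0.35, 0.22], [0.02, 0.05, 0.03]))
```

When tau² is estimated as zero, the random-effects weights reduce to the fixed-effect weights, so the two models agree for homogeneous studies.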

20

Large Language Models in Healthcare Simulation Education: A Bibliometric Analysis with AI-Assisted Screening

Pears, M.; Wadhwa, K.; Payne, S. R.; Konstantinidis, S. T. H.; Biyani, C. S.

2026-06-04 urology 10.64898/2026.06.02.26354722 medRxiv

Top 0.4%

1.9%

Large language models (LLMs) such as ChatGPT are rapidly reshaping healthcare education and simulation-based training in non-technical skills (NTS), yet no bibliometric analysis has mapped this landscape. We searched seven open-access databases (OpenAlex, PubMed, Europe PMC, Crossref, Semantic Scholar, CORE, DOAJ) for English-language publications from January 2020 to March 2026. From 100,277 initial records, a sequential keyword funnel yielded 830 candidate papers, which were screened by 83 independent Claude Sonnet 4.6 AI agents applying pre-specified inclusion criteria (PRISMA-trAIce compliant; Cohen's kappa = 0.86 pre-reconciliation, 1.0 post-reconciliation). The final AI-verified corpus comprised 551 papers, with a compound annual growth rate of 109%, contributions from 2,398 authors across 279 journals in 58 countries, and an h-index of 41. ChatGPT dominated the model landscape (46% of papers), with open-source models virtually absent. Virtual patient chatbots were the leading simulation modality (106 papers). Among NTS domains, communication (145 papers) and decision-making (135 papers) were most studied, whereas teamwork, leadership, situational awareness, and crisis resource management were markedly underrepresented. Only 6 urology-relevant papers were identified, none examining LLM integration within boot camp training formats. The field is growing at an extraordinary pace but remains concentrated in a narrow range of NTS domains and a single proprietary model. Critical gaps persist in team-based skills training, open-source model evaluation, and specialty-specific simulation. AI-assisted bibliometric screening using multiple independent agents is feasible, reliable, and scalable, offering a replicable methodology for mapping fast-evolving research fields.
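
Two of the headline figures above, Cohen's kappa for screening agreement and the compound annual growth rate, follow standard formulas: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e the agreement expected by chance from each rater's label frequencies, and CAGR = (last / first)^(1 / years) - 1. The sketch below is illustrative only; the labels and counts are hypothetical, and the abstract does not say which years define its CAGR window.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label proportions.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def cagr(first_value: float, last_value: float, years: float) -> float:
    """Compound annual growth rate between two annual counts."""
    return (last_value / first_value) ** (1 / years) - 1


# Hypothetical include/exclude decisions from two screeners.
screener_1 = ["include", "include", "exclude", "exclude", "include", "exclude"]
screener_2 = ["include", "exclude", "exclude", "exclude", "include", "exclude"]
print(round(cohens_kappa(screener_1, screener_2), 2))
print(cagr(10, 40, 2))  # hypothetical: 10 papers growing to 40 over two years
```

Read this way, a CAGR of 109% means annual output multiplies by about 2.09 each year, that is, it roughly doubles.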