Healthcare
○ MDPI AG
Preprints posted in the last 7 days, ranked by how well they match Healthcare's content profile, based on 16 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.
Wei, X.; Xao, X.; Hou, J.; Wang, Q.
Background & Aims: Accurate assessment of clinical malnutrition using anthropometric and functional indicators could improve the care of elderly trauma patients in intensive care units (ICUs). This study aimed to develop an AI-driven malnutrition assessment toolbox based on a minimal set of clinically feasible indicators. Methods: Multiple machine learning models, including logistic regression, support vector machines, k-nearest neighbors, decision trees, random forests, XGBoost, and neural-network-based ensemble models, were developed using different indicator configurations from a clinically collected patient dataset. Models were trained using baseline and longitudinal measurements to predict malnutrition risk. SHAP analysis was used to interpret the importance of selected indicators. Results: Baseline (Day 1) data alone did not provide a reliable prediction, whereas longitudinal measurements substantially improved performance. Models based on a minimal indicator set, including bilateral mid-upper arm circumference, calf circumference, and key static variables, outperformed models using the full indicator set. Tree-based methods consistently outperformed linear and distance-based models, with the three-time-point XGBoost achieving the best individual performance. Neural-network-based ensemble models further improved predictive stability. The best overall performance was achieved by the ensemble model using the minimal indicator set from Day 1 and Day 3. SHAP analysis confirmed the importance of the selected indicators. Conclusions: This AI-driven toolbox provides an efficient and clinically feasible approach for early malnutrition assessment in elderly trauma patients in the ICU. Its strong performance with a minimal indicator set supports its potential for integration into clinical workflows and future digital twin systems for intelligent nutritional management.
Murakami, M.; Ohtake, F.
While vaccination conflicts have become apparent, physicians' attitudes toward those with differing views remain unclear. Through an online survey of 492 physicians and 5,252 members of the general public in Japan in February 2026, we investigated attitudes toward four vaccines (influenza, measles, HPV, and COVID-19). Intergroup bias was assessed as ingroup minus outgroup attitudes using a feeling thermometer. Multilevel regression examined associations with agreement group and physician status. Intergroup bias was significantly positive in both agreement and disagreement groups across all vaccine types, and was higher in the agreement group. Physicians exhibited higher intergroup bias than the general public. These findings indicate that vaccination conflict is bidirectional: physicians, often viewed as targets of hostility from vaccine-hesitant individuals, themselves exhibit greater intergroup bias toward those with opposing views. Interventions to raise physicians' awareness of their own bias, alongside communication strategies for vaccine-hesitant individuals, are needed.
Kim, S.; Guo, Y.; Sutari, S.; Chow, E.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.
Social determinants of health (SDoH) are important for clinical care, but it remains unclear how much AI-captured social context is preserved after clinician editing in ambient documentation workflows. We retrospectively analyzed 75,133 paired ambient AI-drafted and clinician-finalized note sections from ambulatory care at a large academic health system. Using a rule-based NLP pipeline, we extracted 21 SDoH categories and quantified retention, deletion, and addition. SDoH appeared in 25.2% of AI drafts versus 17.2% of final notes. At the mention level, AI captured 29,991 SDoH mentions, of which 45.1% were deleted and 54.9% were retained, while clinicians added 3,583 new mentions. Insurance and marital status were most often deleted, whereas substance use and physical activity were more often retained. Deletion patterns also varied by specialty, supporting the need for specialty-aware ambient AI systems.
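The retention, deletion, and addition counts described above can be illustrated with a multiset comparison of mentions in a draft and its finalized counterpart. This is a minimal sketch only: the function name `compare_mentions`, the `(category, text)` tuple representation, and the example mentions are hypothetical and are not taken from the study's rule-based pipeline.

```python
from collections import Counter

def compare_mentions(draft, final):
    """Split mentions into retained, deleted, and added multisets.

    draft and final are lists of (category, text) tuples; this exact-match
    representation is an assumption made for illustration.
    """
    d, f = Counter(draft), Counter(final)
    retained = d & f   # present in both the AI draft and the final note
    deleted = d - f    # present in the AI draft only
    added = f - d      # present in the final note only
    return retained, deleted, added

# Hypothetical mentions for one note section.
draft = [
    ("insurance", "has Medicaid"),
    ("marital_status", "married"),
    ("substance_use", "smokes daily"),
]
final = [
    ("substance_use", "smokes daily"),
    ("physical_activity", "walks 30 minutes a day"),
]

retained, deleted, added = compare_mentions(draft, final)
retention_rate = sum(retained.values()) / len(draft)
print(sum(retained.values()), sum(deleted.values()), sum(added.values()))
```

Exact matching is the simplest possible rule; a real pipeline would likely need normalization or fuzzy matching to decide when an edited mention still counts as retained.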
Di Somma, S.; Gervais, R.; Bains, M.; Carter-Williams, S.; Messner, S.; Onsongo, N.
Background: Chronic conditions such as hypertension can significantly disrupt daily life and emotional wellbeing. The interaction between patients' perceptions, adherence to antihypertensive medication and quality of life (QoL) remains underexplored outside structured clinical settings. Objectives: To capture unprompted patient perspectives, to assess whether hypertension affects QoL, and to investigate whether patient-reported experiences are associated with self-reported antihypertensive medication adherence. Methods: Social media listening (SML) study analyzing 86,368 anonymized posts from individuals with hypertension in 12 countries, collected between January 2022 and May 2024. Posts from 11 countries (n=81,368) were analyzed using artificial intelligence-enabled natural language processing. Posts from China (n=5,000) were analyzed separately using a harmonized framework. Quantitative and qualitative methods assessed variations by country, age, and gender, and associations between emotional expression and antihypertensive medication adherence. Results: Across the 11-country core sample, 45% of posts mentioned at least one QoL impact, most commonly worry/anxiety (11%). Impacts varied across countries. Among 8,096 posts with age identified, individuals <40 years reported emotional balance impacts in 28% of posts versus 22% among those aged 40+. Work/Education impacts were mentioned in 17% of posts by those <40 years vs 12% in 40+. Among 7,968 posts explicitly referencing adherence, expressed worry was associated with stricter adherence (62% association score), as were structured routines (79% score), home monitoring (77%), dietary changes (77%), and exercise (71%). In contrast, sadness/depression was associated with inconsistent adherence (71%), as were forgetfulness (79%), side effects (73%), and cost/insurance concerns (65%).
Conclusions: These results emphasize the importance of the psychological and emotional impact of hypertension, including on adherence to medication regimens, reinforcing the value of a holistic approach to patient care.
Xia, Y.; Sun, L.; Zhao, Y.
Background: China has implemented policies to strengthen its pharmacist workforce since the 2009 healthcare reform, yet a comprehensive evaluation of their long-term systemic effects is lacking. Objective: To systematically analyze the evolution of China's pharmacist workforce in healthcare institutions from 2007 to 2023 across four dimensions: quantity, quality, structure, and distribution, providing an empirical foundation for policy optimization. Methods: A retrospective analysis was conducted using longitudinal data from the China Health Statistics Yearbooks. Trends were delineated via descriptive statistics. Equity and spatial evolution were assessed using the Gini coefficient, Theil index decomposition, and spatial autocorrelation analyses (Global Moran's I and hotspot analysis). Results: From 2007 to 2023, the total number of pharmacists increased from 357,700 to 569,500 (average annual growth: 2.2%). This growth lagged behind physicians (4.6%) and nurses (7.4%), causing the pharmacist-to-physician ratio to decline from 1:5.15 to 1:8.39. The workforce showed trends of feminization (female proportion rose from 59.7% to 70.8%) and aging. While quality improved, 51.1% still held an associate degree or below, and only 6.6% held senior titles. Equity analysis revealed the provincial Gini coefficient improved from 0.145 to 0.093. Theil index decomposition confirmed intra-provincial disparities as the primary inequality driver. Spatial analysis showed a non-significant global Moran's I by 2023 (0.154, P>0.05), down from 0.254 (P<0.01) in 2007. Hotspot analysis confirmed this transition, revealing a contraction of high-confidence clusters and a trend toward balanced distribution. Conclusions: China has made measurable progress in expanding pharmacist workforce size and improving inter-provincial equity since 2007.
However, persistent structural challenges remain: relative workforce contraction compared to other health professions, an aging demographic, a shortage of senior talent, and significant intra-provincial inequity. Future policies must prioritize optimizing workforce structure and enhancing clinical service capabilities to catalyze a shift toward patient-centered pharmaceutical care.
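For readers unfamiliar with the equity metric named in this abstract, the following is a minimal sketch of an unweighted Gini coefficient over a list of values (for example, pharmacists per 10,000 population by province). The abstract does not say whether the authors used a population-weighted variant, so the unweighted form and the example inputs here are assumptions for illustration only.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    Uses the sorted-rank formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where i is the 1-based rank of x_i in ascending order.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical densities: perfectly equal, then fully concentrated in one unit.
print(gini([1, 1, 1, 1]))
print(gini([0, 0, 0, 1]))
```

A value of 0 means every unit has the same density; with n units the unweighted maximum is (n - 1) / n.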
Basharat, A.; Hamza, O.; Rana, P.; Odonkor, C. A.; Chow, R.
Introduction: Large language models are increasingly being used in healthcare. In interventional pain medicine, clinical reasoning is essential for procedural planning. Prior studies show that simplified prompts reduce clinical detail in AI-generated responses. It remains unclear whether this reflects knowledge loss or simply prompt-driven suppression of information. Methods: We performed a controlled comparative study using 15 standardized low back pain questions representing common interventional pain questions. Each question was submitted to ChatGPT under three conditions: a professional-level prompt (DP), a fourth-grade reading-level prompt (D4), and clinician-directed rewriting of the D4 response to a medical level (U4→MD). No follow-up prompting was allowed. Three physicians independently rated responses for accuracy using a 0-2 ordinal scale. Clinical completeness was determined by consensus. Word count and Flesch-Kincaid Grade Level (FKGL) were also measured. Paired t-tests compared conditions. Results: Accuracy was highest with professional prompting (1.76). Accuracy declined with the fourth-grade prompt (1.33; p = 0.00086). When simplified responses were rewritten for clinicians, accuracy returned to baseline (1.76; p ≈ 1.00 vs DP). Clinical completeness followed the same pattern: DP 80.0%, D4 6.7%, U4→MD 73.3%. Fourth-grade responses were shorter and less complex. Upscaled responses were more complex and similar in length to professional responses. Inter-rater reliability was low (Fleiss κ = 0.17), but trends were consistent across conditions. Conclusions: Reduced clinical detail under simplified prompts appears to reflect constrained output rather than loss of knowledge. Clinician-directed reframing restores omitted content. LLM performance in interventional pain depends strongly on prompt design and intended audience.
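The inter-rater statistic reported above can be sketched as follows. This is a generic implementation of Fleiss' kappa for a fixed number of raters per item; the input layout (`counts[i][j]` = number of raters placing item `i` in category `j`) and the example tables are hypothetical and do not reproduce the study's ratings.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    # Observed agreement: mean proportion of agreeing rater pairs per item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    n_cats = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_exp = sum(p * p for p in p_cat)
    # Undefined (division by zero) if all ratings fall in a single category.
    return (p_bar - p_exp) / (1 - p_exp)

# Three raters, three categories (e.g. scores 0, 1, 2); hypothetical data.
perfect = [[3, 0, 0], [0, 3, 0], [0, 0, 3]]
split = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(fleiss_kappa(perfect), fleiss_kappa(split))
```

Kappa equals 1 under full agreement and drops below 0 when raters agree less often than chance would predict, which is why a value near 0.17 is read as low reliability.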
Gjertsen, M.; Yoon, W.; Afshar, M.; Temte, B.; Leding, B.; Halliday, S.; Bradley, K.; Kim, J.; Mitchell, J.; Sanders, A. K.; Croxford, E. L.; Caskey, J.; Churpek, M. M.; Mayampurath, A.; Gao, Y.; Miller, T.; Kruser, J. M.
Importance: Physicians routinely prognosticate to guide care delivery and shared decision making, particularly when caring for patients with critical illnesses. Yet, these physician estimates are prone to inaccuracy and uncertainty. Artificial intelligence, including large language models (LLMs), shows promise in supporting or improving this prognostication. However, the performance of contemporary LLMs in prognosticating for the heterogeneous population of critically ill patients remains poorly understood. Objective: To characterize and compare the performance of LLMs and physicians when predicting 6-month mortality for hospitalized adults who survived critical illness. Design: Embedded mixed methods study with elicitation and comparison of prognostic estimates and reasoning from LLMs and practicing physicians. Setting: The publicly available, deidentified Medical Information Mart for Intensive Care (MIMIC)-IV v2.2 dataset. Participants: We randomly selected 100 hospitalizations of adult survivors of critical illness. Four contemporary LLMs (OpenAI GPT-4o, o3- and o4-mini, and DeepSeek-R1) and 7 physicians provided independent prognostic estimates for each case (1,100 total estimates; 400 LLM and 700 physician). Main outcomes and measures: For each case, LLMs and physicians used the hospital discharge summary and demographics to predict 6-month mortality (yes/no) and provide their reasoning (free text). We assessed prognostic performance using accuracy, sensitivity, and specificity, and used inductive, qualitative content analysis to characterize reasoning. Results: Mean physician accuracy for predicting mortality was 70.1% (95% CI 63.7-76.4%), with sensitivity of 59.7% (95% CI 50.6-68.8%) and specificity of 80.6% (95% CI 71.7-88.2%). The top-performing LLM (OpenAI o4-mini) accuracy was 78.0% (95% CI 70.0-86.0%), with sensitivity of 80.0% (95% CI 67.4-90.2%) and specificity of 76.0% (95% CI 63.3-88.0%).
The difference between mean physician and top-performing LLM accuracy was not statistically significant (p = 0.5). Qualitative analysis revealed similar patterns in LLM and physician expressed reasoning, except that physicians regularly and explicitly reported uncertainty while LLMs did not. Conclusion and Relevance: In this study, LLMs and physicians achieved comparable, moderate performance in predicting 6-month mortality after critical illness, with similar patterns in expressed reasoning. Our findings suggest LLMs could be used to support prognostication in clinical practice but also raise safety concerns due to the lack of LLM uncertainty expression.
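The three performance measures used in this study follow directly from a binary confusion matrix. The sketch below shows the arithmetic on a small, made-up set of labels; the function name `binary_metrics` and the ten example cases are hypothetical and unrelated to the MIMIC-IV sample.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = died)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of deaths correctly predicted
        "specificity": tn / (tn + fp),  # share of survivors correctly predicted
    }

# Hypothetical outcomes and predictions for ten cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters here because two raters with similar accuracy can err in opposite directions, as the physician and LLM figures in the abstract suggest.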
Kwon, C.-Y.; Lee, B.; Kim, M.; Mun, J.-h.; Seo, M.-G.; Yoon, D.
Background: Hwa-byung (HB) is a Korean culture-bound syndrome characterised by prolonged suppression of anger and somatic complaints. No evidence-based digital therapeutic (DTx) has been developed for HB. We evaluated the feasibility, user experience (UX), and preliminary clinical effect of an acceptance and commitment therapy (ACT)-based DTx application, Hwa-free, for HB. Methods: Adults aged 19-80 years diagnosed with HB were enrolled in a four-week app-based intervention with assessment at baseline (Week 0), Week 2, Week 4, and Week 8 follow-up. The primary outcome was UX assessed via a 22-item survey at Week 4. Secondary outcomes included HB-related symptom and personality scales, depression, anxiety, anger expression, psychological flexibility, health-related quality of life, and heart rate variability. Results: Of 45 screened, 30 were enrolled and 28 constituted the modified intention-to-treat population. Mean app use was 19.9 ± 7.9 days (71.2% adherence over 28 days). Adverse events were infrequent and unrelated to the intervention. Positive response rates exceeded 80% for video content (items 2-4: 82.8-89.7%), HB self-assessment (86.2%), meditation therapy (86.2%), and in-app guidance (85.7%). Pre-post improvements from baseline to Week 4 were observed in 11 of 18 clinical scales, including HB Symptom Scale (Δ = -9.8, Cohen's d = -0.92), Beck Depression Inventory-II (Δ = -13.3, d = -1.11), and state anger (Δ = -7.8, d = -0.96). The HB screening-positive rate declined from 100% at baseline to 55.6% at Week 8. Conclusions: Hwa-free demonstrated adequate feasibility, acceptable UX, and preliminary evidence of clinically meaningful improvement in HB-related symptoms. A future randomised controlled trial is warranted. Trial registration: CRIS, KCT0011105
Matthewman, J.; Denaxas, S.; Langan, S.; Painter, J. L.; Bate, A.
Objectives: Large language models (LLMs) have shown promise in creating clinical codelists for research purposes, a time-consuming task requiring expert domain knowledge. Here, we evaluate the performance and assess failure modes of a retrieval augmented generation (RAG) approach to creating clinical codelists for the large and complex medical terminology used by the Clinical Practice Research Datalink (CPRD). Materials & Methods: We set up a RAG system using a database of word embeddings of the medical terminology that we created using a general-purpose word embedding model (gemini-embedding). We developed 7 reference codelists presenting different challenges and tagged required and optional codes. We ran 168 evaluations (7 codelists, 2 different database subsets, 4 models, 3 epochs each). Scoring was based on the omission of required codes, and inclusion of irrelevant codes. We used model-grading (i.e., grading by another LLM with the reference codelists provided as context) to evaluate the output codelists (a score of 0% being all incorrect and 100% being all correct). Results: We saw varying accuracy across models and codelists, with Gemini 3 Pro (Score 43%) generally performing better than Claude Sonnet 4.6 (36%), Gemini 3 Flash, and OpenAI GPT 5.2 performing worst (14%). Models performed better with shorter target codelists (e.g., Eosinophilic esophagitis with four codes, and Hidradenitis suppurativa with 14 codes). For example, all models consistently failed to produce a complete Wrist fracture codelist (with 214 required codes). We further present evaluation summaries, and failure mode evaluations produced by parsing LLM chat logs. Discussion: Besides demonstrating that a single-shot RAG approach is currently not suitable for codelist generation, we demonstrate failure modes including hallucinations, retrieval failures and generation failures where retrieved codes are not used. 
Conclusions: Our findings suggest that while RAG systems using current frontier LLMs may create correct clinical codelists in some cases, they still struggle with large and complex terminologies and codelists with a large number of codes. The failure modes we highlight can inform the creation of future workflows to avoid failures.
Auger, S. D.; Varley, J.; Hargovan, M.; Scott, G.
Background: Current medical large language model (LLM) evaluations largely rely on small collections of cases, whereas rigorous safety testing requires large-scale, diverse, and complex cases with verifiable ground truth. Multiple Sclerosis (MS) provides an ideal evaluation model, with validated diagnostic criteria and numerous paraclinical tests informing differential diagnosis, investigation, and management. Methods: We generated synthetic MS cases with ground-truth labels for diagnosis, localisation, and management. Four frontier LLMs (Gemini 3 Pro/Flash, GPT 5.2/5 mini) were instructed to analyse cases to provide anatomical localisation, differential diagnoses, investigations, and management plans. An automated evaluator compared these outputs to the ground-truth labels. Blinded subspecialty experts validated 70 cases for realism and automated evaluator accuracy. We then evaluated LLM decision-making across 1,000 cases and scaled to 10,000 to characterise rare, catastrophic failures. Results: Subspecialist expert review confirmed 100% synthetic case realism and 99.8% (95% CI 95.5 to 100) automated evaluation accuracy. Across 1,000 generated MS cases, all LLMs successfully included MS in the differential diagnoses for more than 91% of cases. However, diagnostic competence was not associated with treatment safety. Gemini 3 models had low rates of clinically appropriate steroid recommendations (Flash: 7.2%, 95% CI 5.6 to 8.8; Pro: 15.8%, 95% CI 13.6 to 18.1) compared to GPT 5 mini (23.5%, 95% CI 20.8 to 26.1), frequently overlooking contraindications like active infection. OpenAI models inappropriately recommended acute intravenous thrombolysis for MS cases (9.6% GPT 5.2; 6.4% GPT 5 mini) compared to below 1% for Gemini models. Expanded evaluation (to 10,000 cases) probed these errors in detail.
Thrombolysis was recommended in 10.1% of cases lacking symptom timing information and paradoxically persisted (2.9%) even when symptoms were explicitly documented as more than 14 days old. Conclusion: Automated expert-level evaluation across 10,000 cases characterised artificial intelligence clinical blind spots hitherto invisible to small-scale testing. Massive-scale simulation and automated interrogation should become standard for uncovering serious failures and implementing safety guardrails before clinical deployment exposes patients to risk.
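To give a sense of how interval widths like those above scale with sample size, here is a minimal sketch of a normal-approximation (Wald) 95% confidence interval for a proportion. The abstract does not state which interval method the authors used, so the method is an assumption, as is the count of 72 successes out of 1,000 (chosen to correspond to the reported 7.2%).

```python
import math

def wald_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion,
    clipped to the range [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Assumed count: 72 of 1,000 cases, i.e. 7.2%.
lo, hi = wald_ci(72, 1000)
print(f"{lo * 100:.1f}% to {hi * 100:.1f}%")  # → 5.6% to 8.8%
```

Because the half-width shrinks with the square root of n, moving from 1,000 to 10,000 cases narrows the interval by roughly a factor of three, which is what makes rare failure rates estimable at the larger scale.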
Loh, K. J.; Lee, W. L.; Ng, A. L. O.; Chung, F. F. L.; Renganathan, E.
Background: Caring for people with dementia can impose a considerable psychological burden on caregivers, yet access to caregiver support in Malaysia remains limited. The World Health Organization's iSupport for Dementia program provides dementia education via a textual e-learning format. However, a culturally adapted Malaysian version has not been available. Objective: This study aimed to develop and gather user feedback on a culturally adapted, multimedia version of iSupport tailored for Malaysia (iSupport-Malaysia). Methods: Guided by a four-phase cultural adaptation framework, the generic iSupport content was translated into Bahasa Malaysia, adapted to local customs, and transformed into multimedia lessons on an e-learning platform. A mixed-methods design was used to explore user perceptions and evaluate usability through four homogeneous focus group discussions and 15 individual usability test sessions with informal caregivers (FG: n=9; UT: n=9) and healthcare professionals (FG: n=11; UT: n=6). Focus groups examined aesthetics, ease of use, clarity, cultural relevance, comprehensiveness, and satisfaction. Usability testing involved Think Aloud tasks, post-test questionnaires, and brief interviews. Qualitative data were analysed thematically, and descriptive statistics summarised usability performance. Results: iSupport-Malaysia demonstrated good usability (M=74.3±18.0), with most tasks completed without assistance. Strengths included interactive learning activities, peer discussion features, and flexible self-paced learning. Content was viewed as culturally appropriate, credible, and useful. Suggested improvements included enhancing visual aesthetics, shortening videos, refining quizzes, and increasing practical relevance. Conclusion: User insights indicate that iSupport-Malaysia is usable and culturally appropriate.
These findings will inform refinement of the platform prior to the pilot feasibility study and provide recommendations for future multimedia-based caregiver interventions.
Obasohan, P. E.; Palmer, J.; Alderson, D.; Yu, D.; Gronne, D. T.; Roos, E. M.; Skou, S. T.; Peat, G. M.
Objective: Unlike several other fields of healthcare, little is known about the size of therapist effects on patient outcomes following rehabilitation for musculoskeletal conditions. We aimed to estimate the proportion of variance in patient outcomes from a structured rehabilitation program explained by therapist effects. Methods: For our observational cohort study we accessed data from the national multicentre Good Life with osteoArthritis in Denmark (GLA:D) osteoarthritis management program. Analyses included 23,021 consecutive eligible adults with hip or knee osteoarthritis (mean (SD) age 65.0 (9.8) years, 71% female) treated by 657 therapists between October 2014 and February 2019. The primary outcome was ≥30% reduction in pain intensity on 0-100 VAS at 3 months. Therapist effects were estimated as the variance partition coefficient (intra-class correlation coefficient (ICC)) from two-level random intercept logistic regression models before and after adjusting for patient-level case-mix factors and therapist-level characteristics (number of patients treated, days since therapist certification). Analyses were repeated for a range of secondary outcomes using multiply imputed data and complete-case analysis. Results: 52% of patients reported a ≥30% reduction in pain intensity on 0-100 VAS at 3 months. In the null model the ICC was 0.007 (95% CI: 0.005, 0.009), which changed little after adjusting for patient- and therapist-level covariates. Upper confidence limits for ICC estimates across all secondary outcomes in multiply imputed and complete case analyses were less than 0.03. Conclusions: In a nationally implemented osteoarthritis management program delivered by trained healthcare professionals, therapist effects made a minimal contribution to variation in patient outcomes.
Key messages. What is already known on this topic: Therapist effects - defined as the effect of a given therapist on patient outcomes as compared to another therapist - have been observed in several fields of healthcare and have important consequences for selection, training, and service improvement. In musculoskeletal rehabilitation five previous studies suggest that 1-12% of variation in patient-reported outcomes may be attributable to therapist effects, but these estimates were based on relatively small datasets resulting in substantial uncertainty. What this study adds: Our cohort study analysed registry data from 2014-2019 on 23,021 patients and 647 trained therapists from the nationally implemented GLA:D structured osteoarthritis management program in Denmark. We found that therapist effects accounted for less than 3% of total variation in patient-reported pain and quality of life outcomes 3 months after beginning the program. How this study might affect research, practice, or policy: Our findings suggest that contextual factors that relate to therapist effects - therapist characteristics or therapist-patient interaction and alliance - make a minimal contribution to variation in patient outcomes from this structured, group-based rehabilitation intervention. Any contextual effects must be attributable to alternative sources, e.g. patient expectations, intervention setting.
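One common way to obtain an ICC from a two-level random intercept logistic model is the latent-variable formulation, in which the patient-level residual variance is fixed at π²/3. The abstract does not say which formulation the authors used, so the sketch below is an assumption-laden illustration; the function names are hypothetical.

```python
import math

# Level-1 residual variance of the standard logistic distribution.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3

def icc_latent(therapist_var):
    """ICC under the latent-variable formulation:
    therapist-level variance / (therapist-level variance + pi^2 / 3)."""
    return therapist_var / (therapist_var + LOGISTIC_RESIDUAL_VAR)

def therapist_var_from_icc(icc):
    """Invert the formula to get the implied random-intercept variance."""
    return icc * LOGISTIC_RESIDUAL_VAR / (1 - icc)

# Variance on the log-odds scale that an ICC of 0.007 would imply
# under this formulation.
implied_var = therapist_var_from_icc(0.007)
print(round(implied_var, 4), round(icc_latent(implied_var), 3))
```

Under this formulation an ICC below 0.03 corresponds to a between-therapist variance of roughly 0.1 or less on the log-odds scale, which helps convey how small the reported therapist effects are.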
Koskei, G.; Karanja, S.; Ndugu, Z. W.; Anino, C. O.
Child undernutrition remains a major public health challenge in Kenya. Suboptimal feeding practices contribute significantly to persistent underweight and stunting. This study evaluated the effect of a community-based Positive Deviance Hearth (PDH) intervention on feeding practices among children aged 6-59 months in Sub County within a County of study. The study adopted a two-group pretest-posttest randomized experimental design conducted over a six-month period among 84 caregiver-child pairs in intervention and control groups. Multi-stage sampling was employed to identify study settings and participants. Structured and pretested questionnaires, 24-hour food recall questionnaires, and meal diversity questionnaires were used for data collection at pre-intervention and post-intervention periods. Data were analyzed using R software v.4.5.2. Differences between intervention and control groups at baseline and endline were assessed using difference-in-differences (DiD) analysis and summarized as adjusted DiD estimates with 95% confidence intervals and p-values, with p<0.05 considered significant. The PDH intervention significantly improved feeding practices among children aged 6-59 months. Meal frequency increased for children aged 9-23 months (DiD = +1.4; 95% CI: 1.2-1.7; p = 0.034) and 24 months and above (DiD = +1.2; 95% CI: 1.1-1.5; p = 0.017), and dietary diversity rose (DiD = +1.3; 95% CI: 1.1-1.9; p < 0.001). Nutrient-dense food consumption improved, including legumes (DiD = +32.6%; p < 0.001) and animal-source foods (DiD = +35.4%; p < 0.001). Energy and protein intake increased across all age groups (p < 0.05), and intake of the micronutrients iron, vitamin A, and vitamin C also rose significantly (p < 0.05). The PDH intervention substantially improved caregiver feeding practices, increased dietary diversity, and enhanced macro- and micronutrient intake, demonstrating its effectiveness as a scalable, community-driven strategy for sustainably improving child nutrition in high-burden settings.
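The core of a difference-in-differences estimate is the change in the intervention group minus the change in the control group. The sketch below shows only that unadjusted arithmetic; the study reports covariate-adjusted estimates computed in R, and the meals-per-day values here are hypothetical.

```python
from statistics import mean

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Unadjusted DiD: (change in intervention group) - (change in control group)."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change

# Hypothetical meals per day at baseline and endline.
treat_pre, treat_post = [2, 3, 2, 3], [4, 4, 3, 5]
ctrl_pre, ctrl_post = [2, 3, 3, 2], [3, 3, 2, 3]

did = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(did)  # → 1.25
```

Subtracting the control group's change removes any trend shared by both groups over the six months, so the remainder is attributed to the intervention under the usual parallel-trends assumption.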
Hamid, S.; Muneez, M.; Saleem, S.
Background: Before obtaining professional medical care, many people in peri-urban and rural Pakistan contact herbalists, spiritual healers, and unlicensed caregivers. This study examined the social, economic, and cultural factors influencing the use of informal care by analysing the health-seeking behaviours of individuals in the Faisalabad District. Methods: An exploratory mixed-methods study was conducted in Makkuana and the surrounding villages of Faisalabad District, Punjab. The quantitative component involved a cross-sectional survey of 69 adults using a structured questionnaire adapted from the I-CAM-Q. The qualitative component comprised twelve in-depth interviews and two focus group discussions. Descriptive statistics and chi-square analysis were used for quantitative data. Thematic analysis, guided by the Health Belief Model and Andersen's Behavioural Model, was applied to qualitative data. Results: The mean age of participants was 40.4 years; 62.3% were female, and 79.7% had monthly household incomes below PKR 60,000. Of the 69 participants, 68 (98.6%) sought care from an informal provider first, most commonly an unqualified practitioner (50.7%), herbal practitioner (29.0%), or homeopath (17.4%). Trust was the leading reason for provider choice (43.5%), followed by proximity (24.6%) and low cost (15.9%). Complications were reported by 21.7% of participants, and 39.1% later required formal care for the same illness. Eight qualitative themes emerged: structural and economic barriers to formal care; proximity and convenience as determinants of informal care; trust, familiarity, and social networks; cultural and religious normalisation of traditional practices; poor doctor-patient communication in formal settings; perceived safety and naturalness of alternative remedies; awareness deficits about provider qualifications; and treatment-related harm and delayed escalation to formal care.
Conclusion: Informal health care seeking is nearly universal in this community, driven by intersecting economic, structural, cultural, and interpersonal factors. Enhancing primary care affordability, accessibility, and the quality of provider-patient communication, together with culturally sensitive health literacy programs, is essential to redirect care seeking toward qualified providers.
Nkosi-Mjadu, B. E.
Background: South Africa's public healthcare system serves most of the population through approximately 3,900 primary healthcare clinics characterised by long waiting times and high volumes of repeat-prescription visits. No published pre-arrival digital triage system operates across all 11 official South African languages while aligning with the South African Triage Scale (SATS). This paper reports the design and preliminary safety validation of BIZUSIZO, a hybrid deterministic-AI WhatsApp triage system. Methods: BIZUSIZO delivers SATS-aligned triage via WhatsApp, combining AI-assisted free-text classification (Claude Haiku 4.5) with a Deterministic Clinical Safety Layer (DCSL) that overrides AI output for 53 clinical discriminator categories (14 RED, 19 ORANGE, 20 YELLOW) coded in all 11 official languages and independent of AI availability. A five-domain risk factor assessment can only upgrade triage level. One hundred and twenty clinical vignettes in patient language (English, isiZulu, isiXhosa, Afrikaans; 30 per language) were scored against a developer-assigned gold standard with independent blinded nurse review. A 121-vignette multilingual DCSL safety consistency check across all 11 languages and a 220-call post-hoc framing sensitivity evaluation (110 paired vignettes) were also conducted. Results: Under-triage was 3.3% (4/120; 95% CI: 0.9%-8.3%) with no RED under-triage; exact concordance was 80.0% (96/120) and quadratic weighted kappa 0.891 (95% CI: 0.827-0.932). One two-level under-triage was observed on a non-RED presentation (V072, isiXhosa burns vignette, ORANGE→GREEN); one two-level over-triage was observed (V054, isiZulu deep laceration, YELLOW→RED). In the framing sensitivity evaluation, AI-only classification achieved 50.9% RED invariance under adversarial framing; full-pipeline classification achieved 95.0% in four validated languages, with the DCSL rescuing 18 of 23 AI drift cases.
Conclusions: A hybrid deterministic-AI triage system with DCSL-based emergency detection achieved zero RED under-triage and consistent RED detection across all 11 official languages. The 16.7% over-triage rate falls within published South African SATS ranges (13.1-49%). A single two-level under-triage event was observed on an isiXhosa burns vignette (ORANGE→GREEN) and is discussed in Limitations. Findings are preliminary; prospective validation against independent nurse triage is the necessary next step.
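As a rough check on the reported under-triage interval, the sketch below computes an exact (Clopper-Pearson) binomial confidence interval for 4 events in 120 vignettes. The abstract does not state which interval method was used, so the choice of Clopper-Pearson is an assumption, and the function names are illustrative only. If that assumption holds, the result should land close to the reported 0.9%-8.3%.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, iters=60):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial interval (assumed method, not confirmed by the source)."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2,
    # i.e. P(X > x) equals 1 - alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2
    )
    return lower, upper


# Counts taken from the abstract: 4 under-triaged vignettes out of 120.
low, high = clopper_pearson(4, 120)
print(f"under-triage = {4 / 120:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```

The bisection solver avoids a dependency on a statistics library; with SciPy available, the same bounds could be read from beta-distribution quantiles instead.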
Kemal, R. A.; Dhani, R.; Simanjuntak, A. M.; Rafles, A. I.; Triani, H. X.; Rahmi, T. M.; Akbar, V. A.; Firdaus, F.; Pratama, B. F.; Zulharman, Z.
Background: Increasing relevance of genetics and molecular biology in medicine necessitates greater genetic literacy among healthcare workers. To assess the literacy level, a validated genetic literacy questionnaire is needed. Therefore, a standardised Indonesian-language genetic literacy questionnaire is essential. Aims: We aimed to translate and validate three genetic literacy questionnaires (PUGGS, iGLAS, and UNC-GKS) for use among Indonesian medical students. We then evaluated genetic literacy levels using one of the validated questionnaires. Methods: The PUGGS, iGLAS, and UNC-GKS questionnaires were translated into Indonesian and then reviewed by an expert panel for translational accuracy and conceptual appropriateness. Back-translation was performed to confirm validity. Initial Indonesian versions of the questionnaires underwent cognitive pre-testing with 12 undergraduate medical students. After refinements, the questionnaires were validated among 34 first- to third-year medical students. The Indonesian version of the UNC-GKS questionnaire was then used to assess the genetic literacy of 486 medical students comprising 228 preclinical medical students, 187 clerkship students, and 71 residents. Results: The Indonesian versions of PUGGS (Cronbach's α = 0.819) and UNC-GKS (α = 0.809) demonstrated good reliability, while iGLAS showed poor reliability (α = 0.315). Among the 486 students tested, 56% demonstrated moderate overall genetic literacy, and only 15.2% demonstrated good overall literacy. Basic genetic concepts were relatively well understood, with 54.3% having good literacy. In contrast, the effects of gene variants on health were poorly understood, with only 9.7% having good literacy. Inheritance concepts were moderately understood, with 24.9% having good literacy. Conclusion: The Indonesian translations of PUGGS and UNC-GKS are reliable tools for assessing genetic literacy among medical students. Using UNC-GKS, we observed predominantly moderate genetic literacy levels.
Curriculum improvement to better integrate genetics education is essential to support its clinical applications.
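For readers unfamiliar with the reliability statistic reported above, here is a minimal sketch of how Cronbach's alpha is computed from item-level scores. The `toy_scores` matrix is hypothetical and exists only to make the example runnable; it is not data from the study, and the abstract does not describe the software the authors used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: 5 respondents x 4 items scored 0/1 (NOT study data).
toy_scores = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]

alpha = cronbach_alpha(toy_scores)
print(f"Cronbach's alpha (toy data) = {alpha:.3f}")
```

Alpha approaches 1 when items vary together and falls toward 0 (or below) when they do not, which is why a value such as 0.315 is read as poor internal consistency.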
Perry, A. E.; Zawadzka, M.; Rychlik, J.; Hewitt, C.
Objectives: The primary aim of this study was to assess the feasibility of delivering an adapted problem-solving skills (PSS) intervention by quantifying the recruitment, follow-up and completion rates using a brief problem-solving intervention for people with a mental health diagnosis in two Polish prisons. Design: IAPPS is an open, multi-centred, parallel group feasibility randomised controlled trial (RCT). Setting: Two prisons in Poland. Participants: Men in custody aged 18 years and older, having a mental illness and living within the prison therapeutic unit. Interventions: The intervention consisted of an adapted PSS intervention plus care as usual (CAU) or care as usual only. The PSS intervention was delivered in groups of up to five people in 1.5-hour sessions over the course of two weeks. Main outcome measures: Primary outcomes - rate of recruitment, follow-up, and feasibility to deliver the intervention. Secondary outcomes included measures of depression, general mental health, and coping strategies. Results: 129 male prisoners were screened, 64 were randomly allocated, with a mean age of 53.5 years (SD 14, range 23-84). 59 (95%) prisoners were of Polish origin. Our recruitment rate was 48%. There was differential follow-up, with those in the intervention group less likely to complete the post-test battery than those who received care as usual. Outcome measures were successfully collected at both time points. Conclusions: We were able to recruit, retain and deliver the intervention within the prison setting; some logistical challenges limited our assessment of intervention engagement. Our data help to demonstrate how the RCT study design can be implemented and delivered within the complex prison environment. Trial registration number: ISRCTN 70138247; protocol registration date: May 2021.
Zhu, L.; Wang, W.; Liang, Z.; Tan, W.; Chen, B.; Lin, X.; Wu, Z.; Yu, H.; Li, X.; Jiao, J.; He, S.; Dai, G.; Niu, J.; Zhong, Y.; Hua, W.; Chan, N. Y.; Lu, L.; Wing, Y. K.; Ma, X.; Fan, L.
The rapid rise of large language models (LLMs) and foundation models has accelerated efforts to build artificial intelligence (AI) agents for mental health assessment, triage, psychotherapy support and clinical decision assistance. Yet a gap persists between healthcare and AI-focused work: while both communities use the language of "agents," clinical research largely describes monolithic chatbots, whereas AI studies emphasize agentic properties such as autonomous planning, multiagent coordination, tool and database use and integration with multimodal mental health data streams. In this Review, we conduct a systematic analysis of mental health AI agent systems from 2023 to 2025 using a six-dimensional audit framework: (i) system type (base model lineage, interface modality and workflow composition, from rule-based tools to role-aware multi-agent foundation-model systems), (ii) data scope (modalities and provenance, from elicited self-report and chatbot dialogues to electronic health records, biosensing and synthetic corpora), (iii) mental health focus (mapped to ICD-11 diagnostic groupings), (iv) demographics (age strata, geography and sex representation), (v) downstream tasks (screening/triage, clinical decision support, therapeutic interventions, documentation, ethical-legal support and education/simulation) and (vi) evaluation types (automated metrics, language quality benchmarks, safety stress tests, expert review and clinician or patient involvement). 
Across this corpus, we find that most systems (1) concentrate on depression, anxiety and suicidality, with sparse coverage of severe mental illness, neurocognitive disorders, substance use and complex comorbidity; (2) rely heavily on text-based self-report rather than clinically verified longitudinal data or genuinely multimodal inputs; (3) are implemented as single-agent chatbots powered by general-purpose LLMs rather than role-structured, workflow-integrated pipelines; and (4) are evaluated primarily via offline metrics or vignette-based scenarios, with few prospective, clinician- or patient-in-the-loop studies. At the same time, an emerging class of agentic systems assigns foundation models explicit roles as planners, retrieval agents, safety auditors or supervisors coordinating other models and tools. These multiagent, tool-augmented workflows promise personalization, safety monitoring and greater transparency, but they also introduce new risks around reliability, bias amplification, privacy, regulatory accountability and the blurring of clinical versus non-clinical roles. We conclude by outlining priorities for the next generation of mental health AI agents: clinically grounded, role-aware multi-agent architectures; transparent and privacy-preserving use of clinical and elicited data; demographic and cultural broadening beyond predominantly Western adult samples; and evaluation pipelines that progress from offline benchmarks to longitudinal, real-world studies with routine safety auditing and clear governance of responsibilities between agents and human clinicians.
Goldwater, J. C.; Harris, Y.; Das, S. K.; Fernandez Galvis, M. A.; Maru, D.; Jordan, W. B.; Sacaridiz, C.; Norwood, C.; Kim, S. S.; Neustrom, K.
OBJECTIVE: To evaluate the return on investment (ROI) of a community-based Diabetes Self-Management Program (DSMP) enhanced with health-related social needs (HRSN) screening and referrals, implemented by the New York City (NYC) Department of Health and Mental Hygiene with three community-based organizations in highly impacted, under-resourced neighborhoods. RESEARCH DESIGN AND METHODS: A retrospective cost-benefit analysis from a public sector payer perspective was conducted among 171 adults with type 2 diabetes who completed a six-week, peer-led DSMP delivered by community health workers (CHWs) in English, Spanish, and Korean during 2018-2019. A time-driven, activity-based costing model captured direct implementation costs, CHW workforce turnover, and administrative overhead. Monetized benefits included avoided diabetes-related complications, reductions in self-reported emergency department (ED) visits and hospitalizations, and quality-adjusted life year (QALY) gains from improved medication adherence. Univariate sensitivity analyses tested robustness under conservative assumptions. RESULTS: Total program costs were $179,224; monetized benefits totaled $1,824,213, yielding a net benefit of $1,644,989 and an ROI of 918%, approximately $10 returned per $1 invested. Excluding QALY gains, ROI remained 551%. Self-reported ED visits declined from 149 to 82 and hospitalizations from 93 to 24 in the six months following intervention. Over 80% of participants reported housing instability; 72% were Medicaid-covered and 16% uninsured. Sensitivity analyses confirmed a positive ROI under all conservative scenarios. CONCLUSIONS: A CHW-led, community-based DSMP integrated with HRSN screening and referrals delivered substantial economic and public health value among adults facing housing instability and structural barriers to care.
Findings support inclusion of DSMP as a covered benefit in Medicaid managed care, value-based payment arrangements, and housing access initiatives to advance equitable diabetes outcomes.
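The headline figures in the abstract above can be reproduced with simple arithmetic. The sketch below assumes the conventional definitions (net benefit = benefits minus costs, ROI = net benefit divided by costs, benefit-cost ratio = benefits divided by costs); the abstract does not spell out its formulas, and the function names are illustrative.

```python
def net_benefit(benefits, costs):
    """Monetized benefits minus program costs."""
    return benefits - costs


def roi_percent(benefits, costs):
    """Return on investment as a percentage of costs (assumed definition)."""
    return (benefits - costs) / costs * 100


def benefit_cost_ratio(benefits, costs):
    """Dollars of benefit per dollar of cost."""
    return benefits / costs


# Figures quoted in the abstract.
total_costs = 179_224
total_benefits = 1_824_213

print(f"net benefit  = ${net_benefit(total_benefits, total_costs):,}")
print(f"ROI          = {roi_percent(total_benefits, total_costs):.0f}%")
print(f"benefit/cost = {benefit_cost_ratio(total_benefits, total_costs):.1f}")
```

Under these definitions the net benefit is $1,644,989 and the ROI rounds to 918%, matching the reported values; the "approximately $10 returned per $1 invested" statement corresponds to the benefit-cost ratio.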
Souza, F. L.; Cabral Souza, N.; Mendes, J. A. d. A.
Introduction: Family Constellation Therapy (FCT) has been widely disseminated in clinical, public health, and judicial settings despite persistent concerns regarding its theoretical basis, safety, and the limited availability of rigorous randomised evidence supporting its clinical use. Objective: The aim of this systematic review is to assess the effects of FCT across all clinical conditions, explicitly considering both benefits and harms, and to summarise the characteristics of studies and intervention settings used in randomised controlled trials of FCT. Methods: Following a prospectively registered protocol (CRD420251136190), we conducted a systematic search of seven databases (PubMed, EMBASE, APA PsycInfo, CENTRAL, BVS, Web of Science, and CINAHL) and grey literature (ICTRP and ProQuest database) without language or date restrictions to identify published and unpublished randomised controlled trials of FCT. Study selection, data extraction, risk of bias (RoB 2), and certainty of evidence (GRADE) were performed in duplicate. Statistical analyses followed a prospectively registered analysis plan with prespecified criteria for data pooling and for handling analytical limitations. Results: No reliable evidence was found to support the use of FCT for any condition across both clinical and non-clinical samples. All trials included were judged to be at high risk of bias and all comparisons were rated as very low-certainty evidence. Concerns regarding potential adverse effects were identified, and the available data were insufficient to establish the effectiveness of the intervention, precluding any clinical recommendation. Conclusion: Clinicians, policymakers, and consumers should reconsider adopting FCT while reliable evidence is not available.