Human-supervised, large language model-based clinical decision support aligned to national newborn protocols in Kenya: a pragmatic, early-stage evaluation

Kuria, T.; Kamau, G.; Makokha, F.; Omondi, P.; Mbugua, G.; David, K.; Mbugua, S.; Gitaka, J.

2026-03-25 health informatics

10.64898/2026.03.22.26348994 medRxiv


Introduction: Timely, protocol-adherent clinical decisions are crucial for reducing neonatal mortality in low-resource settings. Translating extensive national guidelines into bedside practice remains challenging.

Objective: We developed and evaluated AIFYA, a human-supervised, large language model (LLM)-based clinical decision support system (CDSS) aligned with Kenya's national newborn care protocols.

Methods: This prospective, mixed-methods, early-stage evaluation, guided by the DECIDE-AI framework, embedded AIFYA into routine workflows at two public health facilities (Level 5 and Level 4) in Bungoma County, Kenya, from September 2024 to June 2025. Primary outcomes were adoption (measured by cumulative neonatal cases managed), training reach (assessed by the number of credentialed healthcare workers, HCWs), and guideline and citation concordance (evaluated through blinded review of 118 AI-generated recommendations by two neonatologists, with adjudication by a third). Secondary outcomes included protocol adherence and triage-to-decision time.

Results: A total of 50 HCWs were trained and 550 neonatal cases were managed over 10 months. Among surveyed HCWs (n = 33), 76% were female, with a mean age of 32.1 years. Expert review found 75% of recommendations were correct and 15% partially correct, with strong inter-rater reliability (weighted Cohen's kappa 0.85; 95% CI 0.79 to 0.91). Citation accuracy was 96%. In 40 complex dosing scenarios, 75% of outputs were rated correct. The median triage-to-decision time was 23 minutes (interquartile range 18 to 31). Implementation was supported by an offline-first architecture and a facility-based coaching model, which sustained engagement despite staff turnover.

Conclusion: A human-supervised AI CDSS, directly and transparently anchored to national clinical guidelines, can be successfully implemented in routine, low-resource neonatal care settings. The system demonstrated high user adoption and strong expert-rated concordance. High citation accuracy builds clinical trust, ensuring safety and enabling auditable AI. These findings support progression to controlled, multi-site trials to evaluate clinical effectiveness.

Keywords: Neonatal care; Clinical decision support system; Large language model; Artificial intelligence; Human-supervised; Low-resource settings; Guideline adherence; Digital health; Kenya
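
For readers unfamiliar with the inter-rater statistic reported in the Results, the following is a minimal sketch of how a weighted Cohen's kappa can be computed for two reviewers on an ordinal scale. It is illustrative only and is not the authors' analysis code. The abstract does not state the weighting scheme or the full rating scale, so the linear and quadratic weights, the third "incorrect" category, and the function name `weighted_kappa` are all assumptions.

```python
from collections import Counter

# Hypothetical ordinal scale; the "incorrect" category is an assumption,
# since the abstract names only "correct" and "partially correct".
CATEGORIES = ["incorrect", "partially correct", "correct"]


def weighted_kappa(rater_a, rater_b, categories=CATEGORIES, scheme="linear"):
    """Weighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")

    k = len(categories)
    n = len(rater_a)
    index = {label: i for i, label in enumerate(categories)}

    # Observed joint counts and each rater's marginal counts.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    # Weighted disagreement actually observed between the raters.
    observed = sum(disagreement(i, j) * c / n for (i, j), c in joint.items())
    # Weighted disagreement expected if the raters were independent.
    expected = sum(
        disagreement(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Made-up ratings for four items, purely to show the call.
    a = ["correct", "correct", "partially correct", "incorrect"]
    b = ["correct", "partially correct", "partially correct", "incorrect"]
    print(round(weighted_kappa(a, b, scheme="linear"), 3))
```

Under either scheme, a near miss (correct versus partially correct) is penalised less than a full disagreement (correct versus incorrect), which is why a weighted statistic suits an ordinal correctness scale.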
