HAARF: Healthcare AI Agents Regulatory Framework - A Comprehensive Security Verification Standard for Autonomous AI Systems in Clinical Environments
Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.
Show abstract
As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARFs enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF- guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]). Cross-model validation confirms identical security metrics, supporting HAARFs model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment--shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
Matching journals
The top 4 journals account for 50% of the predicted probability mass.