
Care Plan Generation for Underserved Patients Using Multi-Agent Language Models: Applying Nash Game Theory to Optimize Multiple Objectives

Basu, S.; Baum, A.

2026-02-25 health informatics
10.64898/2026.02.23.26346934 medRxiv

Background: Clinicians in care management programs are often in short supply relative to patient demand, especially in US Medicaid programs, and must simultaneously address clinical risk, time efficiency, and patients' social needs. Many studies have shown that large language models can assist with patient care summarization tasks, such as generating care plans; these studies also show that the different objectives given to agents often conflict, creating problems for safety, efficiency, and equity. We tested whether, and to what degree, a game-theoretic approach (a Nash bargaining framework) can produce care plans that advance multiple objectives across multiple language models, using data from a real-world Medicaid cohort.

Methods: We conducted two studies in a cohort of 5,148 activated Medicaid care management patients (69.9% female; 45.7% Black or African American; mean age 40.9 years) enrolled in Virginia and Washington. A retrospective evaluation applied five deterministic strategies to the full cohort to characterize multi-objective trade-offs. A pre-registered controlled paired experiment (N = 200) assigned each patient one Nash-orchestrated multi-agent plan and one compute-matched sequential self-critique plan, both generated by locally hosted open-source models (DeepSeek-R1 8B; Llama 3.1 8B) with no patient data leaving local infrastructure. Pre-specified outcomes were Safety, Efficiency, Equity, and Composite (the mean of the three), each scored 0-1. Reporting follows CONSORT 2010 and STROBE.

Results: Nash orchestration produced a Composite score of 0.755 (95% CI 0.751-0.760) versus 0.742 (95% CI 0.739-0.746) for the compute-matched baseline; the paired difference was 0.013 (95% CI 0.008-0.019; p = 6.20 x 10-). The Safety and Efficiency paired differences were small to moderate in effect size (Cohen's d = 0.327 and 0.543, respectively), with confidence intervals excluding zero. The Equity paired difference was 0.000 (95% CI -0.015 to 0.014; p = 0.987).
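
The outcome definitions above (a Composite that is the mean of three 0-1 scores, compared within patient pairs) can be illustrated with a minimal Python sketch. The abstract does not state which estimator of Cohen's d or which confidence interval method was used, so the paired-difference form (mean difference divided by the standard deviation of the differences) and the normal-approximation interval below are assumptions, and the function names and toy numbers are hypothetical rather than taken from the study.

```python
import math
import statistics


def composite(safety, efficiency, equity):
    """Composite score: unweighted mean of the three 0-1 outcome scores."""
    return (safety + efficiency + equity) / 3.0


def paired_summary(treatment, baseline, z=1.96):
    """Summarize paired scores (one treatment and one baseline value per patient).

    Assumption: Cohen's d is computed on the paired differences
    (mean difference / SD of differences); the study may have used another form.
    The CI is a normal approximation, also an assumption.
    """
    diffs = [t - b for t, b in zip(treatment, baseline)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    se = sd_diff / math.sqrt(n)
    return {
        "mean_diff": mean_diff,
        "ci95": (mean_diff - z * se, mean_diff + z * se),
        "cohens_d": mean_diff / sd_diff,
    }


# Hypothetical toy scores for three patients (not study data).
nash_scores = [0.8, 0.7, 0.9]
baseline_scores = [0.7, 0.6, 0.7]
summary = paired_summary(nash_scores, baseline_scores)
print(summary)
```

Working on within-patient differences is what makes the design paired: between-patient variation cancels out, so a small mean difference can still carry a narrow confidence interval.
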
Conclusions: Role-specialized, Nash-orchestrated multi-agent language models produced care plans with measurably better Safety and Efficiency scores than a compute-matched baseline under data-residency constraints. The null Equity result shows that multi-objective role specialization does not automatically address equity; equity requires explicit design attention beyond composite weighting, with direct implications for responsible AI deployment in Medicaid care management.

Author Summary: Care management programs for Medicaid patients need to address multiple goals at once: covering clinical risks, prioritizing the most impactful interventions, and recognizing the social barriers that affect whether patients can follow through on care plans. Prior research shows that automation tools powered by a single AI model tend to optimize for one of these goals at a time, sacrificing the others. We tested whether organizing several specialized AI agents, each focused on a different goal, and then combining their recommendations through a mathematical framework called Nash bargaining could produce better overall care plans for a real Medicaid population. We found that this multi-agent approach produced care plans that the AI judge rated as meaningfully safer and more efficient than plans generated by a single AI model using the same total amount of computation. However, the multi-agent approach did not produce plans that were more equitable in addressing patients' social needs, suggesting that equity needs direct attention as a design target rather than emerging from multi-objective combination alone. All AI inference was performed on locally hosted computers, with no patient information sent to outside services, reflecting the privacy requirements of real-world Medicaid care management programs.
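
For readers unfamiliar with Nash bargaining, the following Python sketch shows the general idea of choosing among candidate plans by maximizing the product of each objective's gain over a disagreement point. The abstract does not describe how the study defined agent utilities, the disagreement point, or the candidate set, so the names `nash_product` and `select_plan`, the candidate plans, and all numbers here are hypothetical illustrations, not the authors' implementation.

```python
from math import prod


def nash_product(scores, disagreement):
    """Product of each objective's gain over its disagreement (fallback) value.

    A plan that leaves any objective at or below its fallback value gets 0,
    so it can never beat a plan that improves every objective.
    """
    gains = [s - d for s, d in zip(scores, disagreement)]
    if any(g <= 0 for g in gains):
        return 0.0
    return prod(gains)


def select_plan(candidates, disagreement):
    """Return the candidate with the largest Nash product."""
    return max(candidates, key=lambda c: nash_product(c["scores"], disagreement))


# Hypothetical candidates; scores are (safety, efficiency, equity) on 0-1.
candidates = [
    {"name": "safety_heavy", "scores": (0.90, 0.50, 0.50)},
    {"name": "balanced", "scores": (0.70, 0.70, 0.70)},
    {"name": "efficiency_heavy", "scores": (0.65, 0.95, 0.55)},
]
disagreement = (0.40, 0.40, 0.40)  # assumed fallback score per objective

best = select_plan(candidates, disagreement)
print(best["name"])
```

In this toy example the `efficiency_heavy` plan has the highest simple mean, yet the Nash product favors `balanced` because a product penalizes plans that leave any one objective close to its fallback value. That is the property that distinguishes a bargaining-style rule from plain averaging.
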

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.