Med-ICE: Enhancing Factual Accuracy in Medical AI through Autonomous Multi-Agent Consensus

Chen, Z.; Wu, R.; Liu, Y.; Li, R.; Duprey, A.

2026-04-04 health informatics

10.64898/2026.04.02.26350080 medRxiv

Show abstract

The integration of Large Language Models into high-stakes clinical workflows is critically hampered by their lack of verifiable reliability and tendency to generate hallucinations. This paper introduces Med-ICE, an autonomous framework designed to enhance the reliability of LLMs for medical applications. Med-ICE adapts the Iterative Consensus Ensemble paradigm, enabling a group of peer LLM agents to collaboratively converge on a final answer through iterative rounds of generation and peer review, thereby eliminating the need for an external arbiter and its associated scalability bottleneck. Our work makes three key contributions: (1) a novel semantic consensus mechanism that determines agreement based on semantic similarity, crucial for nuanced clinical language; (2) demonstration of state-of-the-art performance, where Med-ICE significantly outperforms both direct single-LLM generation and the Self-Refinement technique on challenging medical benchmarks; and (3) a highly efficient and scalable architecture, as our Semantic Consensus Monitor is computationally lightweight. This research establishes a new standard for developing safer, more trustworthy LLM systems, paving the way for their responsible integration into medicine.

Med-ICE: Enhancing Factual Accuracy in Medical AI through Autonomous Multi-Agent Consensus

Matching journals