
MedPI: Evaluating AI Systems in Medical Patient-facing Interactions

2026-01-01 health informatics Title + abstract only
View on medRxiv

Abstract: We present MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn question-answer (QA) benchmarks, MedPI evaluates the medical dialogue across 105 dimensions comprising the medical process, treatment safety, treatment outcomes and doctor-patient communication across a granular, accreditation-aligned rubric. MedPI comprises five layers: (1) P...
