Development and Validation of CPX-MATE: An End-to-End Medical Education Platform Integrating Voice-Based Virtual Patient Simulation and Automated Real-time Evaluation

Song, J. W.; Kim, M.; Hong, C.; Kim, Y. S.; Cho, J.; Kim, J. H.; Myung, J.; Choi, A.; Yoon, H.; Lee, S. G. W.; You, S. C.; Park, C.

2026-02-25 medical education
10.64898/2026.02.21.26346803 medRxiv
Background: The Objective Structured Clinical Examination (OSCE; Clinical Performance Examination [CPX] in South Korea) is a high-stakes assessment of clinical performance, communication, and reasoning during time-limited patient encounters. As AI-enabled virtual standardized patient (VSP) simulation and automated scoring are introduced for OSCE-like training, prospective evidence is needed on how such systems perform and are perceived when embedded in real educational workflows.

Methods: We developed CPX with Medical students Assistant for Training and Evaluation (CPX-MATE), a web-based platform integrating (1) CPX with Virtual Standardized Patient (CPX-VSP), real-time voice dialogue with a VSP using speech-to-speech (STS) models, and (2) CPX with Real-Time Evaluator (CPX-RTE), automated transcription, checklist-based scoring, and feedback from encounter audio using a speech-to-text model and a large language model. During an emergency medicine clerkship (Nov 2025-Jan 2026), 60 senior medical students completed two 12-min CPX encounters (a VSP with acute pancreatitis; an HSP with a ureteral stone) with immediate CPX-RTE feedback. For CPX-VSP, students were assigned to either a full-capacity or a resource-limited STS configuration (n=30 each). Dialogue fidelity was evaluated by turn-by-turn analysis of student-VSP exchanges, classifying responses into clinically meaningful error types (tangential, oversharing, role-breaking, off-script). CPX-RTE performance was assessed by agreement (Gwet's AC1) with professors' real-time and residents' video-based ratings on a 45-item checklist. Usability of CPX-VSP and CPX-RTE was surveyed, together with the overall System Usability Scale (SUS), and mean per-session costs for CPX-VSP and CPX-RTE were calculated.

Results: Across 3,282 dialogue turns, overall error rates were 1.77% versus 9.43% for the full-capacity versus resource-limited STS configurations (p<0.001), driven by fewer tangential and oversharing responses; no off-script errors were observed.
The mean per-session cost was $0.12 for the resource-limited configuration and $0.78 for the full-capacity configuration. CPX-RTE showed high agreement with human ratings (AC1=0.916 vs professor; 0.916 vs resident), with slightly different levels of agreement across the four checklist sections, and high usability across all domains (mean scores, 4.65-4.92), at a per-session cost of $0.17. CPX-MATE demonstrated good overall usability (median [IQR] SUS, 77.5 [70.0-85.0]).

Conclusions: Embedded within a prospective clinical clerkship, CPX-MATE demonstrated operational fidelity and human-level checklist agreement as an end-to-end, voice-based, AI-assisted OSCE platform. This real-world deployment supports its scalable integration as a complementary assessment tool while highlighting the importance of systematic validation and context-aware implementation in medical education.
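For context on the agreement statistic reported above: Gwet's AC1 compares observed rater agreement against a chance-agreement term derived from the mean marginal prevalence, which makes it more stable than kappa when checklist items are mostly scored the same way. A minimal sketch for two raters scoring dichotomous (done/not-done) checklist items; the example data are illustrative, not from the study:

```python
def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters over binary (0/1) item ratings."""
    assert len(rater_a) == len(rater_b) and rater_a, "ratings must align"
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement (Gwet): 2*pi*(1-pi), where pi is the mean
    # marginal proportion of items scored "1" across both raters.
    pi = (sum(rater_a) + sum(rater_b)) / (2 * n)
    pe = 2 * pi * (1 - pi)
    return (po - pe) / (1 - pe)

# Hypothetical 45-item checklist scored by two raters who
# disagree on two items (indices 39 and 40).
a = [1] * 40 + [0] * 5
b = [1] * 39 + [0, 1] + [0] * 4
print(round(gwet_ac1(a, b), 3))  # → 0.945
```

With high-prevalence data like this (most items "done"), Cohen's kappa would be depressed by the skewed marginals while AC1 remains close to the observed agreement, which is one reason it is often preferred for checklist-style OSCE ratings.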
