A Retrospective Evaluation of the Microsoft Healthcare Agent Orchestrator for Tumor Board Patient Summaries

Roy, J.; Korleski, J. B.; Augustin, R. C.; Yefet, L.; Jensen, Z. D.; Ehman, E. C.; Zadeh, G.; Conners, A. L.; Tevaarwerk, A. J.; Korfiatis, P.

2026-06-01 health informatics
10.64898/2026.05.22.26353812 medRxiv

Background: Preparing tumor board patient summaries is time-intensive. Large language model-based systems may automate summarization but require real-world evaluation prior to clinical use. We performed an exploratory retrospective evaluation of the Microsoft Healthcare Agent Orchestrator (HAO), deployed in a Mayo Clinic controlled staged environment, to generate tumor board-style patient summaries from retrospective Electronic Health Record (EHR) notes.

Methods: HAO generated summaries for breast, hepatobiliary, and neuro-oncology tumor board cases using up to the 1,000 most recent clinical notes. Clinician reviewers evaluated outputs via REDCap surveys across perceived factuality, completeness, clarity/conciseness, temporal cohesion, comparative performance, safety, and clinical utility (0-4 Likert scale). Reviewers were permitted to query the HAO chat interface to address missing details. Automated factuality was assessed using TBFact (bidirectional entailment), reporting precision and recall against available reference summaries.

Results: Among 57 survey responses from 5 physicians, mean scores exceeded 2.8 across domains, with medians of 3 for most axes. In an exploratory comparison, oncology fellows required less time to review HAO-generated summaries than to manually generate patient summaries (mean difference 13.57 minutes per patient, p<0.001), although this difference may be influenced by prior familiarity with the same cases; 96% of survey responses indicated that HAO would save time. TBFact evaluations showed higher recall than precision across domains, consistent with broad capture of reference content alongside additional content that was not present in gold-standard summaries. Attribution was viewed favorably but showed issues with primary-source specificity and link reliability.

Conclusions: In a controlled Mayo environment, HAO demonstrated moderate performance and was associated with reduced review time for tumor board preparation. These findings are promising but preliminary and do not establish clinical safety, noninferiority to manual review, or readiness for routine clinical use. Limitations, including verbosity, specialty-specific content gaps, and inconsistent attribution, highlight the need for iterative refinement and further evaluation.
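
The abstract reports precision and recall from a bidirectional entailment check, with recall higher than precision. A minimal Python sketch of how such scores could be computed follows. It is not TBFact's actual implementation, which the abstract does not describe: the function names, the exact-match `toy_entails` stand-in for an entailment model, and the example facts are all hypothetical and chosen only to illustrate the pattern.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (premises, hypothesis) -> is the hypothesis supported?
EntailFn = Callable[[List[str], str], bool]


def precision_recall(
    generated_facts: List[str],
    reference_facts: List[str],
    entails: EntailFn,
) -> Tuple[float, float]:
    """Score a generated summary against a reference in both directions."""
    # Precision direction: generated facts that the reference supports.
    supported = sum(1 for fact in generated_facts if entails(reference_facts, fact))
    # Recall direction: reference facts that the generated summary supports.
    covered = sum(1 for fact in reference_facts if entails(generated_facts, fact))
    precision = supported / len(generated_facts) if generated_facts else 0.0
    recall = covered / len(reference_facts) if reference_facts else 0.0
    return precision, recall


def toy_entails(premises: List[str], hypothesis: str) -> bool:
    """Assumption: exact string match stands in for a real entailment model."""
    normalized = {p.strip().lower() for p in premises}
    return hypothesis.strip().lower() in normalized


# Made-up facts: the generated summary covers the reference and adds extras.
generated = [
    "ER-positive breast cancer",
    "Left mastectomy in 2021",
    "Family history of colon cancer",
    "Currently on letrozole",
]
reference = [
    "ER-positive breast cancer",
    "Left mastectomy in 2021",
]

precision, recall = precision_recall(generated, reference, toy_entails)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every reference fact is recovered while half of the generated facts have no counterpart in the reference, which is the shape of result the abstract describes: broad capture of reference content plus additional content absent from the gold-standard summary.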

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. JCO Clinical Cancer Informatics; 18 papers in training set; Top 0.1%; 14.7%
2. Journal of Medical Internet Research; 85 papers in training set; Top 0.5%; 9.1%
3. npj Digital Medicine; 97 papers in training set; Top 0.6%; 8.4%
4. Journal of the American Medical Informatics Association; 61 papers in training set; Top 0.3%; 8.4%
5. BMJ Health & Care Informatics; 13 papers in training set; Top 0.1%; 6.4%
6. JAMIA Open; 37 papers in training set; Top 0.2%; 6.3%
(50% of probability mass above)
7. Frontiers in Digital Health; 20 papers in training set; Top 0.2%; 4.0%
8. Cancer Medicine; 24 papers in training set; Top 0.4%; 3.1%
9. JMIR Medical Informatics; 17 papers in training set; Top 0.4%; 2.7%
10. BMC Medical Informatics and Decision Making; 39 papers in training set; Top 1.0%; 2.7%
11. PLOS ONE; 4510 papers in training set; Top 45%; 2.6%
12. Scientific Reports; 3102 papers in training set; Top 50%; 2.1%
13. Artificial Intelligence in Medicine; 15 papers in training set; Top 0.3%; 1.7%
14. Journal of General Internal Medicine; 20 papers in training set; Top 0.5%; 1.7%
15. Journal of Biomedical Informatics; 45 papers in training set; Top 0.9%; 1.5%
16. International Journal of Medical Informatics; 25 papers in training set; Top 1%; 1.2%
17. Annals of Internal Medicine; 27 papers in training set; Top 0.6%; 1.1%
18. Healthcare; 16 papers in training set; Top 1%; 0.9%
19. PLOS Digital Health; 91 papers in training set; Top 2%; 0.9%
20. JAMA Network Open; 127 papers in training set; Top 4%; 0.9%
21. iScience; 1063 papers in training set; Top 29%; 0.8%
22. Biology Methods and Protocols; 53 papers in training set; Top 2%; 0.8%
23. BMJ Open Quality; 15 papers in training set; Top 0.8%; 0.7%
24. DIGITAL HEALTH; 12 papers in training set; Top 0.7%; 0.7%
25. BMC Medical Research Methodology; 43 papers in training set; Top 1%; 0.7%
26. IEEE Journal of Biomedical and Health Informatics; 34 papers in training set; Top 2%; 0.7%
27. Inflammatory Bowel Diseases; 15 papers in training set; Top 0.3%; 0.7%
28. The Lancet Digital Health; 25 papers in training set; Top 1%; 0.7%
29. Journal of Clinical and Translational Science; 11 papers in training set; Top 0.5%; 0.6%
30. JAMA; 17 papers in training set; Top 0.5%; 0.6%
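
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages: the first five add up to 47.0% and the first six to 53.3%. The sketch below is only an illustration of that arithmetic; the function name `journals_needed` is hypothetical, and the list holds the first eight probabilities copied from the ranking above.

```python
from itertools import accumulate
from typing import List, Optional

# Predicted probabilities (percent) for ranks 1-8, copied from the ranking.
probabilities = [14.7, 9.1, 8.4, 8.4, 6.4, 6.3, 4.0, 3.1]


def journals_needed(probs: List[float], threshold: float = 50.0) -> Optional[int]:
    """Return how many top-ranked entries are needed to reach the threshold."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None  # the listed entries never reach the threshold


top_k = journals_needed(probabilities)
print(top_k)
```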