
OpenEvidence errs on the safe side in a structured test of triage recommendations

Jia, E.; Omar, M.; Barash, Y.; Brook, O. R.; Ahmed, M.; Kruskal, J. B.; Gorenshtein, A.; Klang, E.

2026-04-24 health informatics
10.64898/2026.04.23.26351526 medRxiv

Ramaswamy et al. recently reported in Nature Medicine that ChatGPT Health, a consumer-facing health AI tool, undertriaged 51.6% of true emergencies and was susceptible to social anchoring in a structured stress test of triage recommendations. We applied the same vignette-based benchmark to OpenEvidence, a widely used physician-facing AI platform for clinical decision support. The benchmark comprised 960 prompts across 21 clinical domains (Supplementary Table S3). OpenEvidence undertriaged 12.5% of emergencies, roughly a four-fold reduction relative to ChatGPT Health, and showed no anchoring effect. Its errors skewed toward caution, including overtriage of 68.0% of Home presentations. In 65 of 960 responses (6.8%), it declined to assign a triage level; these refusals occurred only in symptom-only prompts and never in urgent or emergency cases. Performance improved when objective clinical data were provided. Under the same benchmark, a widely used physician-facing system showed a different safety profile from a consumer-facing one. This suggests that who a health AI is built for can shape how it fails.
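As a quick consistency check, the arithmetic behind the "four-fold" comparison and the refusal rate can be reproduced from the summary numbers stated in the abstract. This is an illustrative sketch only: the variable names are hypothetical, and the inputs are the abstract's rounded percentages and counts, not the underlying benchmark data.

```python
# Summary figures as stated in the abstract (hypothetical variable names).
# The ChatGPT Health rate is the figure cited from Ramaswamy et al.
chatgpt_health_undertriage = 51.6  # % of emergencies undertriaged
openevidence_undertriage = 12.5    # % of emergencies undertriaged
total_prompts = 960
refusals = 65                      # responses with no triage level assigned

# Ratio of undertriage rates: the basis for the "four-fold" comparison
ratio = chatgpt_health_undertriage / openevidence_undertriage

# The same gap expressed as a relative reduction of the ChatGPT Health rate
relative_reduction = 100 * (1 - openevidence_undertriage / chatgpt_health_undertriage)

# Share of all responses in which no triage level was assigned
refusal_rate = 100 * refusals / total_prompts

print(f"ratio: {ratio:.2f}")
print(f"relative reduction: {relative_reduction:.1f}%")
print(f"refusal rate: {refusal_rate:.1f}%")
```

Run as-is, it prints a ratio of 4.13, a relative reduction of 75.8%, and a refusal rate of 6.8%, which agrees with the abstract's rounded figures.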

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Communications: 25.4% (4913 papers in training set; Top 2%)
2. npj Digital Medicine: 7.0% (97 papers in training set; Top 0.7%)
3. Nature: 6.7% (575 papers in training set; Top 4%)
4. Nature Biomedical Engineering: 6.3% (42 papers in training set; Top 0.1%)
5. Nature Medicine: 6.3% (117 papers in training set; Top 0.3%)
50% of probability mass above
6. Cell: 4.8% (370 papers in training set; Top 4%)
7. Cell Reports Medicine: 3.5% (140 papers in training set; Top 2%)
8. Nature Machine Intelligence: 3.5% (61 papers in training set; Top 1%)
9. Nature Methods: 3.2% (336 papers in training set; Top 3%)
10. Science Translational Medicine: 2.7% (111 papers in training set; Top 1%)
11. Science: 2.7% (429 papers in training set; Top 11%)
12. Med: 1.9% (38 papers in training set; Top 0.2%)
13. Nature Human Behaviour: 1.6% (85 papers in training set; Top 2%)
14. Nature Cancer: 1.5% (35 papers in training set; Top 0.9%)
15. Science Advances: 1.5% (1098 papers in training set; Top 20%)
16. Nature Neuroscience: 1.3% (216 papers in training set; Top 5%)
17. Neuron: 1.2% (282 papers in training set; Top 7%)
18. Scientific Reports: 1.2% (3102 papers in training set; Top 67%)
19. Patterns: 1.2% (70 papers in training set; Top 2%)
20. Molecular Systems Biology: 0.9% (142 papers in training set; Top 1%)
21. Nature Genetics: 0.9% (240 papers in training set; Top 7%)
22. The Lancet Digital Health: 0.7% (25 papers in training set; Top 1%)
23. Communications Biology: 0.7% (886 papers in training set; Top 25%)
24. Nature Biotechnology: 0.7% (147 papers in training set; Top 8%)
25. Nature Microbiology: 0.6% (133 papers in training set; Top 5%)
26. eLife: 0.6% (5422 papers in training set; Top 62%)
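The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order until the threshold is crossed. This is a minimal sketch, assuming the displayed percentages are the tool's rounded probabilities; the function name `journals_to_reach` is hypothetical, and only the first six entries are included because the threshold is reached before the list ends.

```python
# Predicted probabilities (%) for the highest-ranked journals, in rank order,
# copied from the matching-journals list (remaining entries omitted).
predictions = [
    ("Nature Communications", 25.4),
    ("npj Digital Medicine", 7.0),
    ("Nature", 6.7),
    ("Nature Biomedical Engineering", 6.3),
    ("Nature Medicine", 6.3),
    ("Cell", 4.8),
]


def journals_to_reach(preds, threshold):
    """Hypothetical helper: return (count, cumulative %) for the smallest
    number of top-ranked journals whose summed probability >= threshold."""
    cumulative = 0.0
    for count, (_name, prob) in enumerate(preds, start=1):
        cumulative += prob
        if cumulative >= threshold:
            return count, cumulative
    # Threshold never reached: report the total over all listed entries
    return len(preds), cumulative


n, mass = journals_to_reach(predictions, 50.0)
print(n, round(mass, 1))
```

With these values it prints `5 51.7`: the first four journals sum to 45.4%, and the fifth carries the cumulative mass past the 50% mark.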