Evidence-Graded Decision Authorization for Safe Clinical AI: A Constrained Reasoning Framework

Lin, C.; Lin, J.-Y.; Lin, Y.-S.

2026-05-22 health informatics
10.64898/2026.05.19.26353565 medRxiv
Clinical AI systems have achieved strong predictive performance; however, prediction accuracy is not sufficient for clinical safety. Retrieval-augmented generation (RAG) improves factual accuracy, and general-purpose LLM guardrails constrain surface-level output safety, but these mechanisms do not govern the inferential gap between available clinical evidence and permissible clinical claims. We propose Evidence-Graded Decision Authorization (EGDA), a framework that separates evidence extraction, sufficiency assessment, and claim-level authorization through domain-specific rules. In a controlled experiment using 60 breast cancer decision-snapshot cases (1,260 system outputs across three arms, evaluated by LLM-as-Judge with expert calibration), EGDA reduced the unjustified inference rate to 8.0% (vs. 48.7% for the unconstrained LLM and 47.7% for RAG; risk difference vs. unconstrained -40.7 percentage points, 95% CI -46.9 to -34.0, p < 0.001), raised the appropriate refusal rate to 95.0% (vs. 56.9% and 56.9%; risk difference vs. unconstrained +38.1 percentage points, 95% CI +31.3 to +44.5, p < 0.001), and achieved the highest factual correctness at 96.4% (vs. 69.8% and 74.5%). Unexpectedly, RAG without an authorization gate failed to reduce unjustified inference relative to the unconstrained baseline (47.7% vs. 48.7%, p = 0.870) and produced no improvement in appropriate refusal (56.9% vs. 56.9%, p = 1.0), showing that information supply alone is not sufficient for inferential governance. We argue that domain-specific, evidence-graded reasoning governance should serve as a deployment reference standard for safety-critical clinical AI.
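
The abstract's three-stage separation (evidence extraction, sufficiency assessment, claim-level authorization) can be illustrated with a minimal sketch. This is not the authors' implementation: the rule table `REQUIRED_EVIDENCE`, the claim names, the field names, and the three-level `Grade` scale are all hypothetical placeholders chosen for illustration, and they are not clinical guidance.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    """Hypothetical three-level evidence grade (an assumption of this sketch)."""
    INSUFFICIENT = 0
    PARTIAL = 1
    SUFFICIENT = 2


@dataclass
class Decision:
    claim: str
    authorized: bool
    reason: str


# Hypothetical domain rules: claim type -> evidence fields that must be documented.
REQUIRED_EVIDENCE = {
    "recommend_endocrine_therapy": {"er_status", "pr_status"},
    "recommend_anti_her2_therapy": {"her2_status"},
}


def extract_evidence(record: dict) -> dict:
    """Stage 1: keep only fields that are actually documented (not None)."""
    return {key: value for key, value in record.items() if value is not None}


def assess_sufficiency(claim: str, evidence: dict) -> Grade:
    """Stage 2: grade how much of the required evidence is available."""
    required = REQUIRED_EVIDENCE.get(claim)
    if not required:
        # Claim types without a rule are never considered sufficiently supported.
        return Grade.INSUFFICIENT
    present = required.intersection(evidence)
    if present == required:
        return Grade.SUFFICIENT
    return Grade.PARTIAL if present else Grade.INSUFFICIENT


def authorize(claim: str, record: dict) -> Decision:
    """Stage 3: authorize the claim only when the evidence is graded SUFFICIENT."""
    grade = assess_sufficiency(claim, extract_evidence(record))
    if grade is Grade.SUFFICIENT:
        return Decision(claim, True, "all required evidence present")
    return Decision(claim, False, f"refused: evidence graded {grade.name}")


# Usage: progesterone receptor status is undocumented in this made-up record.
record = {"er_status": "positive", "pr_status": None, "her2_status": "negative"}
endocrine = authorize("recommend_endocrine_therapy", record)
anti_her2 = authorize("recommend_anti_her2_therapy", record)
print(endocrine.authorized, endocrine.reason)
print(anti_her2.authorized, anti_her2.reason)
```

The point of the sketch is the gate itself: the claim is refused whenever required evidence is missing, regardless of how much other information is supplied, which mirrors the abstract's observation that retrieval alone did not raise the refusal rate.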

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | npj Digital Medicine | 97 | Top 0.1% | 32.9% |
| 2 | JCO Clinical Cancer Informatics | 18 | Top 0.1% | 6.8% |
| 3 | Scientific Reports | 3102 | Top 14% | 6.8% |
| 4 | Journal of the American Medical Informatics Association | 61 | Top 0.5% | 6.3% |
| 5 | Nature Communications | 4913 | Top 33% | 4.8% |
| 6 | PLOS ONE | 4510 | Top 35% | 4.0% |
| 7 | Nature Medicine | 117 | Top 0.9% | 3.6% |
| 8 | Philosophical Transactions of the Royal Society B | 51 | Top 2% | 2.9% |
| 9 | Nature Machine Intelligence | 61 | Top 1% | 2.6% |
| 10 | The Lancet Digital Health | 25 | Top 0.4% | 1.7% |
| 11 | iScience | 1063 | Top 16% | 1.7% |
| 12 | Annals of Internal Medicine | 27 | Top 0.5% | 1.3% |
| 13 | Nature Human Behaviour | 85 | Top 3% | 1.3% |
| 14 | PLOS Digital Health | 91 | Top 2% | 1.1% |
| 15 | Bioinformatics | 1061 | Top 9% | 0.9% |
| 16 | eLife | 5422 | Top 52% | 0.9% |
| 17 | Nature Biomedical Engineering | 42 | Top 2% | 0.7% |
| 18 | Communications Medicine | 85 | Top 1% | 0.7% |
| 19 | Patterns | 70 | Top 3% | 0.6% |
| 20 | PLOS Computational Biology | 1633 | Top 27% | 0.6% |
| 21 | BMJ Health & Care Informatics | 13 | Top 1% | 0.6% |
| 22 | Cell Systems | 167 | Top 14% | 0.6% |
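
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does that; the helper name `journals_to_reach` is hypothetical, and the values are copied from the first six entries of the ranking above.

```python
# Predicted probabilities (in percent) for ranks 1 to 6, as listed above.
probabilities = [32.9, 6.8, 6.8, 6.3, 4.8, 4.0]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative mass) at the first rank where the mass reaches threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None  # threshold not reached within the supplied entries


rank, mass = journals_to_reach(probabilities, 50.0)
print(rank, round(mass, 1))  # 4 52.8
```

The cumulative mass first crosses 50% at rank 4 (46.5% after three journals, 52.8% after four), which agrees with the listing's cutoff.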