
MISP-Bench: Decomposing User-Provided False Priors into Answer, Rationale, and Guard Effects

Jeong, I.; Kim, Y.; Park, J.-H.; Lee, H.

2026-05-10 health informatics
10.64898/2026.05.07.26352627 medRxiv

Large language models in clinical and educational settings routinely receive user-provided context containing incorrect prior beliefs. Existing benchmarks measure aggregate susceptibility to such priors but do not disentangle which structural component (the asserted answer, the supporting rationale, or their combination) drives the damage, nor do they test whether safety meta-prompts such as "verify the reasoning first" consistently mitigate it. We introduce MISP-Bench, a factorial benchmark of 1,724 audited multiple-choice items (1,430 MedMCQA medical + 294 GSM8K quantitative) evaluated under 13 prompt conditions across 10 open-weight instruction-tuned models (1B-27B) in chain-of-thought and direct modes, with approximately 1.33M audited response records across three runs per condition. Distractors were generated by GPT-5.4, and that model was excluded from the evaluated set to prevent circular evaluation. Targeted and arbitrary distractor subsets yield similar aggregate Misinformation Damage Index (MDI; accuracy drop relative to a distractor-free baseline) at +19.7 vs +20.4 pp but diverge by 39.1 pp in sycophancy rate (78.4% vs 39.3%). The subsets differ in baseline difficulty, so this is a between-subset composition gap rather than a within-item causal effect. The combined answer-plus-rationale attack exhibits sub-additive saturation (+20.3 pp observed vs +24.5 pp expected under independence; 7/10 models sub-additive, 2 additive, 1 super-additive). Verification-style safety guards split models into three groups by sign at α = 0.05 (4 reversal, 3 recovery, 3 null), while source-independence and explicit-override guards yield positive recovery in 8/10 and 9/10 models, respectively. A six-category audit excludes 770 items, including 732 multi-correct items structurally incompatible with single-best-answer evaluation. The audit list is reusable beyond MISP-Bench.
The corpus, response records, notebooks, and audit are released on Hugging Face Datasets (https://huggingface.co/datasets/yh0502/risp-bench) under CC-BY-4.0, with original-source license inheritance for MedMCQA (Apache-2.0) and GSM8K (MIT) content and with Croissant RAI metadata. Companion code is available at https://github.com/anon-risp-2026/risp-bench.
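
The metrics named in the abstract can be illustrated with a minimal sketch. Only the definition of MDI (accuracy drop relative to a distractor-free baseline, in percentage points) comes from the abstract. The sycophancy definition (share of responses that pick the user-asserted wrong option) and the independence model in `expected_combined_damage` are assumptions made for illustration, and all function names and toy inputs are hypothetical rather than taken from the released code.

```python
from typing import Sequence


def accuracy(correct_flags: Sequence[bool]) -> float:
    """Accuracy in percent over a list of per-response correctness flags."""
    return 100.0 * sum(correct_flags) / len(correct_flags)


def mdi(baseline_correct: Sequence[bool], attacked_correct: Sequence[bool]) -> float:
    """Misinformation Damage Index: baseline accuracy minus accuracy under
    the false-prior condition, in percentage points (positive = damage)."""
    return accuracy(baseline_correct) - accuracy(attacked_correct)


def sycophancy_rate(chosen: Sequence[str], asserted: Sequence[str]) -> float:
    """ASSUMED definition: percent of responses that select the option
    the user asserted (the distractor)."""
    matches = sum(c == a for c, a in zip(chosen, asserted))
    return 100.0 * matches / len(chosen)


def expected_combined_damage(base_acc: float, mdi_answer: float,
                             mdi_rationale: float) -> float:
    """ASSUMED independence model, not necessarily the paper's: each attack
    independently lets a baseline-correct answer survive with probability
    1 - MDI / base_acc, so the combined survival is the product."""
    survive_answer = 1.0 - mdi_answer / base_acc
    survive_rationale = 1.0 - mdi_rationale / base_acc
    return base_acc * (1.0 - survive_answer * survive_rationale)


# Toy inputs (hypothetical, not benchmark data).
toy_mdi = mdi([True, True, True, False], [True, False, False, False])
toy_syc = sycophancy_rate(["B", "A", "B", "C"], ["B", "B", "B", "B"])
toy_expected = expected_combined_damage(80.0, 16.0, 8.0)
print(toy_mdi, toy_syc, round(toy_expected, 1))
```

Under this assumed model, an observed combined MDI below the expected value would be read as sub-additive, which is the pattern the abstract reports (+20.3 pp observed vs +24.5 pp expected).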

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 16% | 10.7%
2. Scientific Reports | 3102 papers in training set | Top 5% | 10.4%
3. npj Digital Medicine | 97 papers in training set | Top 0.8% | 6.5%
4. Nature | 575 papers in training set | Top 6% | 4.1%
5. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 1% | 3.8%
6. PLOS ONE | 4510 papers in training set | Top 37% | 3.7%
7. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.2% | 3.7%
8. Nature Medicine | 117 papers in training set | Top 0.8% | 3.7%
9. PLOS Digital Health | 91 papers in training set | Top 0.8% | 3.2%
10. PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.4%

50% of probability mass above

11. Nature Human Behaviour | 85 papers in training set | Top 1% | 2.4%
12. Med | 38 papers in training set | Top 0.1% | 2.1%
13. Nature Machine Intelligence | 61 papers in training set | Top 1% | 2.1%
14. Bioinformatics | 1061 papers in training set | Top 6% | 2.1%
15. Nature Computational Science | 50 papers in training set | Top 0.6% | 1.7%
16. PNAS Nexus | 147 papers in training set | Top 0.2% | 1.7%
17. iScience | 1063 papers in training set | Top 13% | 1.7%
18. Nature Methods | 336 papers in training set | Top 4% | 1.7%
19. Nature Biomedical Engineering | 42 papers in training set | Top 0.8% | 1.7%
20. Communications Psychology | 20 papers in training set | Top 0.1% | 1.5%
21. Scientific Data | 174 papers in training set | Top 1% | 1.4%
22. Computers in Biology and Medicine | 120 papers in training set | Top 3% | 1.4%
23. GENETICS | 189 papers in training set | Top 0.8% | 1.4%
24. Science | 429 papers in training set | Top 17% | 1.1%
25. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 40% | 1.0%
26. Frontiers in Digital Health | 20 papers in training set | Top 1% | 0.9%
27. Communications Biology | 886 papers in training set | Top 20% | 0.8%
28. eLife | 5422 papers in training set | Top 55% | 0.8%
29. Patterns | 70 papers in training set | Top 2% | 0.8%
30. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.8%