Top-20 prediction scores (row order matches the ranked journal list below):

Rank  Percentile  Probability
1     Top 0.1%    21.7%
2     Top 0.6%    14.3%
3     Top 1%      8.8%
4     Top 1%      7.6%
5     Top 2%      6.3%
6     Top 3%      5.8%
7     Top 2%      5.8%
8     Top 68%     3.8%
9     Top 87%     3.8%
10    Top 10%     1.9%
11    Top 4%      1.9%
12    Top 3%      1.9%
13    Top 0.3%    1.9%
14    Top 2%      1.9%
15    Top 58%     1.3%
16    Top 3%      1.3%
17    Top 2%      1.1%
18    Top 11%     0.7%
19    Top 1%      0.7%
20    Top 24%     0.7%
Boards-style benchmarks overestimate prior-chat bias in large language models: a factorial evaluation study
2026-02-14
health informatics
Title + abstract only
View on medRxiv
Abstract: Background: Large language models (LLMs) are increasingly piloted as chat interfaces for chart review and clinical decision support. Although leading models achieve and even exceed physician-level accuracy on exam-style benchmarks such as MedQA, recent perturbation studies show large drops in accuracy after small changes to prompts, distractor content, or answer format. Prior work has not systematically examined how these vulnerabilities unintentionally manifest in clinically realistic settings, i...
Predicted journal destinations

Rank  Journal                                                  Training papers
1     Journal of the American Medical Informatics Association  53
2     npj Digital Medicine                                      85
3     PLOS Digital Health                                       88
4     Journal of Medical Internet Research                      81
5     JAMIA Open                                                35
6     BMC Medical Informatics and Decision Making               36
7     Journal of Biomedical Informatics                         37
8     Scientific Reports                                        701
9     PLOS ONE                                                  1737
10    JAMA Network Open                                         125
11    International Journal of Medical Informatics              25
12    JMIR Medical Informatics                                  16
13    Frontiers in Digital Health                               18
14    BMC Medical Research Methodology                          41
15    BMJ Open                                                  553
16    JMIR Formative Research                                   31
17    Journal of Clinical Epidemiology                          29
18    Computers in Biology and Medicine                         39
19    Annals of Internal Medicine                               27
20    Cureus                                                    64