
Human-AI Collaboration in Clinical Reasoning: A UK Replication & Interaction Analysis

Healy, J.; Kossoff, J.; Lee, M.; Hasford, C.

2025-08-27 health informatics
10.1101/2025.08.25.25334383

Objective: Goh et al. found that a large language model (LLM) working alone outperformed American clinicians assisted by the same LLM on diagnostic reasoning tests [1]. We aimed to replicate this result in a UK setting and to explore how interactions with the LLM might explain the observed performance gap.

Methods and Analysis: This was a within-subjects study of UK physicians. Twenty-two participants answered structured questions on four clinical vignettes; for two of the cases, physicians had access to an LLM via a custom-built web application. Results were analysed using a mixed-effects model accounting for case difficulty and baseline variability between clinicians. Qualitative analysis involved coding participant-LLM interaction logs and evaluating the rate of LLM use per question.

Results: Physicians with LLM assistance scored significantly lower than the LLM alone (mean difference 21.3 percentage points, p < 0.001). Access to the LLM was associated with improved physician performance compared with conventional resources (73.7% vs 66.3%, p = 0.001). There was significant heterogeneity in the degree of LLM-assisted improvement (SD 10.4%). Qualitative analysis revealed that only 30% of case questions were posed directly to the LLM, suggesting that under-utilisation of the LLM contributed to the observed performance gap.

Conclusion: While access to an LLM can improve diagnostic accuracy, realising the full potential of human-AI collaboration may require training clinicians to integrate these tools into their cognitive workflows and designing systems that make this integration the default rather than an optional extra.
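The abstract does not state the model formula. One plausible way to write a mixed-effects model that "accounts for case difficulty and the variability of clinicians at baseline" is sketched below. This is an illustrative specification under assumptions, not the authors' reported model: $y_{ij}$ is physician $i$'s score on case $j$, $\mathrm{LLM}_{ij}$ is a hypothetical 0/1 indicator of LLM access, $\gamma_j$ are fixed case effects (one case as reference, $\gamma_1 = 0$) standing in for case difficulty, $u_{0i}$ is a per-physician random intercept for baseline ability, and $u_{1i}$ is an assumed per-physician random slope.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{LLM}_{ij} + \gamma_j + u_{0i} + u_{1i}\,\mathrm{LLM}_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \sim
\mathcal{N}\!\left(\mathbf{0},
\begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix}\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this sketch, $\beta_1$ is the average effect of LLM access over conventional resources, and $\sigma_1$ is one way a between-physician spread in LLM-assisted improvement (such as the reported SD) could be expressed; whether the authors used a random slope is an assumption.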

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association: 12.7% (based on 53 papers; Top 0.9%)
2. BMJ Health & Care Informatics: 8.7% (based on 13 papers; Top 0.1%)
3. PLOS Digital Health: 7.6% (based on 88 papers; Top 2%)
4. npj Digital Medicine: 7.6% (based on 85 papers; Top 3%)
5. BMC Medical Informatics and Decision Making: 6.4% (based on 36 papers; Top 2%)
6. BMJ Open: 6.4% (based on 553 papers; Top 18%)
7. The Lancet Digital Health: 3.0% (based on 25 papers; Top 0.5%)
50% of probability mass above
8. JAMIA Open: 2.8% (based on 35 papers; Top 3%)
9. Frontiers in Digital Health: 2.8% (based on 18 papers; Top 0.8%)
10. BMC Medical Research Methodology: 2.4% (based on 41 papers; Top 2%)
11. JMIR Medical Informatics: 2.3% (based on 16 papers; Top 3%)
12. JAMA Network Open: 2.3% (based on 125 papers; Top 9%)
13. PLOS ONE: 1.8% (based on 1737 papers; Top 87%)
14. Journal of Medical Internet Research: 1.8% (based on 81 papers; Top 8%)
15. Scientific Reports: 1.6% (based on 701 papers; Top 72%)
16. DIGITAL HEALTH: 1.6% (based on 11 papers; Top 0.8%)
17. Journal of Clinical Epidemiology: 1.3% (based on 29 papers; Top 2%)
18. BMJ Open Quality: 1.3% (based on 15 papers; Top 2%)
19. International Journal of Medical Informatics: 1.3% (based on 25 papers; Top 4%)
20. Journal of Biomedical Informatics: 1.3% (based on 37 papers; Top 4%)
21. BMJ: 1.3% (based on 49 papers; Top 4%)
22. JCO Clinical Cancer Informatics: 1.2% (based on 14 papers; Top 3%)
23. Genetics in Medicine: 0.8% (based on 57 papers; Top 5%)
24. Frontiers in Public Health: 0.8% (based on 135 papers; Top 25%)
25. Healthcare: 0.8% (based on 14 papers; Top 3%)
26. Journal of General Internal Medicine: 0.8% (based on 19 papers; Top 4%)
27. Wellcome Open Research: 0.8% (based on 34 papers; Top 3%)
28. JMIR Formative Research: 0.8% (based on 31 papers; Top 5%)
29. CMAJ Open: 0.7% (based on 12 papers; Top 1.0%)
30. Age and Ageing: 0.7% (based on 27 papers; Top 2%)
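The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the ranked percentages until the running total first reaches 50%. The sketch below does this using the rounded values displayed in the list; the function name `journals_for_mass` is hypothetical, and the page's own calculation may use unrounded probabilities.

```python
# Predicted probabilities (%) for the ten highest-ranked journals,
# copied from the ranked list (rounded to one decimal place).
top_probs = [12.7, 8.7, 7.6, 7.6, 6.4, 6.4, 3.0, 2.8, 2.8, 2.4]


def journals_for_mass(probs, target=50.0):
    """Return (k, cumulative) for the smallest k whose top-k probabilities reach target.

    Assumes probs are sorted in descending order, as in the ranked list.
    If the target is never reached, returns the count and total of all entries.
    """
    cumulative = 0.0
    for k, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= target:
            return k, round(cumulative, 1)
    return len(probs), round(cumulative, 1)


k, mass = journals_for_mass(top_probs)
print(f"Top {k} journals cover {mass}% of the predicted probability mass")
```

With the displayed values, the first six journals sum to 49.4%, so the seventh is the first to push the running total past 50% (to 52.4%), consistent with the "top 7" statement.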