
Human-AI Collaboration in Clinical Reasoning: A UK Replication & Interaction Analysis

Healy, J.; Kossoff, J.; Lee, M.; Hasford, C.

2025-08-27 health informatics
10.1101/2025.08.25.25334383 medRxiv

Objective: A paper by Goh et al. found that a large language model (LLM) working alone outperformed American clinicians assisted by the same LLM on diagnostic reasoning tests [1]. We aimed to replicate this result in a UK setting and to explore how clinicians' interactions with the LLM might explain the observed performance gap.

Methods and Analysis: This was a within-subjects study of UK physicians. Twenty-two participants answered structured questions on four clinical vignettes. For two of the cases, physicians had access to an LLM via a custom-built web application. Results were analysed using a mixed-effects model accounting for case difficulty and for the variability of clinicians at baseline. Qualitative analysis involved coding participant-LLM interaction logs and evaluating the rate of LLM use per question.

Results: Physicians with LLM assistance scored significantly lower than the LLM alone (mean difference 21.3 percentage points, p < 0.001). Access to the LLM was associated with improved physician performance compared with using conventional resources (73.7% vs 66.3%, p = 0.001). There was significant heterogeneity in the degree of LLM-assisted improvement (SD 10.4%). Qualitative analysis revealed that only 30% of case questions were posed directly to the LLM, suggesting that under-utilisation of the LLM contributed to the observed performance gap.

Conclusion: While access to an LLM can improve diagnostic accuracy, realising the full potential of human-AI collaboration may require training clinicians to integrate these tools into their cognitive workflows and designing systems that make such integration the default rather than an optional extra.
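The abstract does not state the model formula. The specification below is one plausible way to write a mixed-effects model "accounting for case difficulty and the variability of clinicians at baseline". It is an assumption, not the authors' stated model. Here y_ij is physician i's score on case j, and LLM_ij is a hypothetical indicator equal to 1 when the LLM was available for that case. The γ_j terms are case-level difficulty effects, treated here as fixed effects with one case as the reference (γ_1 = 0). The u_0i terms are clinician-specific baseline intercepts. The u_1i terms are clinician-specific LLM effects, assumed independent of u_0i for simplicity.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{LLM}_{ij} + \gamma_j + u_{0i} + u_{1i}\,\mathrm{LLM}_{ij} + \varepsilon_{ij},
\qquad \gamma_1 = 0,
\qquad u_{0i} \sim \mathcal{N}(0, \sigma_0^2),\quad
u_{1i} \sim \mathcal{N}(0, \sigma_1^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this assumed form, β_1 is the average change in score associated with LLM access. σ_1 captures between-clinician heterogeneity in that change, which is the kind of variation the reported SD of 10.4% describes.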

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. BMJ Health & Care Informatics (13 papers in training set, Top 0.1%): 26.2%
2. PLOS Digital Health (91 papers in training set, Top 0.2%): 8.5%
3. Journal of the American Medical Informatics Association (61 papers in training set, Top 0.4%): 6.9%
4. BMJ Open (554 papers in training set, Top 5%): 4.2%
5. npj Digital Medicine (97 papers in training set, Top 1%): 3.6%
6. Scientific Reports (3102 papers in training set, Top 35%): 3.6%
50% of probability mass above
7. PLOS ONE (4510 papers in training set, Top 39%): 3.6%
8. Healthcare (16 papers in training set, Top 0.1%): 3.6%
9. Frontiers in Digital Health (20 papers in training set, Top 0.2%): 3.6%
10. Philosophical Transactions of the Royal Society B (51 papers in training set, Top 2%): 2.6%
11. BMC Medical Informatics and Decision Making (39 papers in training set, Top 1%): 1.7%
12. Emergency Medicine Journal (20 papers in training set, Top 0.3%): 1.5%
13. Journal of NeuroEngineering and Rehabilitation (28 papers in training set, Top 0.6%): 1.3%
14. Journal of Medical Internet Research (85 papers in training set, Top 3%): 1.3%
15. BMC Medical Research Methodology (43 papers in training set, Top 0.8%): 1.2%
16. JMIR Medical Informatics (17 papers in training set, Top 1%): 1.0%
17. International Journal of Medical Informatics (25 papers in training set, Top 1%): 1.0%
18. JAMIA Open (37 papers in training set, Top 1%): 1.0%
19. Artificial Intelligence in Medicine (15 papers in training set, Top 0.5%): 0.9%
20. BMC Bioinformatics (383 papers in training set, Top 6%): 0.9%
21. Frontiers in Artificial Intelligence (18 papers in training set, Top 0.7%): 0.8%
22. BMJ Open Quality (15 papers in training set, Top 0.8%): 0.8%
23. Frontiers in Public Health (140 papers in training set, Top 8%): 0.8%
24. Cancer Medicine (24 papers in training set, Top 1%): 0.8%
25. Orphanet Journal of Rare Diseases (18 papers in training set, Top 0.7%): 0.8%
26. Age and Ageing (27 papers in training set, Top 0.5%): 0.7%
27. JMIR Public Health and Surveillance (45 papers in training set, Top 4%): 0.7%
28. JAMA Network Open (127 papers in training set, Top 5%): 0.7%
29. Wellcome Open Research (57 papers in training set, Top 3%): 0.7%
30. DIGITAL HEALTH (12 papers in training set, Top 0.8%): 0.5%
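The "top 6 journals account for 50% of the predicted probability mass" statement can be checked by accumulating the listed probabilities in rank order until the running total reaches 50%. The sketch below does this with the first eight values copied from the ranking. The function name `journals_to_reach` is hypothetical and used only for illustration.

```python
# Predicted probabilities (%) for the highest-ranked journals, copied from the ranking above.
predictions = [
    ("BMJ Health & Care Informatics", 26.2),
    ("PLOS Digital Health", 8.5),
    ("Journal of the American Medical Informatics Association", 6.9),
    ("BMJ Open", 4.2),
    ("npj Digital Medicine", 3.6),
    ("Scientific Reports", 3.6),
    ("PLOS ONE", 3.6),
    ("Healthcare", 3.6),
]


def journals_to_reach(preds, threshold):
    """Hypothetical helper: return (k, cumulative %) for the smallest k whose
    top-k probabilities sum to at least `threshold` percent.
    Returns (None, total) if the listed entries never reach the threshold."""
    total = 0.0
    for k, (_, p) in enumerate(preds, start=1):
        total += p
        if total >= threshold:
            # Round to one decimal place to hide floating-point noise.
            return k, round(total, 1)
    return None, round(total, 1)


k, mass = journals_to_reach(predictions, 50.0)
print(k, mass)  # 6 53.0
```

The running totals are 26.2, 34.7, 41.6, 45.8, 49.4 and then 53.0. The 50% threshold is therefore first crossed at the sixth journal, which matches the marker placed after Scientific Reports.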