Bias in respiratory diagnoses by Large Language Models (LLMs) in Low Middle Income Countries (LMICs)
Mouelhi, A.; Patel, K.; Kussad, S.; Ojha, S.; Prayle, A. P.; LMIC Medical AI Alignment Group
Introduction: Clinicians and patients are likely to use Large Language Models (LLMs) increasingly for diagnostic support. Most LLMs are created in North America and Europe, so their use in Low- and Middle-Income Country (LMIC) healthcare settings could introduce a High-Income Country bias. We aimed to explore whether diagnostic suggestions made by LLMs are relevant in LMIC settings.

Methods: Five short respiratory clinical vignettes were produced. For each vignette, doctors from 5 countries (Ghana, India, Jordan, Brazil and the UK) independently gave the 4 most likely diagnoses. 4 LLMs (ChatGPT, Claude Sonnet, Google Gemini and Microsoft Copilot) were prompted with the same vignettes, and the top 4 diagnoses for each case were requested. A Virtual Private Network (VPN) was used to access the LLM from each of the 4 countries. In a second experiment, the LLM was given the same vignettes but was also informed in the prompt of the country in which the case was based. The diagnoses presented by the LLMs were compared with the doctors' diagnoses for the LMICs and for the UK.

Results: 21 doctors offered 106 unique diagnoses, and the LLMs accessed via VPN offered 53. The LLMs proposed fewer of the doctors' diagnoses in LMICs than in the UK: 50% (95% CI 32.6 to 67.4%) in the UK compared to 32.0% (95% CI 23.1 to 42.3%) in LMICs. This effect persisted when the LLM was informed of the location of the doctor in the prompt. Overall, LLMs performed worse in the LMIC setting (Chi-squared p = 0.028).

Conclusion: Doctors working in LMICs consider a wider range of diagnoses than LLMs, even when the LLMs are queried from that country or informed that the case is in that country. LLMs appear to show a bias in the diagnoses they consider likely, which is likely related to the epidemiology of high-income countries.
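The results are reported as proportions with 95% confidence intervals and a Chi-squared p-value. The sketch below illustrates how statistics of this kind can be computed in Python using only the standard library. It is not the authors' analysis: the abstract does not state which interval method or test variant was used, so the Wilson score interval and the uncorrected Pearson Chi-squared test are assumptions, and the counts in the usage example are hypothetical values chosen for illustration, not data from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 gives an approximate 95% interval. Choosing the Wilson method
    is an assumption; the abstract does not name the method used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def chi_squared_2x2(a, b, c, d):
    """Pearson Chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    n = row1 + row2
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (NOT from the study): doctors' diagnoses that the LLMs
# also proposed, out of all doctors' diagnoses in each setting.
uk_matched, uk_total = 10, 20
lmic_matched, lmic_total = 30, 100

uk_ci = wilson_interval(uk_matched, uk_total)
lmic_ci = wilson_interval(lmic_matched, lmic_total)
statistic, p_value = chi_squared_2x2(
    uk_matched, uk_total - uk_matched,
    lmic_matched, lmic_total - lmic_matched,
)

print(f"UK:   {uk_matched / uk_total:.1%}, 95% CI {uk_ci[0]:.1%} to {uk_ci[1]:.1%}")
print(f"LMIC: {lmic_matched / lmic_total:.1%}, 95% CI {lmic_ci[0]:.1%} to {lmic_ci[1]:.1%}")
print(f"Chi-squared = {statistic:.2f}, p = {p_value:.3f}")
```

Each row of the 2x2 table is one setting (UK or LMIC) and each column is whether a doctor's diagnosis was also proposed by an LLM. With small expected cell counts, an exact test would be a more cautious choice than the Chi-squared approximation.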
Matching journals
The top 7 journals account for 50% of the predicted probability mass.