
Evaluating the Performance of Artificial Intelligence in Generating Differential Diagnoses for Infectious Diseases Cases: A Comparative Study of Large Language Models

Mondal, A.; Karad, R. K.; Bhattacharjee, B.; Saha, B.

2024-06-30 infectious diseases
10.1101/2024.06.28.24309694 medRxiv

Background: Artificial intelligence (AI) has the potential to transform healthcare, including infectious disease diagnostics. This study assesses the capability of three large language models (LLMs), GPT 4, Llama 3, and Gemini 1.5, to generate differential diagnoses, comparing their outputs against those of medical experts to evaluate AI's potential in augmenting clinical decision-making.

Methods: The three LLMs were evaluated on 50 simulated infectious disease cases. The cases were diverse, complex, and reflective of common clinical scenarios, including detailed histories, symptoms, lab results, and imaging findings. Each model received standardized case information and produced differential diagnoses, which were then compared to reference differential diagnosis lists created by medical experts. The analysis used the Jaccard index to assess similarity and Kendall's Tau to assess order accuracy, summarizing findings with mean, standard deviation, and combined p-values.

Results: The mean numbers of differential diagnoses generated by GPT 4, Llama 3, and Gemini 1.5 were 6.22, 5.06, and 10.02 respectively, which differed significantly (p<0.001) from the medical experts. The mean Jaccard index values for GPT 4, Llama 3, and Gemini 1.5 were 0.3, 0.21, and 0.24, while the mean Kendall's Tau values were 0.4, 0.7, and 0.33 respectively. The combined p-values for GPT 4, Llama 3, and Gemini 1.5 were 1, 1, and 0.979 respectively, indicating no significant association between the differential diagnoses generated by the LLMs and those of the medical experts.

Conclusion: Although LLMs like GPT 4, Llama 3, and Gemini 1.5 exhibit varying effectiveness, none align significantly with expert-level diagnostic accuracy, emphasizing the need for further development and refinement. The findings highlight the importance of rigorous validation, ethical considerations, and seamless integration into clinical workflows to ensure AI tools enhance healthcare delivery and patient outcomes effectively.
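To make the two metrics named in the Methods concrete, here is a minimal Python sketch of how a Jaccard index and a Kendall's Tau could be computed for one case. The abstract does not describe the authors' implementation, so several details are assumptions: the function names, the example diagnosis lists, the choice to compute Kendall's Tau (the tau-a variant) only over diagnoses that appear in both lists, and the absence of duplicate entries within a list.

```python
from itertools import combinations


def jaccard_index(model_list, expert_list):
    """Size of the intersection divided by size of the union of the two sets."""
    model_set, expert_set = set(model_list), set(expert_list)
    union = model_set | expert_set
    if not union:
        return 0.0
    return len(model_set & expert_set) / len(union)


def kendall_tau(model_list, expert_list):
    """Kendall's tau-a over the diagnoses shared by both ranked lists.

    Assumption: ordering is compared only on shared diagnoses.
    Returns None when fewer than two diagnoses are shared.
    """
    expert_rank = {dx: i for i, dx in enumerate(expert_list)}
    # Shared diagnoses, kept in the model's order.
    shared = [dx for dx in model_list if dx in expert_rank]
    if len(shared) < 2:
        return None
    concordant = discordant = 0
    for first, second in combinations(shared, 2):
        # `first` precedes `second` in the model list; check the expert list.
        if expert_rank[first] < expert_rank[second]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical lists for a single case (not taken from the study).
expert = ["tuberculosis", "pneumonia", "lung abscess", "histoplasmosis"]
model = ["pneumonia", "tuberculosis", "histoplasmosis", "bronchitis", "sarcoidosis"]

jaccard = jaccard_index(model, expert)  # 3 shared / 6 distinct diagnoses
tau = kendall_tau(model, expert)        # 2 concordant pairs, 1 discordant
print(jaccard, tau)
```

In this illustration the lists share three of six distinct diagnoses, and two of the three shared pairs appear in the same order, so a model can overlap moderately with the experts while still ranking the shared diagnoses differently.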

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Biology Methods and Protocols: 53 papers in training set, Top 0.1%, 17.6%
2. PLOS ONE: 4510 papers in training set, Top 18%, 10.2%
3. Scientific Reports: 3102 papers in training set, Top 14%, 6.9%
4. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 0.4%, 6.4%
5. npj Digital Medicine: 97 papers in training set, Top 0.9%, 4.9%
6. JMIR Medical Informatics: 17 papers in training set, Top 0.3%, 3.9%
7. Frontiers in Public Health: 140 papers in training set, Top 3%, 2.6%

50% of probability mass above

8. Computers in Biology and Medicine: 120 papers in training set, Top 1%, 2.6%
9. Journal of Medical Internet Research: 85 papers in training set, Top 2%, 2.4%
10. PLOS Computational Biology: 1633 papers in training set, Top 13%, 2.1%
11. PLOS Digital Health: 91 papers in training set, Top 1%, 1.9%
12. JMIRx Med: 31 papers in training set, Top 0.5%, 1.9%
13. BMC Infectious Diseases: 118 papers in training set, Top 3%, 1.7%
14. Journal of the American Medical Informatics Association: 61 papers in training set, Top 1%, 1.7%
15. Frontiers in Medicine: 113 papers in training set, Top 3%, 1.7%
16. PeerJ: 261 papers in training set, Top 7%, 1.7%
17. European Radiology: 14 papers in training set, Top 0.4%, 1.5%
18. American Journal of Infection Control: 12 papers in training set, Top 0.1%, 1.5%
19. Healthcare: 16 papers in training set, Top 0.8%, 1.5%
20. BMC Medical Research Methodology: 43 papers in training set, Top 0.7%, 1.5%
21. International Journal of Medical Informatics: 25 papers in training set, Top 1%, 1.3%
22. Journal of Biomedical Informatics: 45 papers in training set, Top 1%, 1.0%
23. BMC Medical Education: 20 papers in training set, Top 0.7%, 0.9%
24. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 8%, 0.8%
25. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.7%, 0.8%
26. BMJ Open: 554 papers in training set, Top 13%, 0.7%
27. F1000Research: 79 papers in training set, Top 5%, 0.7%
28. Archives of Clinical and Biomedical Research: 28 papers in training set, Top 4%, 0.5%
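
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below assumes the cutoff is the smallest rank at which the cumulative probability reaches 50%; the page does not state how the cutoff is defined, and the function name is illustrative.

```python
# Predicted probabilities (%) for the first ten ranked journals, as listed.
probabilities = [17.6, 10.2, 6.9, 6.4, 4.9, 3.9, 2.6, 2.6, 2.4, 2.1]


def rank_reaching(probs, threshold):
    """Return (rank, cumulative sum) at the first rank whose running total
    reaches the threshold, or None if the threshold is never reached."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


result = rank_reaching(probabilities, 50.0)
print(result)
```

Under that assumption the running total stays just below 50% through rank 6 and crosses it at rank 7, which matches the position of the divider in the list.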