Evaluation of Closed and Open Large Language Models in Pediatric Cardiology Board Exam Performance

Nikolovski, N.; Morgan, C. T.; Gritti, M. N.

2025-06-30 cardiovascular medicine
10.1101/2025.06.28.25330485 medRxiv

Introduction: Large language models (LLMs) have gained traction in medicine, but there is limited research comparing closed- and open-source models in subspecialty contexts. This study evaluated ChatGPT-4.0o and DeepSeek-R1 on a pediatric cardiology board-style examination to quantify their accuracy and discuss clinical and educational utility.

Methods: ChatGPT-4.0o and DeepSeek-R1 were used to answer 88 text-based multiple-choice questions across 11 pediatric cardiology subtopics from a Pediatric Cardiology Board Review textbook. DeepSeek-R1's processing time per question was measured. Statistical analyses for model comparison were conducted using an unpaired two-tailed t-test, and bivariate correlations were assessed using Pearson's r.

Results: ChatGPT-4.0o and DeepSeek-R1 achieved 70% (62/88) and 68% (60/88) accuracy, respectively (p = 0.79). Subtopic accuracy was equal in 5 of 11 chapters, with each model outperforming its counterpart in 3 of 11. DeepSeek-R1's processing time negatively correlated with accuracy (r = -0.68, p = 0.02).

Conclusion: ChatGPT-4.0o and DeepSeek-R1 approached the passing threshold on a pediatric cardiology board examination, with comparable accuracy and potential for open-source models to enhance clinical and educational outcomes while supporting sustainable AI development.
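As a rough illustration of the model comparison described in the abstract, the sketch below rebuilds per-question outcomes from the reported counts (62/88 and 60/88) and computes an unpaired t statistic. It rests on several assumptions: the abstract does not say whether the test was run per question or per subtopic, the per-question data are not available here, and the p-value uses a normal approximation to the t distribution. The helper names `binary_outcomes` and `unpaired_t` are hypothetical, and the result need not match the reported p-value.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def binary_outcomes(correct, total):
    """Hypothetical helper: 1 = correct answer, 0 = incorrect answer."""
    return [1] * correct + [0] * (total - correct)


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


# Counts taken from the abstract; the ordering of outcomes is irrelevant here.
gpt = binary_outcomes(62, 88)
deepseek = binary_outcomes(60, 88)

t_stat, dof = unpaired_t(gpt, deepseek)

# Assumption: with this many degrees of freedom, the t distribution is close
# to the standard normal, so a normal two-tailed p-value is used as a stand-in.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"accuracy: {mean(gpt):.1%} vs {mean(deepseek):.1%}")
print(f"t = {t_stat:.3f}, df = {dof}, approximate two-tailed p = {p_approx:.2f}")
```

Either way, a two-question difference out of 88 is far too small to separate the two models statistically, which is consistent with the abstract's conclusion of comparable accuracy.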

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Healthcare: 16 papers in training set, Top 0.1%, 19.1%
2. Scientific Reports: 3102 papers in training set, Top 13%, 7.0%
3. PLOS ONE: 4510 papers in training set, Top 26%, 6.5%
4. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 0.6%, 4.4%
5. Biology Methods and Protocols: 53 papers in training set, Top 0.1%, 4.4%
6. PLOS Digital Health: 91 papers in training set, Top 0.6%, 3.8%
7. Computers in Biology and Medicine: 120 papers in training set, Top 0.8%, 3.7%
8. BMJ Health & Care Informatics: 13 papers in training set, Top 0.1%, 3.7%

50% of probability mass above

9. npj Digital Medicine: 97 papers in training set, Top 1%, 3.1%
10. Journal of NeuroEngineering and Rehabilitation: 28 papers in training set, Top 0.4%, 2.9%
11. Cureus: 67 papers in training set, Top 2%, 2.5%
12. JMIR Formative Research: 32 papers in training set, Top 0.5%, 2.1%
13. DIGITAL HEALTH: 12 papers in training set, Top 0.3%, 2.1%
14. iScience: 1063 papers in training set, Top 11%, 1.9%
15. Journal of the American Medical Informatics Association: 61 papers in training set, Top 1%, 1.9%
16. JMIR Medical Informatics: 17 papers in training set, Top 0.7%, 1.7%
17. International Journal of Medical Informatics: 25 papers in training set, Top 0.9%, 1.5%
18. Orphanet Journal of Rare Diseases: 18 papers in training set, Top 0.4%, 1.3%
19. Medicine: 30 papers in training set, Top 2%, 1.1%
20. BMC Medical Education: 20 papers in training set, Top 0.7%, 1.0%
21. Frontiers in Public Health: 140 papers in training set, Top 6%, 1.0%
22. Cancer Medicine: 24 papers in training set, Top 1%, 0.9%
23. Data in Brief: 13 papers in training set, Top 0.3%, 0.9%
24. European Journal of Human Genetics: 49 papers in training set, Top 1%, 0.8%
25. JMIR Research Protocols: 18 papers in training set, Top 1%, 0.8%
26. Open Heart: 19 papers in training set, Top 1%, 0.8%
27. Frontiers in Physiology: 93 papers in training set, Top 6%, 0.7%
28. JAMA Network Open: 127 papers in training set, Top 4%, 0.7%
29. Biomolecules: 95 papers in training set, Top 3%, 0.7%
30. European Heart Journal - Digital Health: 15 papers in training set, Top 0.6%, 0.7%
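
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is a minimal illustration using the first ten percentages from the ranking above; the function name `journals_needed` is hypothetical and is not part of the site that produced the ranking.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1 to 10, copied from the list.
probabilities = [19.1, 7.0, 6.5, 4.4, 4.4, 3.8, 3.7, 3.7, 3.1, 2.9]


def journals_needed(probs, threshold):
    """Return how many top-ranked entries are needed to reach the threshold.

    Returns None if the listed probabilities never reach it.
    """
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


needed = journals_needed(probabilities, 50.0)
print(f"journals needed to reach 50% of probability mass: {needed}")
```

The first seven entries sum to just under 50%, so the eighth journal is the one that crosses the threshold.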