Zero-Shot Evaluation of Kimi K2 on Pediatric Clinical Cases

Mondillo, G.; Masino, M.; Colosimo, S.; Perrotta, A.; Frattolillo, V.; Abbate, F. G.

2025-07-29 pediatrics
10.1101/2025.07.29.25332368 medRxiv

Background: The application of large language models (LLMs) in pediatric medicine requires rigorous performance evaluation before clinical implementation. Objective: To evaluate the accuracy of the Kimi K2 model in analyzing pediatric clinical cases using a zero-shot approach. Methods: A total of 2,249 multiple-choice questions based on pediatric clinical cases (patient ages ranging from 1 day to 16 years) were extracted from the MedQA dataset and analyzed. The model was queried via API with standardized parameters (temperature set to zero) and zero-shot prompts. Accuracy was calculated by comparing the model's responses with the dataset's ground truth. Results: Kimi K2 achieved an overall accuracy of 78.39% (1,763 correct answers out of 2,249), and 100% of its responses were in the required format. Conclusions: The model demonstrates competitive performance for medical education and diagnostic support, but it still has limitations that require human clinical supervision.
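The scoring step in the Methods (compare each response with the ground truth, then report accuracy and format compliance) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the "required format" is a reply that starts with a single option letter A to E, and the names `extract_letter`, `score` and `EvalSummary` are hypothetical. Malformed replies are counted as wrong, so accuracy is always taken over all 2,249 questions. With 1,763 correct answers, accuracy is 1763 / 2249 ≈ 0.7839, i.e. 78.39%.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed answer format (hypothetical): the reply starts with one option
# letter A-E, optionally followed by punctuation or further text.
ANSWER_PATTERN = re.compile(r"^\s*([A-E])\b")


@dataclass
class EvalSummary:
    total: int
    well_formatted: int
    correct: int

    @property
    def accuracy(self) -> float:
        # Accuracy over all questions; malformed replies count as wrong.
        return self.correct / self.total

    @property
    def format_rate(self) -> float:
        # Share of replies that matched the assumed answer format.
        return self.well_formatted / self.total


def extract_letter(response: str) -> Optional[str]:
    """Return the option letter if the reply matches the assumed format."""
    match = ANSWER_PATTERN.match(response)
    return match.group(1) if match else None


def score(responses: List[str], ground_truth: List[str]) -> EvalSummary:
    """Compare model replies with ground-truth letters, one per question."""
    if len(responses) != len(ground_truth):
        raise ValueError("responses and ground_truth must be the same length")
    well_formatted = correct = 0
    for response, answer in zip(responses, ground_truth):
        letter = extract_letter(response)
        if letter is None:
            continue  # format violation: counted as incorrect
        well_formatted += 1
        if letter == answer:
            correct += 1
    return EvalSummary(len(responses), well_formatted, correct)


if __name__ == "__main__":
    # Synthetic data reproducing the reported counts, not real model output.
    truth = ["A"] * 2249
    replies = ["A"] * 1763 + ["B."] * 486
    summary = score(replies, truth)
    print(f"accuracy={summary.accuracy:.2%} format={summary.format_rate:.0%}")
```

Counting format violations as errors (rather than dropping them) keeps the denominator fixed, which is what makes the accuracy and the format-compliance rate directly comparable.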

Matching journals

The top 5 journals together account for just over 50% (52.3% by the listed values) of the predicted probability mass.

1. PLOS Digital Health: 91 papers in training set; Top 0.1%; 19.2%
2. BioData Mining: 15 papers in training set; Top 0.1%; 9.4%
3. Scientific Reports: 3102 papers in training set; Top 9%; 8.6%
4. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 0.1%; 8.6%
5. Healthcare: 16 papers in training set; Top 0.1%; 6.5%

50% of probability mass above

6. PLOS ONE: 4510 papers in training set; Top 30%; 5.0%
7. International Journal of Medical Informatics: 25 papers in training set; Top 0.3%; 4.4%
8. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 0.9%; 3.3%
9. Computers in Biology and Medicine: 120 papers in training set; Top 1%; 2.1%
10. JMIRx Med: 31 papers in training set; Top 0.4%; 2.1%
11. Journal of the American Medical Informatics Association: 61 papers in training set; Top 1%; 2.1%
12. Medicine: 30 papers in training set; Top 1.0%; 1.8%
13. Journal of Medical Internet Research: 85 papers in training set; Top 3%; 1.7%
14. JMIR Medical Informatics: 17 papers in training set; Top 1%; 1.3%
15. Annals of Translational Medicine: 17 papers in training set; Top 0.9%; 1.1%
16. Biology Methods and Protocols: 53 papers in training set; Top 2%; 1.0%
17. Cureus: 67 papers in training set; Top 4%; 0.9%
18. JAMA Network Open: 127 papers in training set; Top 3%; 0.9%
19. Journal of Biomedical Informatics: 45 papers in training set; Top 1%; 0.8%
20. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 43%; 0.8%
21. Artificial Intelligence in Medicine: 15 papers in training set; Top 0.7%; 0.8%
22. BMC Medical Education: 20 papers in training set; Top 0.8%; 0.8%
23. BMC Medical Research Methodology: 43 papers in training set; Top 1%; 0.7%
24. GigaScience: 172 papers in training set; Top 3%; 0.7%
25. Computer Methods and Programs in Biomedicine: 27 papers in training set; Top 1%; 0.7%
26. Bioengineering: 24 papers in training set; Top 2%; 0.7%
27. Data in Brief: 13 papers in training set; Top 0.6%; 0.7%
28. BMC Bioinformatics: 383 papers in training set; Top 8%; 0.7%
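The "50% of probability mass" cut-off in the journal list comes from a running sum of the listed probabilities in rank order. A minimal sketch of that check follows, assuming the ranking is sorted by predicted probability and using the rounded percentages exactly as displayed, so the totals are approximate. The helper name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate
from typing import Optional, Sequence, Tuple

# Rounded predicted probabilities (%) copied from the first six ranks above.
TOP_PROBS = {
    "PLOS Digital Health": 19.2,
    "BioData Mining": 9.4,
    "Scientific Reports": 8.6,
    "IEEE Journal of Biomedical and Health Informatics": 8.6,
    "Healthcare": 6.5,
    "PLOS ONE": 5.0,
}


def journals_to_reach(
    probs: Sequence[float], threshold: float
) -> Optional[Tuple[int, float]]:
    """Return (k, cumulative %) for the smallest k whose running sum >= threshold."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k, total
    return None  # threshold not reached within the given ranks


if __name__ == "__main__":
    k, total = journals_to_reach(list(TOP_PROBS.values()), 50.0)
    print(f"top {k} journals cover {total:.1f}% of the predicted mass")
```

Because the displayed values are rounded, the running sum over the first five ranks (52.3%) is only an approximation of the model's true cumulative probability.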