Pneumonia Detection in Paediatric Chest X-Rays using Ensembled Large Language Models

Tan, J.; Tang, P. H.

2026-04-12 radiology and imaging
10.64898/2026.04.10.26347909 medRxiv

Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXRs) are an important tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. Multimodal large language models (MLLMs) can communicate findings to both clinicians and laypersons, which allows them to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers.

Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared with the baseline average agent for paediatric radiological pneumonia detection.

Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPTOSS-20B aggregation were compared against the average agent performance. The primary metric was one-vs-rest (OvR) AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and one-vs-one (OvO) AUROC.

Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-score (p_balanced = 0.0028) on the balanced dataset.

Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs, and has potential for integration into emergency departments. Its high specificity supports triage by flagging high-risk radiological pneumonia cases.
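The difference between the majority and soft voting strategies named in the Methods can be illustrated with a minimal sketch. The abstract does not say how each agent's output was scored, so the sketch assumes that every agent returns a probability vector over the five likelihood categories; the function names, the tie-breaking rule, and the toy numbers are all hypothetical and are not taken from the study.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent hard label; ties go to the lowest category index (an assumption)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)

def soft_vote(probs):
    """Average the per-agent probability vectors and return (argmax label, mean vector)."""
    n_agents = len(probs)
    n_classes = len(probs[0])
    mean_probs = [sum(p[k] for p in probs) / n_agents for k in range(n_classes)]
    label = max(range(n_classes), key=lambda k: mean_probs[k])
    return label, mean_probs

# Toy input: three hypothetical agents (the study used fifteen), five categories (indices 0-4).
agent_probs = [
    [0.05, 0.10, 0.45, 0.40, 0.00],  # narrowly prefers category 2
    [0.05, 0.10, 0.45, 0.40, 0.00],  # narrowly prefers category 2
    [0.00, 0.00, 0.10, 0.80, 0.10],  # strongly prefers category 3
]

# Hard label per agent: the category with the highest probability.
hard_labels = [max(range(len(p)), key=lambda k: p[k]) for p in agent_probs]

majority_label = majority_vote(hard_labels)
soft_label, mean_probs = soft_vote(agent_probs)

print("majority vote:", majority_label)
print("soft vote:", soft_label)
```

In this toy case the two strategies disagree: two agents narrowly favour category 2, so the majority vote picks it, while one confident agent pulls the averaged probabilities toward category 3. Soft voting also yields a continuous score per category, which is the kind of input an AUROC calculation needs, whereas a majority vote yields only a single label.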

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Scientific Reports (3102 papers in training set; Top 6%; 10.3%)
2. The Lancet Digital Health (25 papers in training set; Top 0.1%; 10.3%)
3. PLOS Digital Health (91 papers in training set; Top 0.2%; 10.3%)
4. PLOS ONE (4510 papers in training set; Top 25%; 7.0%)
5. European Radiology (14 papers in training set; Top 0.1%; 6.5%)
6. JCO Clinical Cancer Informatics (18 papers in training set; Top 0.1%; 5.0%)
7. Diagnostics (48 papers in training set; Top 0.3%; 4.3%)
(50% of probability mass above this line)
8. Nature Medicine (117 papers in training set; Top 0.7%; 3.7%)
9. npj Digital Medicine (97 papers in training set; Top 1%; 2.9%)
10. Nature Communications (4913 papers in training set; Top 44%; 2.7%)
11. iScience (1063 papers in training set; Top 9%; 2.1%)
12. PLOS Computational Biology (1633 papers in training set; Top 13%; 2.1%)
13. Nature Machine Intelligence (61 papers in training set; Top 2%; 1.9%)
14. Informatics in Medicine Unlocked (21 papers in training set; Top 0.4%; 1.9%)
15. Patterns (70 papers in training set; Top 0.8%; 1.7%)
16. Annals of Translational Medicine (17 papers in training set; Top 0.7%; 1.7%)
17. GigaScience (172 papers in training set; Top 2%; 1.5%)
18. International Journal of Medical Informatics (25 papers in training set; Top 1%; 1.4%)
19. eBioMedicine (130 papers in training set; Top 2%; 1.3%)
20. Neurocomputing (13 papers in training set; Top 0.3%; 1.3%)
21. Computers in Biology and Medicine (120 papers in training set; Top 3%; 1.0%)
22. Expert Systems with Applications (11 papers in training set; Top 0.3%; 0.9%)
23. Frontiers in Neuroinformatics (38 papers in training set; Top 0.6%; 0.9%)
24. IEEE Access (31 papers in training set; Top 0.7%; 0.9%)
25. Medical Physics (14 papers in training set; Top 0.6%; 0.7%)
26. Archives of Clinical and Biomedical Research (28 papers in training set; Top 3%; 0.7%)
27. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 45%; 0.7%)
28. Frontiers in Medicine (113 papers in training set; Top 8%; 0.7%)
29. npj Precision Oncology (48 papers in training set; Top 1%; 0.7%)
30. Frontiers in Artificial Intelligence (18 papers in training set; Top 1%; 0.5%)