
Pneumonia Detection in Paediatric Chest X-Rays using Ensembled Large Language Models

Tan, J.; Tang, P. H.

2026-04-12 radiology and imaging
10.64898/2026.04.10.26347909 medRxiv

Background: Paediatric pneumonia is a major cause of childhood morbidity and mortality. Chest X-rays (CXRs) are central to diagnosis, but shortages of specialist radiologists can delay reporting. Multimodal large language models (MLLMs) may assist clinical workflows by analysing images and communicating findings, although their diagnostic performance remains below that of state-of-the-art classifiers.

Objective: To evaluate whether ensemble strategies improve MLLM diagnostic performance for paediatric radiological pneumonia detection on CXRs.

Methods: In this retrospective study, paediatric CXRs from two datasets (balanced and real-world) at KK Women's and Children's Hospital were analysed. Images were independently reviewed by two board-certified radiologists, with pneumonia severity assigned to three classes using a predefined consensus algorithm. Fifteen MedGemma-4B-it agents classified each CXR into five likelihood categories, which were mapped to the three severity classes for evaluation. Majority voting, soft voting and GPTOSS-20B aggregation were compared with baseline average agent performance. The primary outcome was One-vs-Rest (OvR) AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's κ and One-vs-One (OvO) AUROC.

Results: The balanced dataset contained 900 CXRs and the real-world dataset 1300 CXRs. Soft voting significantly improved OvR-AUROC compared with baseline in both datasets (balanced: 0.829 vs 0.764; 95% CI 0.752-0.779; P=0.0002; real-world: 0.728 vs 0.655; 95% CI 0.638-0.679; P=0.0003). Soft voting also improved accuracy, Cohen's κ and OvO-AUROC in both datasets, and F1-score in the balanced dataset.

Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs and has potential for integration into emergency departments. Its high specificity supports triage by flagging high-risk radiological pneumonia cases.

Clinical Impact:
- Paediatric CXRs often face reporting delays exceeding 24 hours due to radiologist shortages.
- Our proposed MLLM ensemble framework achieves better diagnostic discrimination for radiological pneumonia than the average individual MLLM agent, without requiring cloud-based systems.
- Soft-voting aggregation enhances diagnostic discriminatory effectiveness for paediatric pneumonia severity while preserving explainable outputs.
- Our system acts as a decision support tool that identifies higher-risk pneumonia cases for urgent review, supporting safer triage.
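
The abstract names majority voting and soft voting but does not spell out how either was computed. The sketch below is one plausible reading, not the authors' implementation. It assumes a hypothetical mapping from the five likelihood categories to the three severity classes (`CATEGORY_TO_CLASS`), takes the majority vote as the most common mapped class, and takes the soft vote as the fraction of agents voting for each class. The agent outputs are made-up values for illustration.

```python
from collections import Counter

# Hypothetical mapping from five likelihood categories (1-5) to three
# severity classes (0-2); the abstract does not state the actual mapping.
CATEGORY_TO_CLASS = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2}
NUM_CLASSES = 3


def to_classes(categories):
    """Map each agent's likelihood category to a severity class."""
    return [CATEGORY_TO_CLASS[c] for c in categories]


def majority_vote(categories):
    """Return the most common severity class; ties go to the lower class index."""
    counts = Counter(to_classes(categories))
    return max(range(NUM_CLASSES), key=lambda k: (counts[k], -k))


def soft_vote(categories):
    """Return per-class scores: the fraction of agents voting for each class."""
    counts = Counter(to_classes(categories))
    total = len(categories)
    return [counts[k] / total for k in range(NUM_CLASSES)]


# Fifteen illustrative agent outputs for a single image (made-up values).
agent_outputs = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 1, 3, 4]

print(majority_vote(agent_outputs))  # a single class label
print(soft_vote(agent_outputs))      # graded scores, one per class
```

Under these assumptions the majority vote yields one label per image, while the soft vote yields graded per-class scores that a ranking metric such as AUROC can use directly.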

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Scientific Reports | 3102 | Top 2% | 14.9%
2 | PLOS Digital Health | 91 | Top 0.1% | 14.5%
3 | PLOS ONE | 4510 | Top 26% | 6.5%
4 | The Lancet Digital Health | 25 | Top 0.1% | 6.4%
5 | European Radiology | 14 | Top 0.1% | 4.9%
6 | Annals of Translational Medicine | 17 | Top 0.3% | 3.6%
(50% of probability mass above)
7 | Informatics in Medicine Unlocked | 21 | Top 0.2% | 3.6%
8 | JMIR Medical Informatics | 17 | Top 0.4% | 2.6%
9 | Diagnostics | 48 | Top 0.7% | 2.4%
10 | JCO Clinical Cancer Informatics | 18 | Top 0.4% | 2.1%
11 | International Journal of Medical Informatics | 25 | Top 0.6% | 2.1%
12 | Nature Communications | 4913 | Top 47% | 2.1%
13 | PLOS Computational Biology | 1633 | Top 13% | 2.1%
14 | npj Digital Medicine | 97 | Top 2% | 1.9%
15 | GigaScience | 172 | Top 1% | 1.9%
16 | iScience | 1063 | Top 12% | 1.8%
17 | Computers in Biology and Medicine | 120 | Top 2% | 1.5%
18 | Patterns | 70 | Top 1% | 1.5%
19 | Journal of Medical Internet Research | 85 | Top 3% | 1.2%
20 | BMC Medical Informatics and Decision Making | 39 | Top 2% | 1.1%
21 | BMC Medicine | 163 | Top 5% | 1.0%
22 | Frontiers in Neuroinformatics | 38 | Top 0.6% | 1.0%
23 | Expert Systems with Applications | 11 | Top 0.3% | 1.0%
24 | IEEE Access | 31 | Top 0.7% | 0.9%
25 | Sensors | 39 | Top 2% | 0.9%
26 | Frontiers in Medicine | 113 | Top 6% | 0.8%
27 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.8%
28 | JAMIA Open | 37 | Top 1% | 0.8%
29 | PeerJ | 261 | Top 14% | 0.8%
30 | Archives of Clinical and Biomedical Research | 28 | Top 3% | 0.7%
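
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The short sketch below does this for the first eight entries, with values copied from the list above. The helper name `journals_to_reach` is illustrative only.

```python
# Predicted probabilities (%) for the eight top-ranked journals, from the list.
probabilities = [14.9, 14.5, 6.5, 6.4, 4.9, 3.6, 3.6, 2.6]


def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative mass) once the running sum reaches threshold."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total  # threshold not reached with the values given


count, mass = journals_to_reach(probabilities, 50.0)
print(count, round(mass, 1))
```

The running sum first reaches 50% at the sixth entry, consistent with where the list places its cut-off marker.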