
Assessing Statistical Practices of Existing Artificial Intelligence (AI) Models for Lung Cancer Detection, Prognosis, and Risk Prediction: A Cross-Sectional Meta-Research Study Supplemented by Human and Large Language Model (LLM)-Directed Quality Appraisal

Hou, Y.; Ward, T.; Yang, C.-H.; Jernigan, E.; Caturegli, G.; Boffa, D.; Mukherjee, B.

2025-12-30 · oncology
10.64898/2025.12.23.25342849

Artificial intelligence (AI) models with medical images as input data are increasingly proposed to support clinical decisions in lung cancer screening. To assess how these models are developed, evaluated, and reported, and to identify gaps in best statistical practices, we conducted a cross-sectional meta-research study of OpenAlex-indexed studies (January 1, 2023, to June 30, 2025) that developed image-based AI tools to detect lung cancer, predict prognosis, or estimate future risk. Thirty-six studies met our inclusion criteria. Study quality and reporting were appraised using three approaches: subjective ratings from two statisticians and two clinicians, scoring from two AI agents (GPT-5 and Gemini 2.5 Pro), and a guideline-based checklist from the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS). Convolutional neural networks were used in most of the included studies (69%). Area under the curve was the most frequently reported metric (81%). Our meta-research study also highlights common lapses in these 36 studies, including limited external test set use (39%), insufficient subgroup analyses (28%), and a substantial lack of adherence to established prediction-model reporting guidelines. AI-based quality scoring aligned better with CHARMS-based scores than did human scoring: Spearman correlations with CHARMS were weaker for statisticians and clinicians (ρ ≤ 0.46) than for the two AI agents (GPT-5: ρ = 0.66; Gemini 2.5 Pro: ρ = 0.56). Overall, future research should prioritize standardized reporting, use of external test sets, and model performance assessment across subpopulations. Large language models (LLMs) offer a supportive role in providing guideline-driven appraisals to complement human judgment in evaluating AI-based prediction models.
1-2 Sentence Description

This cross-sectional meta-research study synthesizes recent studies that developed artificial intelligence (AI)-driven predictive models using medical images to detect lung cancer, predict prognosis, or estimate future risk. It highlights methodological trends and limitations in model testing and subgroup analyses, and argues for greater transparency, reliability, quality assessment, and adherence to established reporting guidelines in such studies. Quality assessment of the models by LLMs and by human statisticians and clinicians indicates that the chatbots' appraisals align more closely with the guideline-based (CHARMS) scores than the humans' do.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. PLOS ONE (based on 1737 papers; Top 49%): 10.4%
2. BMJ Open (based on 553 papers; Top 15%): 7.7%
3. JCO Clinical Cancer Informatics (based on 14 papers; Top 0.2%): 7.7%
4. Journal of Clinical Epidemiology (based on 29 papers; Top 0.3%): 6.5%
5. JAMA Network Open (based on 125 papers; Top 3%): 5.4%
6. Cancer Medicine (based on 17 papers; Top 0.7%): 4.6%
7. Scientific Reports (based on 701 papers; Top 42%): 4.6%
8. Journal of the American Medical Informatics Association (based on 53 papers; Top 3%): 3.0%
9. npj Digital Medicine (based on 85 papers; Top 6%): 2.9%

(50% of probability mass above this point)

10. npj Precision Oncology (based on 14 papers; Top 1%): 2.5%
11. JCO Precision Oncology (based on 11 papers; Top 0.8%): 2.4%
12. International Journal of Radiation Oncology*Biology*Physics (based on 13 papers; Top 1%): 2.4%
13. PeerJ (based on 46 papers; Top 4%): 1.8%
14. Computers in Biology and Medicine (based on 39 papers; Top 4%): 1.6%
15. Frontiers in Medicine (based on 99 papers; Top 11%): 1.6%
16. JMIR Research Protocols (based on 18 papers; Top 2%): 1.3%
17. Diagnostics (based on 36 papers; Top 3%): 1.3%
18. International Journal of Medical Informatics (based on 25 papers; Top 4%): 1.3%
19. Cancers (based on 57 papers; Top 6%): 1.3%
20. International Journal of Environmental Research and Public Health (based on 116 papers; Top 22%): 0.8%
21. Healthcare (based on 14 papers; Top 3%): 0.8%
22. JAMIA Open (based on 35 papers; Top 5%): 0.8%
23. Nature Communications (based on 483 papers; Top 40%): 0.8%
24. eLife (based on 262 papers; Top 29%): 0.8%
25. Frontiers in Oncology (based on 34 papers; Top 5%): 0.8%
26. British Journal of Ophthalmology (based on 13 papers; Top 1%): 0.8%
27. British Journal of Cancer (based on 22 papers; Top 4%): 0.8%
28. PLOS Computational Biology (based on 141 papers; Top 9%): 0.8%
29. PLOS Digital Health (based on 88 papers; Top 12%): 0.8%
30. Journal of Medical Internet Research (based on 81 papers; Top 16%): 0.7%