Development and Validation of a Machine Learning Model to Predict Prognosis in Patients with Advanced Head and Neck Cancer

Zhang, K.; Gao, L.; John, D.; Li, W. T.; Hogarth, M.; Coffey, C. S.; Ongkeko, W. M.

2026-05-28 oncology
10.64898/2026.05.27.26354194 medRxiv

Importance: Prognostic tools beyond staging are needed to guide treatment and counseling in head and neck squamous cell carcinoma (HNSCC).

Objective: To develop and externally validate a machine learning model predicting survival in advanced HNSCC using routinely collected clinical and biomarker data.

Design, Setting, and Participants: Retrospective, multi-institutional cohort study including 2,385 patients with stage III-IV HNSCC diagnosed from 2012-2022 in the University of California Health Data Warehouse (UCHDW). Patients were randomly split into training (n = 1,908) and test (n = 477) sets. Partial external validation used 7,749 patients from the Surveillance, Epidemiology, and End Results (SEER) registry (2010-2020).

Exposures: Demographic, tumor, treatment, comorbidity, and biomarker variables recorded at or before diagnosis.

Main Outcomes and Measures: The primary outcome was all-cause mortality within 70 months. Cox proportional hazards models included all predictors. Discrimination was assessed with Harrell's concordance index (C-index), calibration with predicted vs observed survival, and stratification with Kaplan-Meier curves. A Random Survival Forest (RSF) was trained for benchmarking and interpretability using Shapley Additive exPlanations (SHAP).

Results: Among 2,385 patients in UCHDW (median age, 63 years; 29.0% mortality), the Cox model achieved a C-index of 0.735 in the internal test set. Risk quartiles showed clear separation on Kaplan-Meier curves (log-rank p < 0.0001). In the SEER cohort (n = 7,749), where only demographic, staging, subsite, and treatment variables were available, the reduced Cox model achieved a C-index of 0.688, with calibration showing modest underestimation of survival in high-risk groups. Age, T stage, Charlson Comorbidity Index, neutrophil-to-lymphocyte ratio, and platelet count were among the strongest predictors, while surgery was associated with improved survival. The RSF achieved a C-index of 0.758 internally, with SHAP highlighting nonlinear effects of albumin, BMI, and inflammatory markers.

Conclusions and Relevance: A machine learning model using routine clinical and biomarker data demonstrated good prognostic performance in advanced HNSCC, with partial external validation. Such approaches may support individualized survival estimates, risk stratification, and treatment discussions, but broader validation is required before clinical adoption.
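
For readers unfamiliar with the discrimination metric reported in the abstract, the following is a minimal sketch of how Harrell's C-index can be computed from follow-up times, event indicators, and model risk scores. It is not the authors' code: the function name `harrell_c_index` and the five-patient toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification of how production libraries handle ties.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times       -- follow-up time per patient (e.g. months)
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an
        # observed death. Tied times are skipped here (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [5, 12, 20, 34, 50]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.3, 0.1]

print(harrell_c_index(toy_times, toy_events, toy_risk))  # 0.875
```

In the toy data, 7 of the 8 comparable pairs are ordered correctly, giving 0.875. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.735, 0.688, and 0.758 should be read.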

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | PLOS ONE | 4510 | Top 22% | 8.3% |
| 2 | Cancers | 200 | Top 0.7% | 6.7% |
| 3 | Scientific Reports | 3102 | Top 19% | 6.3% |
| 4 | JCO Clinical Cancer Informatics | 18 | Top 0.1% | 6.2% |
| 5 | Frontiers in Oncology | 95 | Top 0.7% | 4.8% |
| 6 | Cancer Epidemiology, Biomarkers & Prevention | 17 | Top 0.1% | 4.8% |
| 7 | JCO Precision Oncology | 14 | Top 0.1% | 4.8% |
| 8 | British Journal of Cancer | 42 | Top 0.3% | 4.3% |
| 9 | International Journal of Cancer | 42 | Top 0.2% | 3.9% |
| | 50% of probability mass above | | | |
| 10 | JAMA Network Open | 127 | Top 1% | 3.5% |
| 11 | Cancer Medicine | 24 | Top 0.4% | 3.5% |
| 12 | European Journal of Cancer | 10 | Top 0.1% | 3.5% |
| 13 | Annals of Oncology | 13 | Top 0.3% | 2.7% |
| 14 | JNCI: Journal of the National Cancer Institute | 16 | Top 0.2% | 2.6% |
| 15 | Clinical Cancer Research | 58 | Top 0.7% | 2.4% |
| 16 | PLOS Computational Biology | 1633 | Top 14% | 2.1% |
| 17 | PeerJ | 261 | Top 7% | 1.7% |
| 18 | JNCI Cancer Spectrum | 10 | Top 0.3% | 1.3% |
| 19 | Nature Communications | 4913 | Top 56% | 1.3% |
| 20 | eLife | 5422 | Top 48% | 1.3% |
| 21 | Biology Methods and Protocols | 53 | Top 1% | 1.2% |
| 22 | EClinicalMedicine | 21 | Top 0.5% | 1.2% |
| 23 | Cancer Research Communications | 46 | Top 0.8% | 1.1% |
| 24 | Breast Cancer Research | 32 | Top 0.4% | 1.1% |
| 25 | Frontiers in Immunology | 586 | Top 6% | 0.9% |
| 26 | BMC Cancer | 52 | Top 2% | 0.9% |
| 27 | OncoImmunology | 22 | Top 0.3% | 0.9% |
| 28 | BMC Bioinformatics | 383 | Top 6% | 0.9% |
| 29 | Radiotherapy and Oncology | 18 | Top 0.3% | 0.9% |
| 30 | Computational and Structural Biotechnology Journal | 216 | Top 8% | 0.8% |
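
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does this with the 30 displayed values; the function name `journals_to_reach` is hypothetical, and because the displayed percentages are rounded to one decimal place, the running total is only approximate.

```python
# Predicted probabilities (percent) for ranks 1-30, as displayed in the list above.
probabilities_pct = [
    8.3, 6.7, 6.3, 6.2, 4.8, 4.8, 4.8, 4.3, 3.9, 3.5,
    3.5, 3.5, 2.7, 2.6, 2.4, 2.1, 1.7, 1.3, 1.3, 1.3,
    1.2, 1.2, 1.1, 1.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.8,
]


def journals_to_reach(probs_pct, threshold_pct=50.0):
    """Return (rank, cumulative %) at which the running total first reaches the threshold."""
    total = 0.0
    for rank, p in enumerate(probs_pct, start=1):
        total += p
        if total >= threshold_pct:
            return rank, round(total, 1)
    return None  # threshold never reached within the listed journals


print(journals_to_reach(probabilities_pct))  # (9, 50.1)
```

The running total is 46.2% after rank 8 and 50.1% after rank 9, consistent with the cutoff shown in the list.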