Predicting Distant Melanoma Metastasis at Diagnosis Using Machine Learning

Kim, J. J. H.; Lee, J. W. Y.; Yuan, H.; Han, C.; Zandigohar, M.; Haber, R.; Tsoukas, M.; Avanaki, K.

2026-05-19 dermatology
10.64898/2026.05.14.26353271 medRxiv
Distant melanoma metastasis at the time of diagnosis is uncommon but has major implications for patient prognosis and treatment selection. However, few tools can reliably predict the risk of distant metastasis at initial presentation. Here, we developed and evaluated machine learning models to predict distant melanoma metastasis using routinely captured clinicopathologic and demographic variables across all histologic subtypes. Using data from the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) program from 2010 to 2022, we identified adults aged 20 to 90 years with melanoma as the first and only primary malignancy (n=51,285). The Explainable Boosting Machine achieved a strong balance of discrimination and precision (AUROC = 0.947, AUPRC = 0.610, Precision = 0.793, Brier = 0.015). At 90% sensitivity, specificity was 0.843, with consistent performance across cross-validation folds. Clinicopathologic variables, including T stage, Breslow thickness, ulceration, and mitotic activity, contributed the largest share of predictive signal across descriptive, regression-based, and SHAP analyses, with smaller contributions from demographic factors. Decision curve analysis supported clinical utility, showing a net reduction of 88.3 per 100 patients and a standardized net benefit of 0.541. This model could be used to identify patients at sufficiently elevated risk to justify staging PET/CT despite otherwise localized clinical presentation. Cost-consequence analysis further showed that imaging true- and false-positive patients at the 85% to 95% sensitivity thresholds nearly doubled downstream imaging cost. We deployed the final model as an online calculator to support exploration of individualized risk estimates (https://melanoma-calculator.streamlit.app/).
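
The abstract reports specificity at a fixed sensitivity, a Brier score, and decision-curve net benefit. As a rough illustration of how such quantities are commonly computed from labels and predicted probabilities, here is a minimal sketch. It is not the authors' pipeline: the function names, the toy data, and the 0.25 threshold probability are all hypothetical, and the standardized net benefit is assumed to be net benefit divided by prevalence, a common definition that the abstract does not spell out.

```python
def specificity_at_sensitivity(y_true, y_prob, target_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity) at the highest threshold
    whose sensitivity reaches the target."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Try each distinct predicted probability as a cut-off, highest first.
    for threshold in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
        sensitivity = tp / positives
        if sensitivity >= target_sensitivity:
            return threshold, sensitivity, 1 - fp / negatives
    raise ValueError("target sensitivity is not reachable")


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold_probability):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold_probability)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold_probability)
    weight = threshold_probability / (1 - threshold_probability)
    return tp / n - (fp / n) * weight


# Toy data (hypothetical): 3 metastatic cases among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02]

threshold, sensitivity, specificity = specificity_at_sensitivity(y_true, y_prob)
brier = brier_score(y_true, y_prob)
nb = net_benefit(y_true, y_prob, threshold_probability=0.25)

# Assumed definition: standardized net benefit = net benefit / prevalence.
prevalence = sum(y_true) / len(y_true)
standardized_nb = nb / prevalence
```

With only three positive cases in the toy data, reaching 90% sensitivity requires capturing all three, so the selected cut-off is the lowest probability assigned to a true case.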

Matching journals

The top 5 journals together account for more than 50% of the predicted probability mass (55.9% by the values listed below).

1. Nature Communications | 4913 papers in training set | Top 1% | 28.5%
2. Journal for ImmunoTherapy of Cancer | 64 papers in training set | Top 0.2% | 7.4%
3. PLOS Medicine | 98 papers in training set | Top 0.3% | 7.0%
4. European Journal of Cancer | 10 papers in training set | Top 0.1% | 6.5%
5. Nature Medicine | 117 papers in training set | Top 0.3% | 6.5%

50% of probability mass above

6. eLife | 5422 papers in training set | Top 16% | 5.0%
7. JAMA Network Open | 127 papers in training set | Top 1% | 2.4%
8. Nature Cancer | 35 papers in training set | Top 0.5% | 2.2%
9. npj Digital Medicine | 97 papers in training set | Top 2% | 2.1%
10. Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.9%
11. Modern Pathology | 21 papers in training set | Top 0.2% | 1.7%
12. Cell Reports Medicine | 140 papers in training set | Top 4% | 1.7%
13. Scientific Reports | 3102 papers in training set | Top 61% | 1.5%
14. Blood Advances | 54 papers in training set | Top 0.8% | 1.4%
15. BMC Cancer | 52 papers in training set | Top 2% | 1.4%
16. Science Advances | 1098 papers in training set | Top 23% | 1.3%
17. Nature Biomedical Engineering | 42 papers in training set | Top 1% | 1.1%
18. PLOS ONE | 4510 papers in training set | Top 61% | 1.1%
19. Nature Genetics | 240 papers in training set | Top 6% | 1.1%
20. PLOS Computational Biology | 1633 papers in training set | Top 23% | 0.8%
21. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.8% | 0.8%
22. Cancer Cell | 38 papers in training set | Top 2% | 0.8%
23. Cancer Discovery | 61 papers in training set | Top 2% | 0.8%
24. JCI Insight | 241 papers in training set | Top 7% | 0.7%
25. Breast Cancer Research | 32 papers in training set | Top 0.5% | 0.7%
26. Cancer Medicine | 24 papers in training set | Top 1% | 0.7%
27. npj Breast Cancer | 18 papers in training set | Top 0.2% | 0.7%
28. Frontiers in Medicine | 113 papers in training set | Top 7% | 0.7%
29. JNCI: Journal of the National Cancer Institute | 16 papers in training set | Top 0.8% | 0.7%
30. Science Translational Medicine | 111 papers in training set | Top 7% | 0.7%
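
The "50% of probability mass" marker in the ranking can be reproduced by accumulating the listed probabilities in rank order until the running total reaches the target. The sketch below is only an illustration using the first six percentages from the list; the function name is hypothetical and the page's actual method is not documented here.

```python
def journals_to_reach_mass(probabilities, target=50.0):
    """Return (count, cumulative) for the smallest number of top-ranked
    entries whose summed probability (in percent) reaches the target."""
    cumulative = 0.0
    for count, probability in enumerate(probabilities, start=1):
        cumulative += probability
        if cumulative >= target:
            return count, cumulative
    return None  # target not reached by the supplied entries


# Predicted probabilities (percent) for ranks 1 to 6, copied from the list.
top_probabilities = [28.5, 7.4, 7.0, 6.5, 6.5, 5.0]
count, mass = journals_to_reach_mass(top_probabilities)
```

The first four journals sum to 49.4%, just short of the target, so the fifth is needed and the cumulative total overshoots 50%.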