
Individualized melanoma risk prediction using machine learning with electronic health records

Wan, G.; Khattab, S.; Roster, K.; Nguyen, N.; Yan, B.; Rashdan, H.; Estiri, H.; Semenov, Y. R.

2024-07-27 dermatology
10.1101/2024.07.26.24311080 medRxiv

Background: Melanoma is a lethal form of skin cancer with a high propensity for metastasis, making early detection crucial. This study aims to develop a machine learning model that uses electronic health record data to identify patients at high risk of developing melanoma so that they can be prioritized for dermatology screening.

Methods: This retrospective study included patients diagnosed with melanoma (cases) and matched patients without melanoma (controls) from Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Dana-Farber Cancer Institute (DFCI), and other hospital centers within the Research Patient Data Registry of the Mass General Brigham healthcare system between 1992 and 2022. Patient demographics, family history, diagnoses, medications, procedures, laboratory tests, reasons for visits, and allergy data six months prior to the date of first melanoma diagnosis or date of censoring were extracted. A machine learning framework for health outcomes (MLHO) was used to build the model. Performance was evaluated using five-fold cross-validation on the MGH cohort (internal validation) and by training the model on the MGH cohort and testing it independently on the non-MGH cohort (external validation). The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PR), along with 95% Confidence Intervals (CIs), were computed.

Results: This study identified 10,778 patients with melanoma and 10,778 matched patients without melanoma, including 8,944 from MGH and 1,834 from non-MGH hospitals in each cohort, both with an average follow-up duration of 9 years. In the internal and external validations, the model achieved AUC-ROC values of 0.826 (95% CI: 0.819-0.832) and 0.823 (95% CI: 0.809-0.837) and AUC-PR scores of 0.841 (95% CI: 0.834-0.848) and 0.822 (95% CI: 0.806-0.839), respectively. Important risk features included a family history of melanoma, a family history of skin cancer, and a prior diagnosis of benign neoplasm of skin. Conversely, medical examination without abnormal findings was identified as a protective feature.

Conclusions: Machine learning techniques and electronic health records can be effectively used to predict melanoma risk, potentially aiding in identifying high-risk patients and enabling individualized screening strategies for melanoma.
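The evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the study's code (the abstract says only that MLHO was used and does not state how the CIs were obtained). The percentile bootstrap, the use of average precision as the AUC-PR estimator, the function names, and the toy data are all assumptions made for illustration.

```python
import random

def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision, one common estimator of AUC-PR (assumed here).
    Tied scores are taken in their sorted order, which is a simplification."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive
    if hits == 0:
        raise ValueError("need at least one case")
    return total / hits

def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumed method, not taken from the abstract)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that lost one of the classes
            continue
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Toy data only: 1 = case, 0 = control, with made-up model scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(auc_roc(toy_labels, toy_scores))                       # 0.75
print(round(average_precision(toy_labels, toy_scores), 4))   # 0.8333
print(bootstrap_ci(toy_labels, toy_scores, auc_roc, n_boot=200))
```

The pairwise AUC computation is quadratic in cohort size, so for cohorts as large as the one described here a rank-based implementation from an established library would be the practical choice.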

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. BMC Cancer | 52 papers in training set | Top 0.1% | 18.5%
2. Frontiers in Medicine | 113 papers in training set | Top 0.3% | 9.1%
3. PLOS ONE | 4510 papers in training set | Top 22% | 8.4%
4. JAMA Network Open | 127 papers in training set | Top 0.4% | 6.3%
5. European Journal of Cancer | 10 papers in training set | Top 0.1% | 6.3%
6. Scientific Reports | 3102 papers in training set | Top 24% | 4.8%

50% of probability mass above

7. Experimental Dermatology | 10 papers in training set | Top 0.1% | 4.8%
8. PLOS Medicine | 98 papers in training set | Top 0.9% | 3.9%
9. Cureus | 67 papers in training set | Top 1% | 3.6%
10. eClinicalMedicine | 55 papers in training set | Top 0.1% | 3.2%
11. Journal of Investigative Dermatology | 42 papers in training set | Top 0.2% | 2.4%
12. Frontiers in Public Health | 140 papers in training set | Top 3% | 2.3%
13. JNCI: Journal of the National Cancer Institute | 16 papers in training set | Top 0.3% | 1.7%
14. Cancer Epidemiology, Biomarkers & Prevention | 17 papers in training set | Top 0.4% | 1.5%
15. PLOS Neglected Tropical Diseases | 378 papers in training set | Top 4% | 1.3%
16. Informatics in Medicine Unlocked | 21 papers in training set | Top 0.6% | 1.3%
17. Experimental Eye Research | 30 papers in training set | Top 0.4% | 1.2%
18. Frontiers in Bioinformatics | 45 papers in training set | Top 0.4% | 1.2%
19. Journal of Clinical Pathology | 12 papers in training set | Top 0.3% | 1.2%
20. Cancers | 200 papers in training set | Top 4% | 1.1%
21. Cancer Medicine | 24 papers in training set | Top 1% | 0.9%
22. Frontiers in Immunology | 586 papers in training set | Top 6% | 0.9%
23. Frontiers in Nutrition | 23 papers in training set | Top 1% | 0.8%
24. Nature Communications | 4913 papers in training set | Top 63% | 0.7%
25. British Journal of Cancer | 42 papers in training set | Top 2% | 0.7%
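
The 50% cutoff stated above the list can be checked by accumulating the listed percentages in rank order. The sketch below is a minimal illustration, not the site's code. It assumes that the final percentage in each entry is the predicted probability and that the cutoff is the first rank at which the running total reaches or exceeds 50%. The function name is hypothetical.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Return (count, cumulative_mass) for the smallest prefix of the ranked
    probabilities whose sum reaches the threshold (both in percent)."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    raise ValueError("threshold not reached by the listed probabilities")

# Predicted probabilities (percent) for ranks 1 to 8, copied from the list.
top_probs = [18.5, 9.1, 8.4, 6.3, 6.3, 4.8, 4.8, 3.9]

count, mass = journals_to_reach(top_probs)
print(count, round(mass, 1))  # 6 53.4
```

The first five journals sum to 48.6%, just under the threshold, so the sixth is the one that crosses it, which matches the stated cutoff.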