
A hybrid-computer vision model to predict lung cancer in diverse patient populations

Zakkar, A.; Perwaiz, N.; Zhong, W.; Krule, A.; Burrage-Burton, M.; Kim, D.; Miglani, M.; Narra, V.; Yousef, F.; Gadi, V.; Korpics, M. C.; Kim, S. J.; Khan, A. A.; Molina, Y.; Dai, Y.; Marai, E.; Meidani, H.; Nguyen, R.; Salahudeen, A. A.

2024-10-07 oncology
10.1101/2024.10.07.24315011 medRxiv

PURPOSE: Disparities in lung cancer incidence exist in Black populations, and screening criteria underserve Black populations because of disparately elevated risk in the screening-eligible population. Prediction models that integrate clinical and imaging-based features to individualize lung cancer risk are a potential means to mitigate these disparities.

PATIENTS AND METHODS: This multicenter (NLST) and catchment-population-based (UIH, urban and suburban Cook County) cross-sectional study included participants at risk of lung cancer with available lung CT imaging and follow-up between the years 2015 and 2024. A total of 53,452 participants in NLST and 11,654 in UIH were included based on age and tobacco-use risk factors for lung cancer. The cohorts were used to train and test deep learning and machine learning models using clinical features alone or combined with CT image features (hybrid computer vision).

RESULTS: An optimized 7-clinical-feature model achieved ROC-AUC values of 0.64-0.67 in the NLST cohort and 0.60-0.65 in the UIH cohort across multiple years. Incorporating imaging features to form a hybrid computer vision model significantly improved ROC-AUC values to 0.78-0.91 in NLST, but performance deteriorated in UIH, with ROC-AUC values of 0.68-0.80. This was attributable to Black participants, in whom ROC-AUC values ranged from 0.63 to 0.72 across multiple years. Retraining the hybrid computer vision model with Black and other participants from the UIH cohort improved performance, with ROC-AUC values of 0.70-0.87 in a held-out UIH test set.

CONCLUSION: Hybrid computer vision predicted risk with improved accuracy compared with clinical risk models alone. However, potential biases in image training data reduced model generalizability in Black participants. Performance improved upon retraining with a subset of the UIH cohort, suggesting that inclusive training and validation datasets can minimize racial disparities. Future studies incorporating vision models trained on representative datasets may demonstrate improved health equity upon clinical use.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. JAMA Network Open | 127 papers in training set | Top 0.1% | 14.3%
2. PLOS ONE | 4510 papers in training set | Top 14% | 14.3%
3. Annals of Epidemiology | 19 papers in training set | Top 0.1% | 8.4%
4. BMJ Open | 554 papers in training set | Top 3% | 7.1%
5. Scientific Reports | 3102 papers in training set | Top 28% | 4.3%
6. EClinicalMedicine | 21 papers in training set | Top 0.1% | 3.9%

50% of probability mass above

7. Diagnostics | 48 papers in training set | Top 0.5% | 3.6%
8. Annals of Translational Medicine | 17 papers in training set | Top 0.4% | 2.3%
9. JNCI: Journal of the National Cancer Institute | 16 papers in training set | Top 0.2% | 2.3%
10. Cancers | 200 papers in training set | Top 3% | 1.9%
11. Journal of Translational Medicine | 46 papers in training set | Top 0.6% | 1.9%
12. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.5% | 1.7%
13. Cancer Epidemiology, Biomarkers & Prevention | 17 papers in training set | Top 0.4% | 1.5%
14. Biology Methods and Protocols | 53 papers in training set | Top 1% | 1.5%
15. Thorax | 32 papers in training set | Top 0.5% | 1.3%
16. International Journal of Medical Informatics | 25 papers in training set | Top 1% | 1.2%
17. JMIR Medical Informatics | 17 papers in training set | Top 1% | 1.2%
18. Cancer Medicine | 24 papers in training set | Top 1% | 1.1%
19. European Radiology | 14 papers in training set | Top 0.5% | 1.1%
20. Computers in Biology and Medicine | 120 papers in training set | Top 3% | 1.1%
21. BMC Infectious Diseases | 118 papers in training set | Top 4% | 0.9%
22. JCO Precision Oncology | 14 papers in training set | Top 0.3% | 0.9%
23. JNCI Cancer Spectrum | 10 papers in training set | Top 0.4% | 0.9%
24. PeerJ | 261 papers in training set | Top 14% | 0.8%
25. PLOS Computational Biology | 1633 papers in training set | Top 23% | 0.8%
26. Nature Communications | 4913 papers in training set | Top 61% | 0.8%
27. International Journal of Radiation Oncology*Biology*Physics | 21 papers in training set | Top 0.4% | 0.7%
28. Annals of Biomedical Engineering | 34 papers in training set | Top 1% | 0.7%
29. npj Digital Medicine | 97 papers in training set | Top 4% | 0.7%
30. Journal of Clinical Medicine | 91 papers in training set | Top 7% | 0.7%
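
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running total over the listed percentages. The sketch below is only an illustration: it assumes the cutoff is found by summing probabilities in rank order until the total reaches 50%, which this page does not state explicitly, and the helper name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (percent) for the ten highest-ranked journals,
# copied from the list above, in rank order.
probs = [14.3, 14.3, 8.4, 7.1, 4.3, 3.9, 3.6, 2.3, 2.3, 1.9]


def journals_to_reach(probs, threshold):
    """Return (count, running_total) for the first rank at which the
    cumulative probability reaches the threshold.

    Assumption: the page's cutoff is a simple cumulative sum in rank order.
    Returns (None, total) if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, total = journals_to_reach(probs, 50.0)
print(count, round(total, 1))  # 6 52.3
```

Under that assumption, the first five journals sum to 48.4% and the sixth brings the total to 52.3%, which agrees with the cutoff marker placed after rank 6.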