Predicting Alzheimer's Disease Diagnosis, a Decade or more Years before Onset using the Electronic Health Record and Random Forest Machine Learning Models
Taneja, S. B.; Boyce, R. D.; Malec, S. A.; Shaaban, C. E.; Levine, A. S.; Munro, P.; Bian, J.; Xu, J.; Maraganore, D.; Schliep, K.; Wu, E.; Silverstein, J. C.; Kienholz, M.; Karim, H.
INTRODUCTION: There is a need to detect Alzheimer's disease (AD) and intervene during its pre-clinical phases. Electronic health records (EHRs), combined with machine learning methods, may help predict AD.

METHODS: We identified EHRs for 19,473 cases with AD and 111,922 controls. Records spanned 10 or more years prior to AD diagnosis. We trained a random forest model with 2,499 features, using 5-fold cross-validation and a 75%/25% train/test split, to predict AD 10 years before its onset, and then computed permutation feature importance.

RESULTS: The model achieved an area under the ROC curve (AUC) of 0.80. Feature importance identified factors associated with AD, including age, sex, race, ethnicity, BMI, cardiovascular diseases, inflammation, pain, sleep and mood disorders, trauma, other neurodegenerative disorders, diuretics, colon-related disorders and procedures, seizures, and vitamin B12.

DISCUSSION: This is the first EHR-based model to predict AD 10 years prior to onset. It could support earlier identification of AD risk and inform prevention and early intervention.
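The methods combine four standard steps: a 75/25 train/test split, a random forest classifier, cross-validated AUC, and permutation feature importance. The sketch below illustrates those steps with only the Python standard library. It is not the authors' code. It assumes a hypothetical synthetic cohort of two features (one predictive, one pure noise) in place of the 2,499 EHR features, and it uses a toy forest of depth-1 trees ("stumps") rather than full-depth trees. All function names are hypothetical. Here 5-fold cross-validation only estimates AUC on the training portion; the abstract does not say exactly how the study used it.

```python
import random

def make_data(n, seed=0):
    """Hypothetical synthetic cohort: feature 0 drives risk, feature 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        signal, noise = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([signal, noise])
        y.append(1 if signal + rng.gauss(0, 0.5) > 0.8 else 0)
    return X, y

def fit_stump(X, y, features):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini impurity."""
    n, total_pos = len(y), sum(y)
    best = None
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_n = left_pos = 0
        for k in range(n - 1):
            left_n += 1
            left_pos += y[order[k]]
            a, b = X[order[k]][f], X[order[k + 1]][f]
            if a == b:
                continue  # no threshold separates identical values
            right_n, right_pos = n - left_n, total_pos - left_pos
            # Size-weighted Gini impurity 2p(1-p) of both children (up to a constant 1/n).
            gini = sum(2 * p * (1 - p / m) for m, p in ((left_n, left_pos), (right_n, right_pos)))
            if best is None or gini < best[0]:
                best = (gini, f, (a + b) / 2, left_pos / left_n, right_pos / right_n)
    if best is None:  # every candidate feature is constant: predict the base rate
        rate = total_pos / n
        return (features[0], float("inf"), rate, rate)
    return best[1:]

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagged stumps, each trained on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_proba(forest, X):
    """Average each stump's leaf positive rate to get a risk score per record."""
    return [sum(left if x[f] <= t else right for f, t, left, right in forest) / len(forest)
            for x in X]

def roc_auc(y_true, scores):
    """AUC as P(random positive scores above random negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_val_auc(X, y, k=5, seed=0):
    """Mean held-out AUC over k folds of the training data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    aucs = []
    for i in range(k):
        held = idx[i::k]
        held_set = set(held)
        tr = [j for j in idx if j not in held_set]
        forest = fit_forest([X[j] for j in tr], [y[j] for j in tr], seed=i)
        aucs.append(roc_auc([y[j] for j in held], predict_proba(forest, [X[j] for j in held])))
    return sum(aucs) / k

def permutation_importance(forest, X, y, n_repeats=10, seed=0):
    """Mean drop in AUC when a single feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    base = roc_auc(y, predict_proba(forest, X))
    importances = []
    for f in range(len(X[0])):
        drop = 0.0
        for _ in range(n_repeats):
            col = [x[f] for x in X]
            rng.shuffle(col)
            X_perm = [x[:f] + [v] + x[f + 1:] for x, v in zip(X, col)]
            drop += base - roc_auc(y, predict_proba(forest, X_perm))
        importances.append(drop / n_repeats)
    return importances

# 75/25 train/test split, 5-fold CV on the training part, then test-set evaluation.
X, y = make_data(400)
order = list(range(len(X)))
random.Random(1).shuffle(order)
cut = int(0.75 * len(order))
X_tr, y_tr = [X[i] for i in order[:cut]], [y[i] for i in order[:cut]]
X_te, y_te = [X[i] for i in order[cut:]], [y[i] for i in order[cut:]]

cv_auc = cross_val_auc(X_tr, y_tr, k=5)
forest = fit_forest(X_tr, y_tr)
test_auc = roc_auc(y_te, predict_proba(forest, X_te))
importances = permutation_importance(forest, X_te, y_te)
print(f"5-fold CV AUC {cv_auc:.2f}, test AUC {test_auc:.2f}, importances {importances}")
```

Permutation importance is computed on the held-out test set, so a feature counts as important only if shuffling it hurts predictions on unseen records. In this toy run, the noise feature's importance should stay near zero. A real pipeline at this scale would more likely use a library implementation (for example, scikit-learn's random forest, ROC AUC, and permutation-importance utilities) rather than hand-written stumps.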