
Predicting Alzheimer's Disease Diagnosis, a Decade or more Years before Onset using the Electronic Health Record and Random Forest Machine Learning Models

Taneja, S. B.; Boyce, R. D.; Malec, S. A.; Shaaban, C. E.; Levine, A. S.; Munro, P.; Bian, J.; Xu, J.; Maraganore, D.; Schliep, K.; Wu, E.; Silverstein, J. C.; Kienholz, M.; Karim, H.

2025-11-06 health informatics
10.1101/2025.11.04.25338396 medRxiv

INTRODUCTION: There is a need to detect and intervene in the preclinical phases of Alzheimer's disease (AD). Electronic health records (EHRs), analyzed with machine learning methods, may help predict AD.

METHODS: We identified EHRs for 19,473 cases with AD and 111,922 controls, with records spanning 10 or more years before AD diagnosis. We trained a random forest model on 2,499 features, using 5-fold cross-validation and a 75%/25% train/test split, to predict AD 10 years before its onset, and then computed permutation feature importance.

RESULTS: The model achieved an area under the ROC curve (AUC) of 0.80. Feature importance analysis identified factors associated with AD, including age, sex, race, ethnicity, BMI, cardiovascular diseases, inflammation, pain, sleep and mood disorders, trauma, other neurodegenerative disorders, diuretics, colon-related disorders and procedures, seizures, and vitamin B12.

DISCUSSION: This is the first EHR-based model to predict AD 10 years before onset; it could help predict AD and inform prevention and early intervention.
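The evaluation steps named in the abstract (a 75/25 train/test split, AUC on held-out data, and permutation feature importance) can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption for illustration: the data are synthetic, the split is stratified by case/control status (the abstract does not say whether it was), the 5-fold cross-validation step is omitted, and the hypothetical `fit_mean_difference` scorer stands in for the random forest so the example stays short. The AUC and permutation-importance functions are model-agnostic, so the same logic would apply to any fitted classifier.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(y: Sequence[int], test_frac: float = 0.25,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split row indices into train/test, keeping the case:control ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(random case scores above random control); ties count 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y) -> List[float]:
    """Hypothetical stand-in for the random forest: weight each feature by
    (mean among cases - mean among controls)."""
    cases = [r for r, t in zip(X, y) if t == 1]
    ctrls = [r for r, t in zip(X, y) if t == 0]
    return [sum(r[j] for r in cases) / len(cases)
            - sum(r[j] for r in ctrls) / len(ctrls)
            for j in range(len(X[0]))]


def predict(w: Sequence[float], X) -> List[float]:
    """Linear risk score: weighted sum of each row's features."""
    return [sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def permutation_importance(w, X, y, n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean drop in held-out AUC when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = roc_auc(y, predict(w, X))
    result = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            drops.append(baseline - roc_auc(y, predict(w, X_perm)))
        result.append(sum(drops) / n_repeats)
    return result


# Synthetic toy data (NOT the study's EHR data): feature 0 is informative,
# feature 1 is pure noise; about 1 case per 6 controls, similar to the
# study's 19,473 cases vs 111,922 controls.
rng = random.Random(42)
y = [1 if rng.random() < 1 / 7 else 0 for _ in range(700)]
X = [[rng.gauss(2.0 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]

train, test = stratified_split(y, test_frac=0.25)
w = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
X_test, y_test = [X[i] for i in test], [y[i] for i in test]
auc = roc_auc(y_test, predict(w, X_test))
importances = permutation_importance(w, X_test, y_test)
print(f"test AUC = {auc:.2f}; importances = {importances}")
```

In practice one would more likely use scikit-learn's `RandomForestClassifier`, `roc_auc_score`, and `sklearn.inspection.permutation_importance`; the hand-written versions here only make the computations explicit. Because permutation importance measures how much held-out AUC drops when a single feature is shuffled, the feature the scorer relies on (feature 0) shows a large drop, while the noise feature shows a drop near zero.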

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring (38 papers in training set; Top 0.1%; 12.5%)
2. The Journal of Prevention of Alzheimer's Disease (10 papers in training set; Top 0.1%; 12.3%)
3. Annals of Neurology (57 papers in training set; Top 0.1%; 10.1%)
4. Age and Ageing (27 papers in training set; Top 0.1%; 6.8%)
5. Neurobiology of Aging (95 papers in training set; Top 0.6%; 4.8%)
6. Alzheimer's & Dementia (143 papers in training set; Top 1%; 4.3%)
50% of probability mass above
7. Alzheimer's Research & Therapy (52 papers in training set; Top 0.6%; 3.9%)
8. BMJ Open (554 papers in training set; Top 6%; 3.6%)
9. Frontiers in Aging Neuroscience (67 papers in training set; Top 1%; 3.6%)
10. PLOS ONE (4510 papers in training set; Top 46%; 2.4%)
11. Journal of Alzheimer's Disease (43 papers in training set; Top 0.6%; 2.1%)
12. GeroScience (97 papers in training set; Top 0.8%; 2.1%)
13. Alzheimer's & Dementia: Translational Research & Clinical Interventions (16 papers in training set; Top 0.2%; 2.1%)
14. npj Digital Medicine (97 papers in training set; Top 2%; 1.7%)
15. Frontiers in Neurology (91 papers in training set; Top 3%; 1.7%)
16. BMC Infectious Diseases (118 papers in training set; Top 4%; 1.2%)
17. Scientific Reports (3102 papers in training set; Top 67%; 1.2%)
18. Journal of Medical Internet Research (85 papers in training set; Top 3%; 1.2%)
19. eClinicalMedicine (55 papers in training set; Top 1%; 1.2%)
20. JAMIA Open (37 papers in training set; Top 1%; 0.9%)
21. Journal of the American Medical Informatics Association (61 papers in training set; Top 2%; 0.9%)
22. The Lancet Digital Health (25 papers in training set; Top 1.0%; 0.8%)
23. Frontiers in Artificial Intelligence (18 papers in training set; Top 0.8%; 0.7%)
24. JMIR Public Health and Surveillance (45 papers in training set; Top 4%; 0.7%)
25. Communications Medicine (85 papers in training set; Top 1%; 0.7%)
26. American Journal of Epidemiology (57 papers in training set; Top 2%; 0.6%)
27. NeuroImage: Clinical (132 papers in training set; Top 4%; 0.6%)
28. Brain, Behavior, & Immunity - Health (27 papers in training set; Top 0.7%; 0.6%)
29. JMIR Medical Informatics (17 papers in training set; Top 2%; 0.6%)
30. Journal of Personalized Medicine (28 papers in training set; Top 2%; 0.6%)
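The statement that the top 6 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a short Python sketch. The probabilities are copied from the ranked list; `journals_to_reach` is a hypothetical helper for illustration, not part of whatever tool produced this ranking.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-30, copied from the ranked list.
PROBS = [12.5, 12.3, 10.1, 6.8, 4.8, 4.3, 3.9, 3.6, 3.6, 2.4,
         2.1, 2.1, 2.1, 1.7, 1.7, 1.2, 1.2, 1.2, 1.2, 0.9,
         0.9, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6]


def journals_to_reach(probs, threshold=50.0):
    """Smallest k such that the top-k probabilities sum to >= threshold."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k, total
    return None, sum(probs)


k, mass = journals_to_reach(PROBS)
print(f"top {k} journals hold {mass:.1f}%")  # top 6 journals hold 50.8%
print(f"all 30 listed: {sum(PROBS):.1f}%")   # all 30 listed: 86.5%
```

The first six entries sum to 50.8%, consistent with the 50% marker after rank 6. The 30 listed journals together account for about 86.5%, so the remainder is presumably spread over journals not shown here.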