
Temporal Cohort Identification for Alzheimer's Disease with Sequences of Clinical Records

Estiri, H.; Azhir, A.; Blacker, D. L.; Ritchie, C. S.; Patel, C. J.; Murphy, S. N.

2023-03-05 health informatics
10.1101/2023.03.03.23286774 medRxiv

BACKGROUND: Alzheimer's Disease (AD) is a complex clinical phenotype with unprecedented social and economic tolls in an aging global population. Real World Data (RWD) from electronic health records (EHRs) offer opportunities to accelerate precision drug development and scale epidemiological research on AD. A precise characterization of AD cohorts is needed to address the noise abundant in RWD.

METHODS: We conducted a retrospective cohort study to develop and test computational models for AD cohort identification using clinical data from 8 Massachusetts healthcare systems. We mined temporal representations from EHR data using a novel transitive sequential pattern mining algorithm (tSPM) to train and validate our models. We then tested our models against a held-out test set in which the presence of AD was adjudicated through a review of medical records. We trained two classes of models using Gradient Boosting Machine (GBM) to compare the utility of AD diagnosis records versus the tSPM temporal representations (comprising sequences of diagnosis and medication observations) from electronic medical records for characterizing AD cohorts.

RESULTS: In a group of 4,985 patients, we identified 219 sequences of medication-diagnosis records for constructing the best classification models. The models with the sequential features improved AD classification by up to 16 percent over the use of AD diagnosis codes. Six groups of sequences, which we refer to as temporal digital markers, were identified for characterizing the AD cohorts, including sequences that involved (1) a symptom or (2) a risk factor in the past, followed by an AD diagnosis, (3) AD medications, (4) indirect risk factors, symptom management, and potential side effects, (5) comorbidities with possible shared roots or side effects, and (6) multiple encounters with AD diagnosis codes. Discussions of how the identified sequential patterns can be interpreted are provided.

CONCLUSIONS: We present sequential patterns of diagnosis and medication codes from electronic medical records as digital markers of Alzheimer's Disease. Classification algorithms developed on the sequential patterns can replace standard features from EHRs to enrich phenotype modeling.
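
The abstract does not spell out how tSPM builds its sequences, so the following is only a minimal sketch of the general idea of transitive sequential pairs, not the authors' implementation. It assumes each patient is a list of dated codes and that every record is paired with every later record, not just the next one. The function name `mine_transitive_sequences`, the toy codes, and the choice to skip same-day pairs are all hypothetical.

```python
from datetime import date
from itertools import combinations


def mine_transitive_sequences(records):
    """Return the set of ordered pairs (earlier_code, later_code) for one patient.

    `records` is a list of (date, code) tuples. Each record is paired with
    every later record, not only its immediate successor.
    """
    ordered = sorted(records, key=lambda r: r[0])
    sequences = set()
    for (d1, c1), (d2, c2) in combinations(ordered, 2):
        if d1 < d2:  # assumption: same-day pairs have no defined order, so skip them
            sequences.add((c1, c2))
    return sequences


# Hypothetical toy patient; the codes are illustrative labels, not a real vocabulary.
patient = [
    (date(2015, 3, 1), "memory_loss"),
    (date(2016, 7, 9), "AD_diagnosis"),
    (date(2016, 8, 2), "donepezil"),
]
pairs = mine_transitive_sequences(patient)
```

In a pipeline like the one described, each mined pair could become a binary feature per patient before a gradient boosting classifier is trained. A code repeated on different dates yields a pair of the form (code, code), which is one way repeated encounters with the same diagnosis could surface as a sequence.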

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. The Journal of Prevention of Alzheimer's Disease: 10 papers in training set, Top 0.1%, 10.1%
2. npj Digital Medicine: 97 papers in training set, Top 0.5%, 10.1%
3. Alzheimer's Research & Therapy: 52 papers in training set, Top 0.3%, 6.4%
4. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring: 38 papers in training set, Top 0.2%, 6.4%
5. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.5%, 6.4%
6. Frontiers in Aging Neuroscience: 67 papers in training set, Top 0.8%, 4.3%
7. JAMIA Open: 37 papers in training set, Top 0.3%, 4.2%
8. Journal of Biomedical Informatics: 45 papers in training set, Top 0.4%, 4.2%
50% of probability mass above
9. Annals of Neurology: 57 papers in training set, Top 0.5%, 3.6%
10. PLOS ONE: 4510 papers in training set, Top 42%, 3.1%
11. Journal of Medical Internet Research: 85 papers in training set, Top 2%, 2.7%
12. Age and Ageing: 27 papers in training set, Top 0.2%, 2.1%
13. Alzheimer's & Dementia: 143 papers in training set, Top 2%, 2.1%
14. Scientific Reports: 3102 papers in training set, Top 54%, 1.9%
15. Bioinformatics: 1061 papers in training set, Top 7%, 1.9%
16. Journal of Alzheimer's Disease: 43 papers in training set, Top 0.8%, 1.7%
17. JMIR Public Health and Surveillance: 45 papers in training set, Top 2%, 1.5%
18. JMIR Medical Informatics: 17 papers in training set, Top 0.9%, 1.5%
19. Alzheimer's & Dementia: Translational Research & Clinical Interventions: 16 papers in training set, Top 0.5%, 1.2%
20. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 2%, 1.2%
21. NeuroImage: Clinical: 132 papers in training set, Top 3%, 1.2%
22. Communications Medicine: 85 papers in training set, Top 0.5%, 1.2%
23. Computers in Biology and Medicine: 120 papers in training set, Top 3%, 0.9%
24. Artificial Intelligence in the Life Sciences: 11 papers in training set, Top 0.2%, 0.9%
25. Experimental Neurology: 57 papers in training set, Top 1%, 0.9%
26. GeroScience: 97 papers in training set, Top 1%, 0.9%
27. BMJ Open: 554 papers in training set, Top 11%, 0.9%
28. Journal of Personalized Medicine: 28 papers in training set, Top 1%, 0.8%
29. Frontiers in Neurology: 91 papers in training set, Top 5%, 0.8%
30. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.8%, 0.7%
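
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below assumes the displayed values are rounded, so the running total is approximate. The helper name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (percent) for the ten highest-ranked journals, as listed.
probabilities = [10.1, 10.1, 6.4, 6.4, 6.4, 4.3, 4.2, 4.2, 3.6, 3.1]


def journals_to_reach(probs, threshold=50.0):
    """Return (rank, cumulative mass) at the first rank where the mass meets the threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total  # threshold never reached


rank, mass = journals_to_reach(probabilities)
```

With the listed values, the running total is about 47.9% after seven journals and about 52.1% after eight, which agrees with the cutoff shown in the ranking.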