
Discovering Genetic Signatures Associated with Alzheimer's Disease in Tiled Whole Genome Sequence Data: Results from the Artificial Intelligence for Alzheimer's Disease (AI4AD) Consortium

Zaranek, S. W.; Zaranek, A. W.; Amstutz, P.; Bao, J.; Chen, J.; Clegg, T.; Craft, H.; Jo, T.; Lee, B.; Nho, K.; Thomopoulos, S. I.; Davatzikos, C.; Shen, L.; Huang, H.; Thompson, P. M.; Saykin, A. J.; The Alzheimer's Disease Neuroimaging Initiative as a consortium author for the AI4AD Initiative

2024-08-03 genetic and genomic medicine
10.1101/2024.08.01.24311329 medRxiv

Currently, the ability to analyze large-scale whole genome sequence (WGS) data is limited by both the size of the data and the inability of many existing tools to scale. To address this challenge, we use data "tiling" to efficiently partition whole genome sequences into smaller segments, resulting in a simple numeric matrix of small integers. This lossless representation is particularly suitable for machine learning (ML) models. As an example of the benefits of tiling, we showcase results from tiled data as part of the Artificial Intelligence for Alzheimer's Disease (AI4AD) consortium. AI4AD is a coordinated initiative to develop transformative AI approaches for high-throughput analysis of next-generation sequencing and related imaging, AD biomarker, and cognitive data. The collective effort integrates imaging, genomic, biomarker, and cognitive data to address fundamental barriers in AD prevention and drug discovery. One of the project's initial aims is to discover new genetic signatures in WGS data that can be used to understand AD risk and progression in conjunction with imaging, biomarker, and cognitive data. We tiled and analyzed 15,000+ genomes from the Alzheimer's Disease Sequencing Project (ADSP) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). We tile 11,762 genomes, a subset of the release that does not include family-based datasets (AD cases: 4,983; age range: 50-90 years; mean age: 73.8 years). We illustrate the use of tiled data in ML classification methods to predict phenotypes. Specifically, we identify and prioritize tile variants/genetic variants that are possible genetic signatures for AD. The model shows added predictive value from variants of genes previously found to be associated with AD risk, age of onset, neurofibrillary tangle measurements, and other AD-related traits, including the APOE variant (rs429358).
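
To make the idea of a lossless matrix of small integers concrete, here is a toy sketch in Python. It is not the authors' tiling method: it assumes equal-length sequences cut into fixed-length windows, and the names `tile_genomes` and `untile` are hypothetical. Each distinct segment seen at a tile position gets the next unused integer ID, so every genome becomes one row of small integers, and the per-tile lookup tables allow the original sequence to be rebuilt.

```python
def tile_genomes(genomes, tile_len):
    """Encode equal-length sequences as rows of small integer tile-variant IDs.

    Assumption for illustration only: tiles are fixed-length windows.
    """
    n_tiles = (len(genomes[0]) + tile_len - 1) // tile_len
    variants = [dict() for _ in range(n_tiles)]  # per tile: segment -> ID
    matrix = []
    for seq in genomes:
        row = []
        for t in range(n_tiles):
            segment = seq[t * tile_len:(t + 1) * tile_len]
            # IDs are handed out in order of first appearance, starting at 0.
            row.append(variants[t].setdefault(segment, len(variants[t])))
        matrix.append(row)
    return matrix, variants


def untile(row, variants):
    """Rebuild the original sequence from one matrix row (shows losslessness)."""
    inverse = [{tile_id: seg for seg, tile_id in d.items()} for d in variants]
    return "".join(inverse[t][tile_id] for t, tile_id in enumerate(row))


genomes = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAG"]
matrix, variants = tile_genomes(genomes, tile_len=4)
for row in matrix:
    print(row)
```

Each row of `matrix` can be passed directly to a classifier as a feature vector, and `untile(matrix[i], variants)` returns `genomes[i]` unchanged, which is the sense in which this toy encoding is lossless.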

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Bioinformatics | 1061 | Top 3% | 10.1%
2 | Artificial Intelligence in the Life Sciences | 11 | Top 0.1% | 8.4%
3 | Frontiers in Aging Neuroscience | 67 | Top 0.5% | 6.4%
4 | Frontiers in Genetics | 197 | Top 1% | 4.8%
5 | Frontiers in Neuroscience | 223 | Top 0.7% | 4.8%
6 | Scientific Reports | 3102 | Top 31% | 4.0%
7 | Alzheimer's & Dementia | 143 | Top 1% | 3.9%
8 | Brain Communications | 147 | Top 1% | 2.6%
9 | Alzheimer's Research & Therapy | 52 | Top 0.9% | 2.6%
10 | PLOS Computational Biology | 1633 | Top 12% | 2.6%
50% of probability mass above
11 | Briefings in Bioinformatics | 326 | Top 3% | 2.1%
12 | Frontiers in Cellular Neuroscience | 79 | Top 0.3% | 2.1%
13 | PLOS ONE | 4510 | Top 50% | 1.9%
14 | Alzheimer's & Dementia: Translational Research & Clinical Interventions | 16 | Top 0.3% | 1.8%
15 | Bioinformatics Advances | 184 | Top 3% | 1.8%
16 | iScience | 1063 | Top 13% | 1.8%
17 | Frontiers in Molecular Biosciences | 100 | Top 2% | 1.7%
18 | BMC Genomics | 328 | Top 3% | 1.7%
19 | Computers in Biology and Medicine | 120 | Top 2% | 1.7%
20 | Diagnostics | 48 | Top 1% | 1.5%
21 | Genome Medicine | 154 | Top 5% | 1.5%
22 | Data in Brief | 13 | Top 0.1% | 1.3%
23 | Frontiers in Human Neuroscience | 67 | Top 2% | 1.2%
24 | Frontiers in Bioinformatics | 45 | Top 0.5% | 1.1%
25 | NeuroImage | 813 | Top 5% | 0.9%
26 | Neurobiology of Aging | 95 | Top 2% | 0.9%
27 | Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 38 | Top 0.9% | 0.9%
28 | npj Parkinson's Disease | 89 | Top 1.0% | 0.8%
29 | GigaScience | 172 | Top 3% | 0.8%
30 | GeroScience | 97 | Top 2% | 0.7%
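
The statement that the top 10 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below copies the listed percentages for the first twelve journals; the function name `journals_to_reach` is hypothetical, and the values are the rounded figures shown on the page, so the total is approximate.

```python
# Predicted probabilities (%) for the 12 top-ranked journals, copied from the list above.
probabilities = [10.1, 8.4, 6.4, 4.8, 4.8, 4.0, 3.9, 2.6, 2.6, 2.6, 2.1, 2.1]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative total) at the first rank where the total meets the threshold."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None  # threshold never reached with the values given


count, total = journals_to_reach(probabilities, 50.0)
print(count, round(total, 1))
```

With the rounded values, the cumulative total first reaches 50% at the tenth journal, which matches the position of the "50% of probability mass above" marker.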