
A multimodal AI model for modeling the genetic risk factor of Alzheimer's disease

Nguyen, T. M.; Woods, C.; Liu, J.; Wang, C.; Lin, A.-L.; Cheng, J.

2026-04-15 health informatics
10.64898/2026.04.13.26350803 medRxiv

The apolipoprotein E ε4 (APOE4) allele is the strongest genetic risk factor for late-onset Alzheimer's disease (AD), the most common form of dementia. APOE4 carriers exhibit cerebrovascular and metabolic dysfunction, structural brain alterations, and gut microbiome changes decades before the onset of clinical symptoms. A better understanding of how these physiological changes first manifest is critical for developing timely AD interventions and risk-reduction protocols. Multimodal datasets encompassing a wide range of APOE4- and AD-associated biomarkers offer a valuable opportunity to characterize the APOE4 phenotype; however, such datasets are often difficult to analyze because of small sample sizes and high heterogeneity. Here, we propose APOEFormer, a two-stage multimodal AI model that integrates blood metabolites, brain vascular and structural MRI, microbiome profiles, and other clinical and demographic data to predict APOE4 allele status. In the first stage, modality-specific encoders generate initial representations of each input modality, and these representations are aligned in a shared latent space via self-supervised contrastive learning during pretraining. By exploiting cross-modality relationships, this objective encourages representations that are informative and consistent across modalities. In the second stage, the pretrained representations are fed into a multimodal transformer that integrates information across modalities to predict the presence of APOE4, a key genetic risk variant for AD. Across 10 independent experimental runs with different train-validation-test splits, APOEFormer predicts whether an individual carries an APOE4 allele with an average accuracy of 75%, demonstrating robust performance despite limited sample sizes.
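The abstract does not say which contrastive objective aligns the modalities in the shared latent space. As a minimal sketch, assuming a CLIP-style symmetric InfoNCE loss applied to every pair of modalities, the NumPy code below shows the idea: embeddings of the same participant from two modality encoders form a positive pair, and the other participants in the batch serve as negatives. The function names, the temperature of 0.1, and the modality keys are illustrative assumptions, not APOEFormer's actual implementation; the encoder outputs are simulated with random arrays.

```python
from typing import Dict

import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """Contrastive loss between two modalities for a batch of n participants.

    Row i of z_a and row i of z_b come from the same participant (positive pair);
    every other row in the batch acts as a negative.
    """
    a, b = l2_normalize(z_a), l2_normalize(z_b)
    logits = (a @ b.T) / temperature  # (n, n) scaled cosine similarities
    idx = np.arange(len(a))
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()  # pick b_i for each a_i
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()  # pick a_j for each b_j
    return float((loss_a_to_b + loss_b_to_a) / 2)


def multimodal_contrastive_loss(embeddings: Dict[str, np.ndarray], temperature: float = 0.1) -> float:
    """Average the pairwise loss over every pair of modalities (needs at least two)."""
    names = sorted(embeddings)
    pairs = [(m, n) for i, m in enumerate(names) for n in names[i + 1:]]
    total = sum(symmetric_info_nce(embeddings[m], embeddings[n], temperature) for m, n in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical encoder outputs: 8 participants in a 16-dimensional shared space.
    metabolites = rng.normal(size=(8, 16))
    mri = metabolites + 0.05 * rng.normal(size=(8, 16))  # well aligned with metabolites
    microbiome = rng.normal(size=(8, 16))                # unrelated to metabolites
    print(multimodal_contrastive_loss({"metabolites": metabolites, "mri": mri}))  # near 0
    print(multimodal_contrastive_loss({"metabolites": metabolites, "microbiome": microbiome}))  # larger
```

Averaging over all modality pairs is one simple way to extend a two-view loss to several modalities; an alternative is to align every modality to a single anchor modality, which scales linearly rather than quadratically with the number of modalities.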
Post hoc perturbation analysis of the predictive model revealed insights into the main drivers of the APOE4 phenotype, including key blood biomarkers and brain regions strongly associated with APOE4.
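The abstract does not specify the perturbation scheme. One common approach, sketched here under that assumption, is group-wise permutation importance: shuffle the columns belonging to one feature group (for example, a set of blood metabolites or the measurements for one brain region) across participants and record how much accuracy drops. `perturbation_importance`, the group names, and the toy threshold predictor are hypothetical stand-ins for the trained model and its real inputs.

```python
from typing import Callable, Dict, List

import numpy as np


def perturbation_importance(
    predict: Callable[[np.ndarray], np.ndarray],
    X: np.ndarray,
    y: np.ndarray,
    groups: Dict[str, List[int]],
    n_repeats: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean accuracy drop when one group of columns is shuffled across participants.

    Shuffling rows within a group's columns breaks the link between those features
    and the label while preserving each feature's marginal distribution.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    scores = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            X_pert = X.copy()
            perm = rng.permutation(len(X))
            X_pert[:, cols] = X[perm][:, cols]  # shuffle only this group's columns
            drops.append(baseline - np.mean(predict(X_pert) == y))
        scores[name] = float(np.mean(drops))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] > 0).astype(int)  # toy label that depends only on column 0

    def predict(A: np.ndarray) -> np.ndarray:
        # Toy stand-in for a trained classifier: thresholds column 0 only.
        return (A[:, 0] > 0).astype(int)

    print(perturbation_importance(predict, X, y, {"informative": [0], "noise": [1, 2]}))
```

Shuffling, rather than zeroing, keeps perturbed inputs within the observed data distribution; with strongly correlated features, however, shuffling one group can understate its importance because correlated groups still carry the same signal.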

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank. Journal: papers in training set; Top %; predicted probability

1. Advanced Science: 249; Top 0.7%; 14.2%
2. Nature Communications: 4913; Top 30%; 6.2%
3. Communications Biology: 886; Top 0.7%; 4.8%
4. Medical Image Analysis: 33; Top 0.2%; 4.8%
5. Nature Machine Intelligence: 61; Top 0.7%; 4.3%
6. NeuroImage: Clinical: 132; Top 1%; 4.2%
7. Nature Biomedical Engineering: 42; Top 0.2%; 4.1%
8. Alzheimer's Research & Therapy: 52; Top 0.6%; 3.9%
9. Bioinformatics: 1061; Top 5%; 3.5%
(50% of probability mass above)
10. eBioMedicine: 130; Top 0.4%; 3.5%
11. Patterns: 70; Top 0.3%; 3.0%
12. npj Digital Medicine: 97; Top 1%; 2.8%
13. NeuroImage: 813; Top 3%; 2.6%
14. Human Brain Mapping: 295; Top 2%; 2.3%
15. Scientific Reports: 3102; Top 54%; 1.9%
16. Science Advances: 1098; Top 15%; 1.9%
17. Briefings in Bioinformatics: 326; Top 4%; 1.7%
18. Alzheimer's & Dementia: 143; Top 2%; 1.7%
19. Computational and Structural Biotechnology Journal: 216; Top 5%; 1.7%
20. IEEE Journal of Biomedical and Health Informatics: 34; Top 1%; 1.6%
21. Brain: 154; Top 3%; 1.6%
22. Cell Reports Medicine: 140; Top 5%; 1.3%
23. Frontiers in Aging Neuroscience: 67; Top 2%; 1.3%
24. Brain Communications: 147; Top 2%; 1.3%
25. Science Translational Medicine: 111; Top 4%; 1.2%
26. GeroScience: 97; Top 1%; 0.9%
27. Neurobiology of Aging: 95; Top 2%; 0.8%
28. Nature Medicine: 117; Top 5%; 0.7%
29. Computers in Biology and Medicine: 120; Top 5%; 0.7%
30. Biological Psychiatry: 119; Top 3%; 0.7%
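The statement that the top 9 journals hold 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does this in Python; `journals_to_reach` is a hypothetical helper, and because the listed values are rounded to one decimal place, the cumulative totals are only approximate.

```python
from itertools import accumulate
from typing import List, Optional

# Predicted probabilities (%) of the ten highest-ranked journals, as listed above.
top_probs = [14.2, 6.2, 4.8, 4.8, 4.3, 4.2, 4.1, 3.9, 3.5, 3.5]


def journals_to_reach(probs: List[float], target: float) -> Optional[int]:
    """Smallest k such that the top-k probabilities sum to at least `target` percent."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= target - 1e-9:  # small tolerance for floating-point rounding
            return k
    return None


print(journals_to_reach(top_probs, 50.0))  # 9
```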