A multimodal AI model for modeling the genetic risk factor of Alzheimer's disease
Nguyen, T. M.; Woods, C.; Liu, J.; Wang, C.; Lin, A.-L.; Cheng, J.
The apolipoprotein E ε4 (APOE4) allele is the strongest genetic risk factor for late-onset Alzheimer's disease (AD), the most common form of dementia. APOE4 carriers exhibit cerebrovascular and metabolic dysfunction, structural brain alterations, and gut microbiome changes decades before the onset of clinical symptoms. A better understanding of the early manifestation of these physiological changes is critical for the development of timely AD interventions and risk reduction protocols. Multimodal datasets encompassing a wide range of APOE4- and AD-associated biomarkers provide a valuable opportunity to gain insight into the APOE4 phenotype; however, these datasets often present analytical challenges due to small sample sizes and high heterogeneity. Here, we propose a two-stage multimodal AI model (APOEFormer) that integrates blood metabolites, brain vascular and structural MRI, microbiome profiles, and other clinical and demographic data to predict APOE4 allele status. In the first stage, modality-specific encoders are used to generate initial representations of input data modalities, which are aligned in a shared latent space via self-supervised contrastive learning during pretraining. This objective encourages the learning of informative and consistent representations across modalities by leveraging cross-modality relationships. In the second stage, the pretrained representations are used as inputs to a multimodal transformer that integrates information across modalities to predict a key AD risk genetic variant (APOE4). Across 10 independent experimental runs with different train-validation-test splits, APOEFormer predicts whether an individual carries an APOE4 allele with an average accuracy of 75%, demonstrating robust performance under limited sample sizes.
Post hoc perturbation analysis of the predictive model revealed valuable insights into the driving components of the APOE4 phenotype, including key blood biomarkers and brain regions strongly associated with APOE4.
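
The first-stage alignment described in the abstract can be illustrated with a small sketch. The abstract does not state which contrastive objective APOEFormer uses, so the example below assumes a symmetric InfoNCE loss of the kind used in CLIP-style pretraining, applied to two modalities at a time. The function names, the temperature value, and the random "blood" and "MRI" embeddings are all hypothetical stand-ins, not the authors' implementation or data.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(z_a, z_b, temperature=0.1):
    # z_a, z_b: (n_subjects, dim) embeddings from two modality-specific encoders.
    # Row i of z_a and row i of z_b belong to the same subject (positive pair);
    # every other pairing in the batch is treated as a negative.
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    idx = np.arange(len(z_a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # modality A -> B
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # modality B -> A
    return 0.5 * (loss_ab + loss_ba)

# Synthetic stand-ins for encoder outputs (assumption: 8 subjects, 16-dim latent space).
rng = np.random.default_rng(0)
blood = rng.normal(size=(8, 16))
mri_aligned = blood + 0.01 * rng.normal(size=(8, 16))  # nearly identical to blood
mri_mismatched = np.roll(mri_aligned, 1, axis=0)       # subjects shifted by one row

loss_aligned = symmetric_info_nce(blood, mri_aligned)
loss_mismatched = symmetric_info_nce(blood, mri_mismatched)
print(loss_aligned, loss_mismatched)
```

Under these assumptions, the loss is low when each subject's two embeddings sit close together in the shared space and high when the subject pairing is broken, which is the pressure that pulls the modality-specific representations into alignment during pretraining. With more than two modalities, one option is to sum this loss over modality pairs; whether APOEFormer does so is not stated in the abstract.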