Multi-Stage Graph Attention Networks for Interpretable Alzheimer's Disease Classification from Genome-Wide Association Data
Saxena, A.; Gaiteri, C.; Faraone, S. V.
Show abstract
BackgroundGenome-wide association studies have identified numerous variants associated with neuropsychiatric disorders. Although some significant loci can carry substantial risk, as in Alzheimers Disease, the remaining genetic variance is distributed across many small-effect loci. Polygenic risk scores (PRS) aggregate this risk but do not capture epistatic interactions, and offer limited biological interpretability and predictive accuracy. Computing gene level risk scores and integrating known or statistically validated gene-gene associations has the potential to increase interpretability and/or accuracy. Graph Neural Networks (GNNs) can leverage graph structured genetic data that models potential epistatic interactions to achieve these goals. MethodsWe developed a three-stage Graph Attention Network (GAT) classifier using individual-level GWAS data from 7,358 participants across seven Alzheimers Disease Center cohorts. Nodes were defined as genes, with risk scores from AD and 11 genetically correlated phenotypes serving as features. We evaluated two graph construction strategies: gene co-expression networks derived from hippocampal transcriptomic data and curated pathway-based graphs. Additionally, a bilinear context module was incorporated to capture global gene-gene interactions beyond the graph topology. In Stage 1, a GNN encoder was trained on the graphs; Stage 2 injected PRS for non-coding SNPs after the encoder to better capture genetic risk via transfer learning, and Stage 3 applied adversarial training with gradient reversal for ancestry debiasing. GNN predictions were ensembled with whole-genome PRS using elastic net regression. ResultsThe best-performing GNN model -- a GAT with bilinear context operating on the pathway graph -- achieved an AUROC of 0.78 (95% CI: 0.75-0.80). Ensemble models combining Stage 2 or 3 GNN logits with whole-genome PRS achieved an AUROC of 0.82 (0.79-0.84), outperforming PRS alone (0.80). GxI attribution and additional explainability analyses revealed stage-specific biological signals, some of which re-capitulated known gene-phenotype associations and others which may reflect potential new areas of inquiry. ConclusionA multi-stage GAT framework captures complementary, non-additive genetic signal that, when ensembled with PRS, improves the accuracy of AD classification. Post-hoc explainability analyses yield biologically interpretable gene networks, supporting the utility of graph-based deep learning for dissecting complex genetic architectures.
Matching journals
The top 3 journals account for 50% of the predicted probability mass.