A VAE-based methodology for deep enterotyping and Parkinson's disease diagnosis
Qiao, Y.; Ma, Z.
Gut microbiome studies in Parkinson's disease (PD) are challenged by high dimensionality, sparsity, compositionality, and substantial between-cohort heterogeneity, all of which complicate robust community typing and disease-status classification. Here, we developed a variational autoencoder (VAE)-based methodology for deep enterotyping and PD diagnosis prediction (i.e., predicting diseased vs. control status) using a harmonized multi-cohort gut microbiome compendium comprising 1,957 16S rRNA samples from six PD case-control cohorts and an independent shotgun metagenomic validation cohort of 725 samples. Compared with conventional enterotyping approaches such as partitioning around medoids (PAM) and Dirichlet multinomial mixture (DMM) modelling, the VAE-derived latent space supported a clearer and more reproducible three-cluster solution. These three enterotype-like community states were biologically interpretable and were annotated as Enterococcus-type, Bacteroides-type, and Ruminococcus-type configurations. The same broad three-enterotype structure was independently recapitulated in the metagenomic dataset, supporting cross-platform robustness. Across the three inferred types, the proportion of PD samples was similar. Both the primary generalized linear mixed-effects model and the sensitivity model showed that enterotype assignment was not significantly associated with PD status, indicating that the lack of association did not depend on a single modelling strategy. In the supervised branch, VAE-derived representations supported PD case-control classification while also providing a shared latent representation for clustering, enterotype transfer, and downstream interpretation. Collectively, these findings show that deep representation learning can improve the resolution, reproducibility, and interpretability of enterotype inference in heterogeneous microbiome datasets, and provide a practical methodology for organizing broad community structure in PD.
In this setting, the main advantage of the VAE method lies in its ability to link unsupervised community typing with supervised prediction through a shared latent representation, even when broad community types do not function as stand-alone disease biomarkers.
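As a rough illustration of how a shared latent representation can tie an unsupervised branch to a supervised one, the sketch below computes a joint VAE-style objective in NumPy. The abstract does not specify the preprocessing, architecture, or loss, so everything here is an assumption made for illustration: the centered log-ratio (CLR) transform with a pseudocount, the squared-error reconstruction term, the standard-normal prior, and the hypothetical weights `beta` and `gamma`.

```python
import numpy as np

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa count table.

    The pseudocount (an assumed value) keeps zeros finite, a common way
    to handle sparse compositional data.
    """
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)

def gaussian_kl(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I), per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)

def joint_loss(x, x_recon, mu, logvar, y, y_prob, beta=1.0, gamma=1.0):
    """Mean of reconstruction + beta * KL + gamma * classification loss.

    x, x_recon : CLR-transformed input and decoder output
    mu, logvar : encoder outputs defining the latent distribution
    y, y_prob  : binary PD labels and predicted PD probabilities
    beta, gamma: hypothetical weights, not taken from the source
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)   # unsupervised branch
    kl = gaussian_kl(mu, logvar)                 # latent regularizer
    eps = 1e-12                                  # guards against log(0)
    bce = -(y * np.log(y_prob + eps)
            + (1 - y) * np.log(1 - y_prob + eps))  # supervised branch
    return float(np.mean(recon + beta * kl + gamma * bce))
```

Under these assumptions, both branches read from the same encoder outputs (`mu`, `logvar`), so the latent space used for clustering is the same one used for case-control prediction.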
Matching journals
The top 8 journals account for 50% of the predicted probability mass.