
A VAE-based methodology for deep enterotyping and Parkinson's disease diagnosis

Qiao, Y.; Ma, Z.

2026-03-19 neurology
10.64898/2026.03.17.26348604 medRxiv

Gut microbiome studies in Parkinson's disease (PD) are challenged by high dimensionality, sparsity, compositionality, and substantial between-cohort heterogeneity, all of which complicate robust community typing and disease-status classification. Here, we developed a variational autoencoder (VAE)-based methodology for deep enterotyping and PD diagnosis prediction (i.e., predicting diseased vs. control status) using a harmonized multi-cohort gut microbiome compendium comprising 1,957 16S rRNA samples from six PD case-control cohorts and an independent shotgun metagenomic validation cohort of 725 samples. Compared with conventional enterotyping approaches such as partitioning around medoids (PAM) and Dirichlet multinomial mixture (DMM) modelling, the VAE-derived latent space supported a clearer and more reproducible three-cluster solution. These three enterotype-like community states were biologically interpretable and were annotated as Enterococcus-type, Bacteroides-type, and Ruminococcus-type configurations. The same broad three-enterotype structure was independently recapitulated in the metagenomic dataset, supporting cross-platform robustness. Across the three inferred types, the proportion of PD samples was similar. Both the primary generalized linear mixed-effects model and the sensitivity model showed that enterotype assignment was not significantly associated with PD status, so the lack of association did not depend on a single modelling strategy. In the supervised branch, VAE-derived representations supported PD case-control classification while also providing a shared latent representation for clustering, enterotype transfer, and downstream interpretation. Collectively, these findings show that deep representation learning can improve the resolution, reproducibility, and interpretability of enterotype inference in heterogeneous microbiome datasets, and provide a practical methodology for organizing broad community structure in PD. In this setting, the main advantage of the VAE method lies in its ability to link unsupervised community typing with supervised prediction through a shared latent representation, even when broad community types do not function as stand-alone disease biomarkers.
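
The idea of a shared latent representation serving both an unsupervised and a supervised branch can be illustrated with a small forward pass. This is a minimal sketch under stated assumptions, not the authors' architecture: the layer sizes, the squared-error reconstruction term, the equal loss weights, the random stand-in data, and the names `vae_forward` and `loss_terms` are all hypothetical, and no training loop is shown.

```python
import numpy as np

def vae_forward(x, params, rng):
    """Encode x, sample a latent z, then decode and classify from the same z."""
    h = np.tanh(x @ params["W_enc"])             # hidden encoder layer
    mu = h @ params["W_mu"]                      # latent mean
    logvar = h @ params["W_logvar"]              # latent log-variance
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    x_hat = z @ params["W_dec"]                  # unsupervised branch: reconstruction
    logit = z @ params["w_clf"]                  # supervised branch: case vs. control
    return mu, logvar, z, x_hat, logit

def loss_terms(x, y, mu, logvar, x_hat, logit):
    """Return reconstruction error, KL divergence to N(0, I), and binary cross-entropy."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    # Numerically stable binary cross-entropy computed from logits.
    bce = np.mean(np.maximum(logit, 0.0) - logit * y + np.log1p(np.exp(-np.abs(logit))))
    return recon, kl, bce

rng = np.random.default_rng(0)
n_samples, n_taxa, n_hidden, n_latent = 8, 20, 16, 2   # hypothetical sizes

# Random stand-ins for transformed abundance profiles and case/control labels.
x = rng.standard_normal((n_samples, n_taxa))
y = rng.integers(0, 2, size=n_samples).astype(float)

params = {
    "W_enc": 0.1 * rng.standard_normal((n_taxa, n_hidden)),
    "W_mu": 0.1 * rng.standard_normal((n_hidden, n_latent)),
    "W_logvar": 0.1 * rng.standard_normal((n_hidden, n_latent)),
    "W_dec": 0.1 * rng.standard_normal((n_latent, n_taxa)),
    "w_clf": 0.1 * rng.standard_normal(n_latent),
}

mu, logvar, z, x_hat, logit = vae_forward(x, params, rng)
recon, kl, bce = loss_terms(x, y, mu, logvar, x_hat, logit)
total = recon + 1.0 * kl + 1.0 * bce   # equal weights are an arbitrary choice here
```

In a sketch like this, the latent means `mu` would be the coordinates passed to a clustering step, while `logit` feeds the classifier, so both tasks read from one embedding.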

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | Advanced Science | 249 papers in training set | Top 1% | 10.0%
2 | Microbiome | 139 papers in training set | Top 0.3% | 8.3%
3 | Nature Communications | 4913 papers in training set | Top 23% | 8.3%
4 | Med | 38 papers in training set | Top 0.1% | 6.2%
5 | Nature Biomedical Engineering | 42 papers in training set | Top 0.1% | 6.2%
6 | Cell Metabolism | 49 papers in training set | Top 0.3% | 4.3%
7 | Nature Aging | 51 papers in training set | Top 0.5% | 3.6%
8 | PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.5%
50% of probability mass above
9 | Nature Computational Science | 50 papers in training set | Top 0.2% | 3.5%
10 | Scientific Reports | 3102 papers in training set | Top 48% | 2.3%
11 | eBioMedicine | 130 papers in training set | Top 0.8% | 2.1%
12 | Genome Medicine | 154 papers in training set | Top 4% | 1.8%
13 | Cell Reports Medicine | 140 papers in training set | Top 3% | 1.8%
14 | mSystems | 361 papers in training set | Top 5% | 1.7%
15 | Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.7%
16 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 33% | 1.7%
17 | Communications Biology | 886 papers in training set | Top 9% | 1.7%
18 | eLife | 5422 papers in training set | Top 44% | 1.6%
19 | Nucleic Acids Research | 1128 papers in training set | Top 12% | 1.5%
20 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 1.2%
21 | Nature Biotechnology | 147 papers in training set | Top 6% | 1.2%
22 | The Innovation | 12 papers in training set | Top 0.6% | 1.2%
23 | Genome Biology | 555 papers in training set | Top 6% | 1.2%
24 | Nature Medicine | 117 papers in training set | Top 3% | 1.2%
25 | Nature Microbiology | 133 papers in training set | Top 4% | 0.9%
26 | npj Systems Biology and Applications | 99 papers in training set | Top 2% | 0.9%
27 | Nature Methods | 336 papers in training set | Top 6% | 0.9%
28 | Nature Machine Intelligence | 61 papers in training set | Top 3% | 0.8%
29 | Nature Genetics | 240 papers in training set | Top 7% | 0.8%
30 | Gut Microbes | 70 papers in training set | Top 1% | 0.7%
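
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below uses the first ten values as displayed, which are rounded, so the total is approximate; the helper name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1 to 10, as listed.
probs = [10.0, 8.3, 8.3, 6.2, 6.2, 4.3, 3.6, 3.5, 3.5, 2.3]

def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative mass) at the first rank whose running total meets the threshold."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold - 1e-9:   # small tolerance for floating-point rounding
            return rank, total
    return None                          # threshold never reached

rank, mass = journals_to_reach(probs, 50.0)
print(rank, round(mass, 1))  # 8 50.4
```

The running total is 46.9% after rank 7 and 50.4% after rank 8, which agrees with the position of the "50% of probability mass above" marker.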