
StrataBionn: a neural network supervised classification method for microbial communities

Symons, A. E.; Huynh, A. V.; Cornejo, O. E.

2026-04-02 genomics
10.64898/2026.03.31.715659 bioRxiv
Abstract

The classification of microbial communities into discrete states or "community state types" (CSTs) is fundamental to understanding host-microbiome interactions and their clinical implications. Traditional methods, such as nearest-neighbor approaches, often struggle with the inherent noise, high dimensionality, and non-linear signatures of taxonomic profiles. We present a novel supervised framework for microbial community classification, leveraging an Artificial Neural Network (ANN) architecture implemented in a new tool we named StrataBionn. We rigorously evaluated our approach using large-scale vaginal microbiome datasets, directly benchmarking performance against VALENCIA and a Random Forest (RF) classifier. To demonstrate the versatility of our models, we further extended the framework to oral microbiome classification, assessing its stability across diverse anatomical sites. Our supervised models consistently outperformed the nearest-neighbor approach across all evaluated datasets. In the vaginal microbiome, our method achieved an 11.6% to 13.3% increase in performance across all primary metrics, including precision, recall, accuracy, and F1-score. Furthermore, we demonstrate that this performance advantage is maintained in the oral microbiome, highlighting the generalizability of our neural network and ensemble strategies to various microbial ecosystems without the need for niche-specific algorithmic adjustments. By capturing complex feature dependencies that distance-based methods overlook, our approach provides a more robust and accurate census of microbial community structures. StrataBionn's ability to learn classification schemes for any microbiome with high accuracy and explainability, through the use of provided utilities to visualize feature-space classification boundaries and perform perturbation analysis on trained classifiers, makes it ideal for broad application in microecology research. This framework offers a scalable, high-performance alternative for microbiome researchers, facilitating more precise clinical stratification and biological insights across host body sites.
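The abstract does not specify StrataBionn's architecture, but the general pattern it describes, a supervised ANN trained on relative-abundance taxonomic profiles, plus perturbation analysis for explainability, can be sketched as follows. Everything here (the synthetic two-CST data, the single-hidden-layer network, the zero-out perturbation) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Synthetic data: 300 samples x 20 taxa, two hypothetical CSTs that
# differ in which taxon dominates the community.
n, taxa = 300, 20
labels = rng.integers(0, 2, n)
counts = rng.gamma(shape=1.0, scale=1.0, size=(n, taxa))
counts[labels == 0, 0] += 8.0   # CST 0: taxon 0 dominant
counts[labels == 1, 1] += 8.0   # CST 1: taxon 1 dominant
profiles = counts / counts.sum(axis=1, keepdims=True)  # relative abundances

# Supervised ANN classifier on the profiles.
X_tr, X_te, y_tr, y_te = train_test_split(profiles, labels, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print(f"held-out F1: {f1_score(y_te, clf.predict(X_te)):.2f}")

# Perturbation analysis (one simple variant): zero out each taxon in
# turn and record the drop in held-out accuracy as its importance.
base = clf.score(X_te, y_te)
drops = []
for j in range(taxa):
    Xp = X_te.copy()
    Xp[:, j] = 0.0
    drops.append(base - clf.score(Xp, y_te))
```

On this synthetic task the two engineered marker taxa produce the largest accuracy drops, which is the kind of signal a perturbation utility surfaces; real CST boundaries involve many correlated taxa, which is where the abstract argues an ANN outperforms distance-based assignment.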

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers in training set  Percentile  Probability
1     Microbiome                                          139                     Top 0.1%    22.0%
2     PLOS Computational Biology                          1633                    Top 3%      9.9%
3     mSystems                                            361                     Top 2%      6.2%
4     Nature Biotechnology                                147                     Top 2%      4.7%
5     Cell Reports Methods                                141                     Top 0.6%    4.2%
6     Methods in Ecology and Evolution                    160                     Top 0.7%    4.1%
----- 50% of probability mass above this line -----
7     Microbial Genomics                                  204                     Top 0.5%    3.9%
8     Nature Communications                               4913                    Top 41%     3.5%
9     Bioinformatics                                      1061                    Top 6%      3.5%
10    Scientific Reports                                  3102                    Top 47%     2.4%
11    PLOS ONE                                            4510                    Top 49%     2.0%
12    npj Biofilms and Microbiomes                        56                      Top 0.8%    2.0%
13    mSphere                                             281                     Top 3%      2.0%
14    Nucleic Acids Research                              1128                    Top 10%     1.8%
15    BMC Genomics                                        328                     Top 3%      1.7%
16    Frontiers in Microbiology                           375                     Top 6%      1.7%
17    Bioinformatics Advances                             184                     Top 3%      1.7%
18    ISME Communications                                 103                     Top 1%      1.6%
19    Computational and Structural Biotechnology Journal  216                     Top 6%      1.3%
20    Genome Medicine                                     154                     Top 6%      1.3%
21    Briefings in Bioinformatics                         326                     Top 6%      0.9%
22    Frontiers in Genetics                               197                     Top 9%      0.8%
23    Frontiers in Cellular and Infection Microbiology    98                      Top 6%      0.7%
24    Genome Research                                     409                     Top 5%      0.6%
25    Genome Biology                                      555                     Top 9%      0.6%
26    NAR Genomics and Bioinformatics                     214                     Top 4%      0.6%
27    eLife                                               5422                    Top 62%     0.6%