eBiota: Designing microbial communities from large seed pools with desired function using rapid optimization and deep learning

Jiang, X.; Hou, J.; Zhang, H.; Guo, J.; Gu, S.; Vandeputte, D.; Liao, Y.; Guo, Q.; Yang, X.; Zhou, Y.; Geng, P. X.; Wang, C.; Li, M.; Jousset, A.; Shen, X.; Wei, Z.; Zhu, H.

2026-03-31 bioengineering
10.64898/2026.03.29.714676 bioRxiv

Designing microbial communities to generate target products is crucial for biotechnology, agriculture, and disease treatment. However, rationally designing such communities from large seed pools has become a major challenge, as the rapidly growing number of complete microbial genomes greatly expands the search space and sharply increases the required screening time and computational cost. Here, we introduce eBiota, a platform for ab initio design of microbial communities from a pool of 21,514 strains to generate target products. eBiota not only identifies optimal strain combinations but also simulates community behaviors, including microbial interactions and relative abundances. eBiota integrates three modules: CoreBFS, a graph-based search algorithm that rapidly screens for bacteria with complete metabolic pathways related to the target product; ProdFBA, an extended flux balance analysis that identifies microbial consortia with maximal production efficiency; and DeepCooc, a deep learning model trained on 23,323 microbiome samples across various environments to infer co-occurrence patterns. We validated eBiota's capabilities in microbial community design and production efficiency calculation using public microbiome datasets, ranging from single strains to six-member consortia. Further in vitro experiments involving 94 strains confirmed eBiota's ability to identify species that inhibit pathogen growth and to accurately model the relative abundances within complex microbial communities. As an initial digital twin, eBiota provides a powerful platform for the rational design of functional microbial communities, offering new opportunities for metabolic engineering and synthetic biology.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Cell Systems | 167 papers in training set | Top 0.1% | 28.4%
2 | Nature Communications | 4913 papers in training set | Top 9% | 15.1%
3 | Nature Biotechnology | 147 papers in training set | Top 0.6% | 10.4%
50% of probability mass above
4 | Nature Methods | 336 papers in training set | Top 2% | 6.5%
5 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 19% | 3.7%
6 | Advanced Science | 249 papers in training set | Top 6% | 3.1%
7 | Nature Biomedical Engineering | 42 papers in training set | Top 0.4% | 2.5%
8 | Science | 429 papers in training set | Top 12% | 2.1%
9 | Nucleic Acids Research | 1128 papers in training set | Top 8% | 2.1%
10 | Science China Life Sciences | 26 papers in training set | Top 0.8% | 1.8%
11 | Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.7%
12 | ACS Synthetic Biology | 256 papers in training set | Top 2% | 1.7%
13 | PLOS Computational Biology | 1633 papers in training set | Top 19% | 1.3%
14 | mSystems | 361 papers in training set | Top 6% | 1.0%
15 | Nature | 575 papers in training set | Top 13% | 1.0%
16 | Metabolic Engineering | 68 papers in training set | Top 0.6% | 1.0%
17 | Angewandte Chemie International Edition | 81 papers in training set | Top 3% | 0.9%
18 | Nature Microbiology | 133 papers in training set | Top 4% | 0.9%
19 | Nature Chemical Biology | 104 papers in training set | Top 3% | 0.8%
20 | Cell Reports | 1338 papers in training set | Top 32% | 0.8%
21 | Cell Reports Methods | 141 papers in training set | Top 5% | 0.8%
22 | eLife | 5422 papers in training set | Top 58% | 0.7%
23 | Med | 38 papers in training set | Top 0.9% | 0.7%
24 | Genome Biology | 555 papers in training set | Top 8% | 0.7%
25 | Science Advances | 1098 papers in training set | Top 32% | 0.7%
26 | Cell Reports Medicine | 140 papers in training set | Top 10% | 0.5%
27 | Cell Genomics | 162 papers in training set | Top 8% | 0.5%
28 | Nature Medicine | 117 papers in training set | Top 6% | 0.5%
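
The "50% of probability mass" marker in the list can be checked by hand with a running sum. The sketch below is only an illustration: it assumes the marker is placed after the first rank at which the cumulative predicted probability reaches 50%, and the function name `journals_to_reach` is hypothetical, not part of the listing site. Only the first five probabilities from the list are used.

```python
# Predicted probabilities (percent) for the five top-ranked journals,
# copied from the list above.
probabilities = [
    ("Cell Systems", 28.4),
    ("Nature Communications", 15.1),
    ("Nature Biotechnology", 10.4),
    ("Nature Methods", 6.5),
    ("Proceedings of the National Academy of Sciences", 3.7),
]


def journals_to_reach(entries, threshold):
    """Return (count, cumulative) for the first rank at which the running
    sum of probabilities reaches `threshold`, or None if it never does.

    Hypothetical helper: the cumulative-sum rule is an assumption about
    how the 50% marker is derived.
    """
    running = 0.0
    for count, (_, probability) in enumerate(entries, start=1):
        running += probability
        if running >= threshold:
            return count, running
    return None


result = journals_to_reach(probabilities, 50.0)
if result is not None:
    count, cumulative = result
    print(f"{count} journals reach {cumulative:.1f}% of the probability mass")
```

Under that assumption, the first two journals sum to 43.5%, which is below the threshold, and adding the third brings the total to 53.9%, consistent with the statement that the top 3 journals account for 50% of the predicted probability mass.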