
FungiGutDB: Curated database for taxonomic assignment of the gut mycobiome by whole genome sequencing

Coleto-Checa, D.; Lacruz-Pleguezuelos, B.; Perez Cuervo, A.; Cardenas-Roig, N.; Carrasco-Guijarro, L.; Martin-Segura, A.; Carrillo de Santa Pau, E.; Marcos-Zambrano, L. J.

2025-12-06 bioinformatics
10.64898/2025.12.03.691829 bioRxiv

Fungi represent less than 1% of the gut microbiota; however, their importance in host homeostasis and disease is increasingly recognized. Accurately characterizing the gut mycobiome from metagenomic data remains a significant challenge because of the low abundance of fungal DNA, the limited performance of classifiers designed for bacteria, and the scarcity of curated fungal reference databases. To address these issues, we developed FungiGutDB v1.0, a curated database containing 304 taxa previously identified in culture-dependent human studies, and integrated it into a reproducible workflow (FungiGut) to facilitate its use. In benchmarking on mock communities, FungiGut achieved a substantially lower false positive rate than standard fungal databases that are not gut-specific. When applied to real metagenomic datasets, FungiGut successfully characterized the gut mycobiome, identifying Saccharomyces cerevisiae as the predominant species in healthy individuals, along with common dietary fungi found in fermented dairy products (Penicillium camemberti, Debaryomyces hansenii, Kluyveromyces lactis, Pichia kudriavzevii). In contrast, samples from patients with non-responsive celiac disease showed a higher relative abundance of opportunistic pathogens and fewer diet-associated taxa, suggesting a trend toward a dysbiotic mycobiome profile. By limiting classification to fungal species previously isolated from the human gut, FungiGut minimizes misclassifications arising from environmental or plant-associated taxa, which often lead to misinterpretation of results. Overall, FungiGut offers a biologically consistent and reproducible approach to gut mycobiome profiling, improving taxonomic accuracy and strengthening confidence in the interpretation of fungal metagenomic data in human microbiome research.
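To make the restriction idea concrete, here is a minimal Python sketch under stated assumptions. It post-filters a classifier's species-level calls to a curated list and measures the share of reported taxa that are absent from a known mock community. The helper names (restrict_to_curated, false_positive_fraction), the taxa, and the abundances are hypothetical and do not reflect FungiGutDB contents or FungiGut code; the paper's exact false positive metric may also be defined differently.

```python
# Hypothetical sketch: the taxa, abundances and helper names below are
# invented for illustration and are NOT FungiGutDB contents or FungiGut code.


def restrict_to_curated(calls, curated):
    """Keep only taxa in the curated set and renormalise to relative abundance."""
    kept = {taxon: ab for taxon, ab in calls.items() if taxon in curated}
    total = sum(kept.values())
    return {taxon: ab / total for taxon, ab in kept.items()} if total > 0 else {}


def false_positive_fraction(calls, mock_truth):
    """Share of reported taxa that are absent from the known mock community."""
    if not calls:
        return 0.0
    false_positives = [taxon for taxon in calls if taxon not in mock_truth]
    return len(false_positives) / len(calls)


# Toy curated list (a tiny stand-in for a gut-specific reference set).
curated = {
    "Saccharomyces cerevisiae",
    "Candida albicans",
    "Debaryomyces hansenii",
    "Penicillium camemberti",
}
# Taxa actually spiked into the toy mock community.
mock_truth = {"Saccharomyces cerevisiae", "Candida albicans", "Debaryomyces hansenii"}
# Toy classifier output against a broad database, including two
# plant-associated fungi that were never in the mock.
raw_calls = {
    "Saccharomyces cerevisiae": 0.50,
    "Candida albicans": 0.20,
    "Debaryomyces hansenii": 0.10,
    "Fusarium oxysporum": 0.15,
    "Botrytis cinerea": 0.05,
}

filtered = restrict_to_curated(raw_calls, curated)
print(false_positive_fraction(raw_calls, mock_truth))  # 0.4
print(false_positive_fraction(filtered, mock_truth))   # 0.0
```

Post-hoc filtering is a simplification: in FungiGut the restriction comes from the curated reference database itself, so reads from a taxon outside the list may instead be assigned to a related listed taxon rather than simply disappearing. The sketch only shows why out-of-list calls stop appearing as false positives.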

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank. Journal (papers in training set; Top %): predicted probability

1. Microbiome (139; Top 0.1%): 28.3%
2. mSystems (361; Top 1%): 7.3%
3. Nature Communications (4913; Top 27%): 6.5%
4. mSphere (281; Top 0.8%): 5.0%
5. Cell Reports Methods (141; Top 0.8%): 3.7%
50% of probability mass above
6. Gut Microbes (70; Top 0.3%): 3.7%
7. Genome Medicine (154; Top 2%): 3.7%
8. Microbial Genomics (204; Top 0.7%): 3.1%
9. Nucleic Acids Research (1128; Top 7%): 2.8%
10. Nature Biotechnology (147; Top 3%): 2.7%
11. Computational and Structural Biotechnology Journal (216; Top 3%): 2.1%
12. Scientific Reports (3102; Top 52%): 1.9%
13. npj Biofilms and Microbiomes (56; Top 0.8%): 1.9%
14. Genome Biology (555; Top 4%): 1.7%
15. Bioinformatics (1061; Top 7%): 1.7%
16. Briefings in Bioinformatics (326; Top 4%): 1.5%
17. Microbiology Spectrum (435; Top 3%): 1.4%
18. Communications Biology (886; Top 12%): 1.4%
19. PLOS ONE (4510; Top 60%): 1.3%
20. Advanced Science (249; Top 15%): 1.1%
21. Frontiers in Cellular and Infection Microbiology (98; Top 5%): 0.9%
22. PLOS Computational Biology (1633; Top 23%): 0.8%
23. Cell Host & Microbe (113; Top 5%): 0.8%
24. Frontiers in Microbiology (375; Top 9%): 0.8%
25. BMC Microbiology (35; Top 2%): 0.7%
26. BMC Bioinformatics (383; Top 8%): 0.7%
27. Microorganisms (101; Top 3%): 0.5%
28. BMC Genomics (328; Top 7%): 0.5%
29. mBio (750; Top 13%): 0.5%
30. Genomics, Proteomics & Bioinformatics (171; Top 7%): 0.5%
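The statement that the top 5 journals account for 50% of the predicted probability mass can be checked against the listed values. This short Python sketch (the helper name journals_needed is hypothetical) accumulates the predictions in rank order until they reach a target share; with the rounded values shown, the first four journals reach 47.1% and the fifth brings the total to 50.8%.

```python
# Predicted probabilities (%) for the ten highest-ranked journals, copied from
# the list above (values are rounded to one decimal place on the page).
probs = [28.3, 7.3, 6.5, 5.0, 3.7, 3.7, 3.7, 3.1, 2.8, 2.7]


def journals_needed(probs_pct, target_pct=50.0):
    """Return the smallest k whose top-k journals reach target_pct, plus that mass."""
    cumulative = 0.0
    for k, p in enumerate(sorted(probs_pct, reverse=True), start=1):
        cumulative += p
        if cumulative >= target_pct:
            return k, cumulative
    return None, cumulative  # target not reached with the values supplied


k, mass = journals_needed(probs)
print(k, round(mass, 1))  # 5 50.8
```

Because the page rounds each probability, the cumulative total is approximate; the check only confirms that the cutoff line sits after the fifth journal.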