
CCycDB: an integrative knowledgebase to fingerprint microbially mediated carbon cycling processes

Zhou, J.; Qian, L.; Ji, M.; Ma, K.; Yu, X.; Chen, J.; Lin, L.; Gong, X.; He, Z.; Wang, J.; Tu, Q.

2026-01-28 microbiology
10.64898/2026.01.28.702190 bioRxiv

Microorganisms play essential roles in mediating the biogeochemical cycling of carbon across Earth's ecosystems. Understanding the processes and underlying mechanisms of microbially mediated carbon cycling is therefore critical for advancing global ecology and climate change research. To comprehensively depict these complex biogeochemical processes, we developed CCycDB, a knowledge-based functional gene database, to accurately fingerprint microbially mediated carbon cycling pathways and gene families, particularly from shotgun metagenomes. The CCycDB database comprises 4,676 gene families classified into six major functional categories, further structured into 45 level-1 and 188 level-2 sub-categories, encompassing a total of 10,991,724 high-quality reference sequences. Validation using both synthetic and real-world datasets demonstrated that CCycDB outperforms existing orthology databases in terms of accuracy, coverage and specificity. By directly targeting carbon-cycling functional gene families, CCycDB provides promising routines to reconstruct both functional gene and taxonomic profiles associated with microbially mediated carbon cycling. Application of CCycDB to shotgun metagenomes from diverse and complex ecosystems revealed pronounced habitat-specific differences in carbon cycling processes and their associated microbial taxa. Collectively, CCycDB provides a powerful and reliable tool for profiling carbon cycling processes from both functional and taxonomic perspectives in complex ecosystems. CCycDB is accessible at https://ccycdb.github.io/.

Impact Statement

Microbially mediated carbon cycling processes are the most complex biogeochemical processes in the Earth's biosphere and play profound regulatory roles in global climate change. A key bottleneck in linking microbial communities to global change is the lack of integrated tools for comprehensive carbon cycle profiling. Here, we present CCycDB, a tool that serves a dual purpose: it is a reference database for obtaining functional gene and taxonomic profiles, and it functions as a customized routine for efficiently aligning sequences and querying associated functional information. CCycDB enables researchers to accurately link microbial community dynamics to carbon cycling and transformation pathways, thereby advancing integrated global change studies with microbes and ecological research based on complex metagenomic datasets.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. mSystems: 361 papers in training set; Top 0.1%; 26.6%
2. Microbiome: 139 papers in training set; Top 0.2%; 10.4%
3. ISME Communications: 103 papers in training set; Top 0.1%; 8.7%
4. mBio: 750 papers in training set; Top 3%; 6.5%
50% of probability mass above
5. Genome Biology: 555 papers in training set; Top 3%; 2.7%
6. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 27%; 2.2%
7. Bioinformatics Advances: 184 papers in training set; Top 2%; 2.1%
8. mSphere: 281 papers in training set; Top 3%; 1.9%
9. Nature Communications: 4913 papers in training set; Top 48%; 1.9%
10. GigaScience: 172 papers in training set; Top 1%; 1.8%
11. iScience: 1063 papers in training set; Top 13%; 1.7%
12. Bioinformatics: 1061 papers in training set; Top 7%; 1.7%
13. eLife: 5422 papers in training set; Top 40%; 1.7%
14. Methods in Ecology and Evolution: 160 papers in training set; Top 1%; 1.5%
15. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 5%; 1.4%
16. Frontiers in Microbiology: 375 papers in training set; Top 6%; 1.4%
17. PLOS ONE: 4510 papers in training set; Top 58%; 1.4%
18. PLOS Computational Biology: 1633 papers in training set; Top 20%; 1.1%
19. Metabolites: 50 papers in training set; Top 0.8%; 1.0%
20. Nucleic Acids Research: 1128 papers in training set; Top 14%; 1.0%
21. Cell Systems: 167 papers in training set; Top 10%; 1.0%
22. Environmental Microbiome: 26 papers in training set; Top 0.4%; 0.9%
23. Genome Medicine: 154 papers in training set; Top 7%; 0.8%
24. Microbiology Resource Announcements: 22 papers in training set; Top 0.7%; 0.8%
25. Communications Earth & Environment: 14 papers in training set; Top 0.8%; 0.8%
26. Nature Microbiology: 133 papers in training set; Top 4%; 0.8%
27. Genome Research: 409 papers in training set; Top 4%; 0.8%
28. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; 0.7%
29. Scientific Data: 174 papers in training set; Top 2%; 0.7%
30. The ISME Journal: 194 papers in training set; Top 3%; 0.7%
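
The 50% cutoff stated above the list can be checked with a short cumulative sum. This is only a minimal sketch: it assumes the final percentage given for each journal is that journal's predicted probability, and the function name `journals_to_reach` is hypothetical, not part of the site or of CCycDB.

```python
from itertools import accumulate

# Final percentage listed for each of the ten highest-ranked journals,
# copied from the list above (assumed to be the predicted probability).
probabilities = [26.6, 10.4, 8.7, 6.5, 2.7, 2.2, 2.1, 1.9, 1.9, 1.8]


def journals_to_reach(threshold, probs):
    """Return how many top-ranked journals are needed for the cumulative
    probability to reach `threshold` percent, or None if it never does."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


top_n = journals_to_reach(50.0, probabilities)
print(top_n)
```

Under that assumption, the first three journals sum to about 45.7% and the first four to about 52.2%, which is consistent with the cutoff falling after the fourth entry.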