The Celiac Microbiome Repository (CMR): A Curated Collection of Celiac Disease Gut Microbiome Sequencing Data

Bishop, H. V.; Prendergast, P. J.; Herbold, C. W.; Ogilvie, O. J.; Dobson, R. C. J.

2026-03-31 bioinformatics
10.64898/2026.03.28.715053 bioRxiv

Celiac disease is an autoimmune condition in which the gut microbiome is increasingly recognised as a key environmental factor. While high-throughput sequencing has led to a surge in celiac-related gut microbiome profiling data, these datasets remain fragmented, heterogeneous, and often lack the metadata required for large-scale integration into pooled, cross-cohort datasets. To address this, we developed the Celiac Microbiome Repository (CMR), a curated, open-access collection of celiac-related 16S rRNA gene and shotgun metagenomic sequencing datasets. We employed a systematic curation workflow to identify datasets across the NCBI Sequence Read Archive (SRA) and Scopus, followed by manual metadata extraction and direct author engagement. All 16S data were reprocessed through DADA2 and shotgun data through MetaPhlAn4 to facilitate comparison across studies. The CMR version 1.0 comprises 28 datasets containing 3,245 samples from 13 countries and 5 body sites. Our analysis reveals that while publicly available celiac microbiome samples have accumulated at a rate of approximately 140 per year, significant barriers to accessibility exist. Just 20 of 58 eligible datasets were found to have both raw data and essential metadata readily available within public archives. The repository features a dual-interface design, consisting of a GitHub backend for programmatic access and an R Shiny frontend for interactive data exploration. By providing this curated and harmonised resource, the CMR enables the research community to leverage public data for global meta-analyses and machine learning applications. Ultimately, this work provides the foundation needed to move beyond isolated, small-scale studies toward high-powered discoveries in celiac disease research.

Database URLs: https://github.com/CeliacMicrobiomeRepo/celiac-repository | https://celiac.shinyapps.io/celiac-webapp

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 14% | 12.4%
2. Scientific Data | 174 papers in training set | Top 0.1% | 10.2%
3. Nature Biotechnology | 147 papers in training set | Top 0.9% | 8.5%
4. Microbiome | 139 papers in training set | Top 0.5% | 6.4%
5. Nucleic Acids Research | 1128 papers in training set | Top 4% | 4.9%
6. mSystems | 361 papers in training set | Top 3% | 2.9%
7. Cell Reports Methods | 141 papers in training set | Top 1% | 2.5%
8. npj Systems Biology and Applications | 99 papers in training set | Top 0.8% | 2.1%
9. Genome Medicine | 154 papers in training set | Top 3% | 2.1%

50% of probability mass above

10. Bioinformatics | 1061 papers in training set | Top 6% | 2.1%
11. Briefings in Bioinformatics | 326 papers in training set | Top 3% | 2.1%
12. Microbial Genomics | 204 papers in training set | Top 0.9% | 2.1%
13. Cell Host & Microbe | 113 papers in training set | Top 3% | 1.9%
14. Cell Genomics | 162 papers in training set | Top 3% | 1.9%
15. Gut Microbes | 70 papers in training set | Top 0.5% | 1.9%
16. Cell Systems | 167 papers in training set | Top 7% | 1.7%
17. Bioinformatics Advances | 184 papers in training set | Top 3% | 1.7%
18. Advanced Science | 249 papers in training set | Top 12% | 1.5%
19. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 5% | 1.5%
20. PLOS ONE | 4510 papers in training set | Top 56% | 1.5%
21. Scientific Reports | 3102 papers in training set | Top 64% | 1.3%
22. Frontiers in Genetics | 197 papers in training set | Top 7% | 1.2%
23. Cell Reports Medicine | 140 papers in training set | Top 6% | 1.1%
24. mSphere | 281 papers in training set | Top 5% | 1.0%
25. PLOS Computational Biology | 1633 papers in training set | Top 22% | 0.9%
26. Nature | 575 papers in training set | Top 14% | 0.9%
27. Patterns | 70 papers in training set | Top 2% | 0.8%
28. Genome Biology | 555 papers in training set | Top 7% | 0.8%
29. eLife | 5422 papers in training set | Top 55% | 0.8%
30. Frontiers in Cellular and Infection Microbiology | 98 papers in training set | Top 6% | 0.8%
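
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the listed values with a running total. The sketch below is only an illustration: it assumes the final percentage in each entry is the predicted probability, it uses the rounded figures shown in the list, and the names `top_probabilities` and `cutoff_rank` are hypothetical, not part of the site.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-9, copied from the list above.
# These are the rounded values as displayed, so totals are approximate.
top_probabilities = [12.4, 10.2, 8.5, 6.4, 4.9, 2.9, 2.5, 2.1, 2.1]


def cutoff_rank(probabilities, threshold=50.0):
    """Return the 1-based rank at which the running total first reaches
    the threshold, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None


print(cutoff_rank(top_probabilities))  # prints 9
```

With the displayed figures, the first eight journals sum to about 49.9% and the ninth brings the total to about 52.0%, which matches the stated cutoff at rank 9.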