
First Survey of Publicly Available Metagenomic Sequencing Data Across 24 Middle Eastern and North African Countries: The MENA Microbiome Database

Mathlouthi, N. E. H.; Gdoura-Ben Amor, M.; Belguith, I.; Derouich, R.; Ammar Keskes, L.; Gdoura, R.

2026-05-06 bioinformatics
10.64898/2026.05.01.722303 bioRxiv

Microbiome research has expanded globally, yet the Middle East and North Africa (MENA) region remains severely under-represented in international sequencing repositories. Here we present the MENA Microbiome Database, the first systematically harmonized catalog of publicly available metagenomic sequencing data from 24 MENA countries, consolidating 60,126 runs across 51,365 biological samples and 2,373 BioProjects deposited between 2008 and 2026. Records were retrieved from ENA, NCBI SRA, and PubMed, enriched with BioSample and study-level metadata, and classified into microbiome subtypes using a 73-rule keyword-based harmonization framework. Amplicon sequencing accounted for 80.6% of runs, with Illumina platforms dominating at 92.7%. Geographic coverage is highly skewed: Saudi Arabia and Turkey together contribute over half of all records, while five countries (Libya, Syria, Palestine, Yemen, and South Sudan) remain critically under-sampled. Metadata completeness averaged 73.97% under a MIxS-MIMS proxy framework, with geographic coordinates available for fewer than 15% of runs. Ecological analyses revealed that country-level factors significantly structure environmental, animal-associated, and plant-associated microbiomes, but not human-associated microbiomes. Spatial autocorrelation confirmed non-random clustering of sampling effort around Red Sea coastal and eastern Mediterranean hotspots. This open, reproducible resource, comprising harmonized data files, analysis code, and an interactive browsing platform, establishes a foundational infrastructure for regional microbiome science and equitable global comparative studies.
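The abstract names two mechanical steps: keyword rules that assign each record to a microbiome subtype, and a completeness score computed over a proxy set of MIxS metadata fields. The sketch below illustrates both under stated assumptions. The four rules, the first-match-wins ordering, the word-boundary matching, the six MIxS field names chosen as the proxy, and the tokens treated as "missing" are all hypothetical stand-ins. The paper's actual 73 rules and its MIxS-MIMS proxy field list are not reproduced here.

```python
import re

# HYPOTHETICAL example rules as (subtype, keywords) pairs; not the paper's 73 rules.
# Order matters: the first matching rule wins.
RULE_SPEC = [
    ("human gut", ["human gut", "feces", "stool"]),
    ("human oral", ["saliva", "oral", "dental plaque"]),
    ("soil", ["soil", "rhizosphere"]),
    ("marine", ["seawater", "red sea", "coral"]),
]

# Compile each keyword list into one case-insensitive, word-bounded pattern,
# so that e.g. "oral" does not match inside "coral".
RULES = [
    (subtype, re.compile(r"\b(?:" + "|".join(map(re.escape, kws)) + r")\b", re.IGNORECASE))
    for subtype, kws in RULE_SPEC
]


def classify(text: str, default: str = "unclassified") -> str:
    """Return the subtype of the first rule whose pattern matches the text."""
    for subtype, pattern in RULES:
        if pattern.search(text):
            return subtype
    return default


# ASSUMED proxy fields: real MIxS term names, but the selection is illustrative only.
PROXY_FIELDS = ("collection_date", "geo_loc_name", "lat_lon",
                "env_broad_scale", "env_local_scale", "env_medium")

# ASSUMED placeholder values that count as missing (compared case-insensitively).
MISSING = {"", "na", "n/a", "none", "missing", "not collected", "not applicable"}


def completeness(record: dict, fields=PROXY_FIELDS) -> float:
    """Percentage of proxy fields holding a non-missing value."""
    filled = 0
    for field in fields:
        value = record.get(field)
        if value is not None and str(value).strip().lower() not in MISSING:
            filled += 1
    return 100.0 * filled / len(fields)
```

For example, a record with a collection date, a location name and an environmental medium, but with `lat_lon` set to "missing" and the two scale fields absent, scores 3/6 = 50.0. Averaging such per-run scores is one plausible way to obtain a dataset-level figure like the 73.97% reported above. Whether the authors weighted fields or runs differently is not stated in the abstract.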

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature Communications (4913 papers in training set; Top 2%; predicted probability 23.3%)
2. Microbiome (139 papers in training set; Top 0.1%; predicted probability 12.9%)
3. Scientific Data (174 papers in training set; Top 0.1%; predicted probability 12.8%)
4. Nucleic Acids Research (1128 papers in training set; Top 5%; predicted probability 3.7%)
(50% of probability mass above)
5. Nature Biotechnology (147 papers in training set; Top 4%; predicted probability 2.4%)
6. Computational and Structural Biotechnology Journal (216 papers in training set; Top 3%; predicted probability 2.2%)
7. Microbial Genomics (204 papers in training set; Top 0.9%; predicted probability 2.2%)
8. GigaScience (172 papers in training set; Top 1%; predicted probability 2.0%)
9. Genome Biology (555 papers in training set; Top 4%; predicted probability 2.0%)
10. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 3%; predicted probability 1.8%)
11. eLife (5422 papers in training set; Top 39%; predicted probability 1.8%)
12. Nature Microbiology (133 papers in training set; Top 3%; predicted probability 1.5%)
13. mSystems (361 papers in training set; Top 5%; predicted probability 1.5%)
14. Briefings in Bioinformatics (326 papers in training set; Top 4%; predicted probability 1.5%)
15. Scientific Reports (3102 papers in training set; Top 61%; predicted probability 1.5%)
16. Bioinformatics (1061 papers in training set; Top 8%; predicted probability 1.4%)
17. PLOS ONE (4510 papers in training set; Top 57%; predicted probability 1.4%)
18. Genome Medicine (154 papers in training set; Top 6%; predicted probability 1.3%)
19. Advanced Science (249 papers in training set; Top 15%; predicted probability 1.1%)
20. mSphere (281 papers in training set; Top 5%; predicted probability 0.9%)
21. Cell Reports Methods (141 papers in training set; Top 4%; predicted probability 0.8%)
22. Communications Biology (886 papers in training set; Top 20%; predicted probability 0.8%)
23. Gut Microbes (70 papers in training set; Top 1.0%; predicted probability 0.8%)
24. Cell Reports (1338 papers in training set; Top 35%; predicted probability 0.7%)
25. Frontiers in Microbiology (375 papers in training set; Top 10%; predicted probability 0.7%)
26. Cell (370 papers in training set; Top 18%; predicted probability 0.7%)
27. Environmental Microbiome (26 papers in training set; Top 0.7%; predicted probability 0.5%)
28. Nature (575 papers in training set; Top 18%; predicted probability 0.5%)
29. Database (51 papers in training set; Top 1%; predicted probability 0.5%)
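As a quick arithmetic check on the 50% cut-off, the sketch below accumulates the rounded predicted probabilities of the top-ranked journals until the running total reaches a threshold. It only re-adds the percentages displayed in the list. How the matching tool computes its probabilities is not described on this page, and the function name `journals_for_mass` is hypothetical.

```python
# Rounded predicted probabilities (%) of the top-ranked journals, copied from the list.
TOP = [
    ("Nature Communications", 23.3),
    ("Microbiome", 12.9),
    ("Scientific Data", 12.8),
    ("Nucleic Acids Research", 3.7),
    ("Nature Biotechnology", 2.4),
]


def journals_for_mass(ranked, threshold=50.0):
    """Return (count, cumulative %) for the smallest top-k reaching the threshold, else None."""
    total = 0.0
    for count, (_name, pct) in enumerate(ranked, start=1):
        total += pct
        if total >= threshold:
            return count, total
    return None
```

With these rounded values, the first three journals sum to 49.0%. Adding the fourth brings the total to 52.7%, so the 50% marker falls after Nucleic Acids Research.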