First Survey of Publicly Available Metagenomic Sequencing Data Across 24 Middle Eastern and North African Countries: The MENA Microbiome Database
Mathlouthi, N. E. H.; Gdoura-Ben Amor, M.; Belguith, I.; Derouich, R.; Ammar Keskes, L.; Gdoura, R.
Microbiome research has expanded globally, yet the Middle East and North Africa (MENA) region remains severely under-represented in international sequencing repositories. Here we present the MENA Microbiome Database, the first systematically harmonized catalog of publicly available metagenomic sequencing data from 24 MENA countries, consolidating 60,126 runs across 51,365 biological samples and 2,373 BioProjects deposited between 2008 and 2026. Records were retrieved from ENA, NCBI SRA, and PubMed, enriched with BioSample and study-level metadata, and classified into microbiome subtypes using a 73-rule keyword-based harmonization framework. Amplicon sequencing accounted for 80.6% of runs, with Illumina platforms dominating at 92.7%. Geographic coverage is highly skewed: Saudi Arabia and Turkey together contribute over half of all records, while five countries (Libya, Syria, Palestine, Yemen, and South Sudan) remain critically under-sampled. Metadata completeness averaged 73.97% under a MIxS-MIMS proxy framework, with geographic coordinates available for fewer than 15% of runs. Ecological analyses revealed that country-level factors significantly structure environmental, animal-associated, and plant-associated microbiomes, but not human-associated microbiomes. Spatial autocorrelation confirmed non-random clustering of sampling effort around Red Sea coastal and eastern Mediterranean hotspots. This open, reproducible resource, comprising harmonized data files, analysis code, and an interactive browsing platform, establishes a foundational infrastructure for regional microbiome science and equitable global comparative studies.
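To make the keyword-based harmonization step concrete, the following is a minimal sketch of how an ordered rule list might assign a microbiome subtype to a record's free-text metadata. It is not the authors' implementation: the rule list, the keywords, the `classify` function, and the first-match-wins ordering are all assumptions for illustration, and the 73 rules of the actual framework are not reproduced here.

```python
import re

# Hypothetical rules (subtype label, keywords); NOT the database's actual 73 rules.
# Order matters in this sketch: the first rule with a matching keyword wins.
RULES = [
    ("human-associated", ["human gut", "stool", "saliva"]),
    ("animal-associated", ["camel", "rumen", "poultry"]),
    ("plant-associated", ["rhizosphere", "date palm", "root"]),
    ("environmental", ["soil", "seawater", "sediment"]),
]


def classify(text: str) -> str:
    """Return the subtype of the first rule whose keyword occurs in text."""
    lowered = text.lower()
    for label, keywords in RULES:
        for keyword in keywords:
            # Whole-word match, so "root" does not fire inside "uprooted".
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                return label
    # No rule matched; leave the record for manual review.
    return "unclassified"


if __name__ == "__main__":
    for description in [
        "Human gut metagenome from stool",
        "Date palm rhizosphere soil",
        "Red Sea seawater sample",
    ]:
        print(description, "->", classify(description))
```

Under this assumed ordering, a record mentioning both "rhizosphere" and "soil" is labeled plant-associated because the plant rule is checked before the environmental one; a real framework would need to document how such overlaps are resolved.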
Matching journals
The top 4 journals account for 50% of the predicted probability mass.