
EGP1K: Whole-Genome Sequencing of 1,024 Egyptians Characterizes Population Structure and Genetic Diversity

Amer, K.; Moustafa, A.; Hassan, W. A.; Adel, E.; AbdElaal, K. R.; Ghanim, T. A.; Abd El-Raouf, A.; El-Hosseiny, A.; El-Sayed, A. F.; Badr, A. H.; Hassan, A.; Kotb, A.; Ragheb, A.; Muhammad, A. M.; Ali, A.; Abdelaal, A.; Ramadan, E.; El-Garhy, F. M.; El Shehaby, H.; Ali, M. A.; Albarbary, M.; Zahra, M. A.; Amer, M.; Elmonem, M. A.; Fahmy, N. T.; Abdel-Haseeb, O. M.; Hassan, T. M.; Daoud, Y. A.; Howeedy, Y.; Farouk, Y. K.; Soror, S.; El-Feky, G.; Sakr, M.; Soliman, N. A.; Gad, Y. Z.; Abdel-Ghaffar, K. A.; Egypt Genome Consortium

2026-04-06 genomics
10.64898/2026.04.02.715521 bioRxiv

Middle Eastern and North African populations remain underrepresented in genomic databases, comprising less than 1% of genome-wide association study participants despite representing approximately 6% of the global population. Here we present the Egypt Genome Project (EGP1K), in which we performed whole-genome sequencing on 1,024 unrelated Egyptian individuals originating from 21 of Egypt's 27 governorates, recruited through eight clinical and research centers across Upper and Lower Egypt. We identified over 51.3 million variants, of which 17.1 million (33.4%) were absent from dbSNP. Allele frequency comparisons across 6.5 million shared variants showed the strongest concordance with Middle Eastern populations (τ = 0.977). Principal component analysis and ADMIXTURE modeling at K = 7 revealed that Egyptians share a dominant ancestry component (71.8%) with Middle Eastern populations and carry a smaller Egyptian-enriched component (18.5%) that distinguishes them from neighboring groups. Runs of homozygosity varied substantially across subregions, with Upper Egypt showing the highest burden, paralleling elevated consanguinity rates. Carrier frequency analysis identified MEFV (Familial Mediterranean Fever) at 9.1% as the most prevalent pathogenic carrier state; when adjusted for the national consanguinity rate, the MEFV carrier frequency alone projects approximately 6,600 affected births per year. HLA class I typing yielded allele frequencies that place Egyptians within the Levantine-Eastern Mediterranean cluster, providing baseline immunogenetic data currently absent from international databases. Analysis of polygenic risk score distributions revealed substantial differences in threshold-based risk stratification between Egyptians and European reference populations.
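The consanguinity-adjusted birth projection can be illustrated with standard Hardy-Weinberg arithmetic extended by an inbreeding coefficient F. Under this model the probability that a birth is affected is q² + F·p·q. The sketch below is not necessarily the adjustment the authors used. It rests on several assumptions: a single biallelic locus with all pathogenic MEFV alleles pooled, Hardy-Weinberg proportions for the 9.1% carrier frequency, and a population-average F equal to the consanguinity rate times 1/16 (the first-cousin value). The function names and the `consanguinity_rate` and `births_per_year` inputs are hypothetical, and the abstract does not state the values behind the ~6,600 figure.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Invert the Hardy-Weinberg heterozygote frequency h = 2pq to recover q (q <= 0.5)."""
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("carrier frequency 2pq must lie in [0, 0.5]")
    # Solve 2q(1 - q) = h for the smaller root: q = (1 - sqrt(1 - 2h)) / 2
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


def affected_probability(q: float, inbreeding_f: float) -> float:
    """Probability that a birth is homozygous for the risk allele: q^2 + F*p*q."""
    p = 1.0 - q
    return q * q + inbreeding_f * p * q


def projected_affected_births(
    carrier_freq: float,
    consanguinity_rate: float,
    births_per_year: float,
    f_per_union: float = 1.0 / 16.0,  # assumption: every consanguineous union is first-cousin
) -> float:
    """Expected affected births per year under the simple inbreeding model above."""
    q = allele_freq_from_carrier_freq(carrier_freq)
    mean_f = consanguinity_rate * f_per_union  # population-average inbreeding coefficient
    return births_per_year * affected_probability(q, mean_f)
```

Calling `projected_affected_births(0.091, consanguinity_rate=c, births_per_year=n)` with country-specific values for `c` and `n` gives the expected number of homozygous births. The F·p·q term is the excess contributed by consanguineous unions over the random-mating baseline q².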
When the European-derived 90th-percentile polygenic risk score threshold was applied, 83.3% of Egyptians were assigned to high-risk strata for stroke, 76.4% for chronic kidney disease, and 72.8% for gout, compared with the intended 10% high-risk proportion. These distributional shifts were observed across several cardiometabolic traits (Cohen's d = 1.55–1.61), while other traits showed closer cross-population concordance, indicating that the degree of threshold miscalibration varies by trait. Together, these findings establish EGP1K as a genomic reference for Egypt and indicate that European-derived risk stratification thresholds may not be directly transferable to the Egyptian population, supporting the need for population-specific calibration of polygenic risk scores.
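The link between a mean shift and threshold miscalibration can be illustrated with a deliberately simple model. Suppose both populations' scores are normal with equal variance. Then Cohen's d is the mean shift in reference-SD units, and the share of the target population above the reference 90th percentile is 1 - Φ(z₀.₉ - d). This is an illustrative assumption, not the paper's analysis. Real PRS distributions can differ in variance and shape, so this sketch will not reproduce the trait-specific percentages above. The helper names are hypothetical; `statistics.NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist


def fraction_above_reference_threshold(cohens_d: float, reference_percentile: float = 0.90) -> float:
    """Share of the target population above the reference-derived percentile cut-off.

    Assumes both score distributions are normal with equal variance, so that in
    reference-SD units the reference is N(0, 1) and the target is N(d, 1).
    """
    cutoff = NormalDist().inv_cdf(reference_percentile)  # about 1.2816 for the 90th percentile
    return 1.0 - NormalDist(mu=cohens_d, sigma=1.0).cdf(cutoff)


def empirical_fraction_above(reference_scores, target_scores, reference_percentile: float = 0.90) -> float:
    """Distribution-free version: nearest-rank reference quantile, then count target scores above it."""
    ref = sorted(reference_scores)
    rank = math.ceil(reference_percentile * len(ref))  # nearest-rank method, 1-based
    cutoff = ref[max(rank, 1) - 1]
    return sum(score > cutoff for score in target_scores) / len(target_scores)
```

At d = 0 the function returns the intended 10%. Under this idealized model, a shift of d = 1.55 already puts roughly 61% of the target population above the reference cut-off. The empirical helper applies the same cut-at-the-reference-quantile logic to raw score samples when normality is doubtful.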

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Nature Communications (4913 papers in training set; Top 5%; 18.9%)
2. Science Translational Medicine (111 papers in training set; Top 0.2%; 6.9%)
3. Frontiers in Genetics (197 papers in training set; Top 0.8%; 6.5%)
4. Genome Medicine (154 papers in training set; Top 1%; 4.9%)
5. Scientific Reports (3102 papers in training set; Top 33%; 3.7%)
6. Atherosclerosis (29 papers in training set; Top 0.4%; 3.6%)
7. The American Journal of Human Genetics (206 papers in training set; Top 1%; 3.6%)
8. Circulation (66 papers in training set; Top 0.9%; 3.6%)
50% of probability mass above
9. PLOS Medicine (98 papers in training set; Top 2%; 2.1%)
10. Science (429 papers in training set; Top 12%; 2.1%)
11. Cell Genomics (162 papers in training set; Top 3%; 1.9%)
12. Human Molecular Genetics (130 papers in training set; Top 1%; 1.9%)
13. Kidney International (25 papers in training set; Top 0.2%; 1.7%)
14. eLife (5422 papers in training set; Top 41%; 1.7%)
15. Nature Medicine (117 papers in training set; Top 2%; 1.7%)
16. PLOS ONE (4510 papers in training set; Top 53%; 1.7%)
17. Nucleic Acids Research (1128 papers in training set; Top 10%; 1.7%)
18. Cell (370 papers in training set; Top 11%; 1.7%)
19. Journal of the American Heart Association (119 papers in training set; Top 3%; 1.7%)
20. Blood (67 papers in training set; Top 0.9%; 1.5%)
21. International Journal of Epidemiology (74 papers in training set; Top 2%; 1.1%)
22. Circulation Research (39 papers in training set; Top 0.8%; 1.0%)
23. Human Genomics (21 papers in training set; Top 0.3%; 0.9%)
24. Transplantation (13 papers in training set; Top 0.3%; 0.9%)
25. Cell Reports (1338 papers in training set; Top 30%; 0.9%)
26. Nature Cardiovascular Research (28 papers in training set; Top 0.5%; 0.9%)
27. Journal of Clinical Investigation (164 papers in training set; Top 6%; 0.8%)
28. PLOS Genetics (756 papers in training set; Top 14%; 0.8%)
29. EMBO Molecular Medicine (85 papers in training set; Top 4%; 0.8%)
30. Brain (154 papers in training set; Top 5%; 0.7%)
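The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed shares. Accumulate the probabilities in rank order until the running total reaches 50%. The sketch below uses a hypothetical helper name. Because the displayed values are rounded to one decimal place, the total it reports is approximate.

```python
# Predicted probabilities (%) for ranks 1-10, copied from the list above (rounded as displayed).
LISTED_SHARES = [18.9, 6.9, 6.5, 4.9, 3.7, 3.6, 3.6, 3.6, 2.1, 2.1]


def smallest_prefix_reaching(shares, target=50.0):
    """Return (k, total): the fewest top-ranked entries whose shares sum to at least `target`."""
    total = 0.0
    for k, share in enumerate(shares, start=1):
        total += share
        if total >= target:
            return k, total
    return None, total  # target never reached


k, total = smallest_prefix_reaching(LISTED_SHARES)
print(f"top {k} journals cover {total:.1f}%")  # prints: top 8 journals cover 51.7%
```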