Resolution of the D4Z4 repeat responsible for facioscapulohumeral muscular dystrophy with HiFi sequencing

Chen, X.; Lemmers, R. J. L. F.; Kronenberg, Z.; Devaney, J. M.; Noya, J.; Berlyoung, A. S.; Yusuff, S.; Lynch, S.; Nykamp, K.; Lyndy, A. S.; Dolzhenko, E.; van der Maarel, S. M.; Eberle, M. A.

2026-04-14 genetics
10.64898/2026.04.10.717730 bioRxiv

The D4Z4 macrosatellite repeat encompasses some of the most difficult-to-resolve disease-related variations in the human genome. D4Z4 has a repeat unit of 3.3 kb (encoding the DUX4 gene) that is present in up to 100 copies on two chromosomes (4 and 10), while DUX4 can only be expressed in somatic cells from the permissive A haplotype that usually occurs on chromosome 4. Facioscapulohumeral muscular dystrophy (FSHD) is caused by chromatin relaxation and ectopic expression of DUX4 in skeletal muscle, mediated by contraction of D4Z4 to 1-10 copies (FSHD1, 95% of FSHD cases) or mutations in chromatin factor genes such as SMCHD1 (FSHD2, 5% of FSHD cases). Due to its large size, disease-specific haplotypes, and sequence homology between chromosomes, D4Z4 is challenging to resolve with current sequencing technologies. We report a computational tool, Kivvi, to genotype D4Z4 using PacBio whole-genome long-read sequence data. Kivvi detects all D4Z4 alleles in a sample, reporting the repeat size, chromosome (4 vs. 10), distal haplotype (A vs. non-permissive haplotypes), and methylation level of each allele. We validated Kivvi against gold-standard assays for FSHD diagnostics, detecting 100% of contracted alleles and correctly classifying 90% of noncontracted alleles. We showed differential methylation signals between FSHD1 and candidate FSHD2 samples. We profiled D4Z4 across 601 individuals from five ancestral populations, revealing extensive genetic diversity. We identified common haplotypes of D4Z4 alleles and characterized hybrid repeat units, hybrid repeat arrays, and translocation alleles. Combined with HiFi long reads, Kivvi enables the consolidation of multiple FSHD assays into a single workflow and facilitates the discovery of novel genetic modifiers of FSHD through population-scale studies.
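
As a rough illustration of the FSHD1 rule stated in the abstract (contraction to 1-10 repeat units on a permissive A haplotype), the sketch below flags alleles in a per-allele record. It is not Kivvi's code or output format, and it is not a diagnostic criterion: the `D4Z4Allele` record, its field names, the haplotype labels, and the function name are all hypothetical, and the chromosome is deliberately not tested because the abstract notes that the A haplotype only usually occurs on chromosome 4.

```python
from dataclasses import dataclass


@dataclass
class D4Z4Allele:
    """Hypothetical per-allele record; field names are illustrative, not Kivvi's."""
    chromosome: str    # "4" or "10"
    haplotype: str     # distal haplotype label, "A" meaning permissive
    repeat_units: int  # number of 3.3 kb D4Z4 repeat units


def is_contracted_permissive(allele: D4Z4Allele, max_units: int = 10) -> bool:
    """Return True for a contracted allele (1..max_units units) on an A haplotype.

    Simplified reading of the abstract's FSHD1 description; assumption only.
    """
    contracted = 1 <= allele.repeat_units <= max_units
    permissive = allele.haplotype == "A"
    return contracted and permissive


# Made-up example alleles, not data from the study.
alleles = [
    D4Z4Allele(chromosome="4", haplotype="A", repeat_units=6),
    D4Z4Allele(chromosome="4", haplotype="B", repeat_units=8),
    D4Z4Allele(chromosome="10", haplotype="A", repeat_units=25),
]
flagged = [a for a in alleles if is_contracted_permissive(a)]
```

Only the first example allele is flagged: the second is contracted but on a non-A haplotype, and the third carries an A haplotype but is not contracted.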

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Genome Medicine | 154 | Top 0.2% | 18.1%
2 | Nature Communications | 4913 | Top 7% | 18.1%
3 | The American Journal of Human Genetics | 206 | Top 0.5% | 9.8%
4 | Nature Genetics | 240 | Top 1% | 6.2%
50% of probability mass above
5 | Science Translational Medicine | 111 | Top 0.5% | 4.7%
6 | EMBO Molecular Medicine | 85 | Top 0.6% | 3.6%
7 | Nucleic Acids Research | 1128 | Top 6% | 3.5%
8 | Cell Genomics | 162 | Top 2% | 2.5%
9 | Genome Biology | 555 | Top 3% | 2.4%
10 | Human Molecular Genetics | 130 | Top 1% | 2.4%
11 | Genetics in Medicine | 69 | Top 0.7% | 1.6%
12 | Nature Medicine | 117 | Top 2% | 1.6%
13 | Genome Research | 409 | Top 2% | 1.6%
14 | Nature Biotechnology | 147 | Top 5% | 1.4%
15 | Scientific Reports | 3102 | Top 63% | 1.4%
16 | Nature Methods | 336 | Top 5% | 1.4%
17 | Nature | 575 | Top 12% | 1.4%
18 | Bioinformatics | 1061 | Top 8% | 1.3%
19 | Science Advances | 1098 | Top 24% | 1.1%
20 | Communications Biology | 886 | Top 20% | 0.9%
21 | Annals of the Rheumatic Diseases | 32 | Top 0.7% | 0.7%
22 | European Journal of Human Genetics | 49 | Top 1% | 0.7%
23 | eLife | 5422 | Top 62% | 0.6%
24 | Acta Neuropathologica | 51 | Top 1% | 0.6%
25 | Human Genetics and Genomics Advances | 70 | Top 1% | 0.6%
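
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The short sketch below does that for the first five listed values; the function name `journals_to_reach` is made up for illustration, and the page's own method for computing the cutoff is not stated, so this is only one plausible reading.

```python
from itertools import accumulate


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running total
    to reach the threshold, or None if it is never reached."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count
    return None


# Probabilities (in percent) of the five top-ranked journals, as listed above.
top_probabilities = [18.1, 18.1, 9.8, 6.2, 4.7]
needed = journals_to_reach(top_probabilities, 50.0)
print(needed)
```

The running totals are 18.1, 36.2, 46.0, and 52.2, so the threshold is first crossed at the fourth journal, consistent with the cutoff shown in the list.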