MucOneUp: A Simulation Framework for MUC1-VNTR Variant Benchmarking

Popp, B.; Saei, H.

2026-05-12 bioinformatics
10.64898/2026.05.08.723876 bioRxiv
Summary: Variable number tandem repeats (VNTRs) in the MUC1 gene cause autosomal dominant tubulointerstitial kidney disease when disrupted by frameshift variants, but the GC-rich 60-bp repeat structure (20-125 copies) challenges variant detection. While tools like VNtyper enable MUC1 variant calling, no gold-standard benchmarking datasets exist for systematic performance evaluation. We present MucOneUp, a specialized simulation framework for generating MUC1-VNTR reference sequences with targeted variants and platform-specific sequencing reads (Illumina, Oxford Nanopore, PacBio). MucOneUp employs Markov chain-based repeat generation, supports diploid simulation with customizable variant placement, and includes additional analysis modules for SNaPshot assay simulation and exploratory frameshift analysis. We validate MucOneUp through a multi-variant, cross-platform benchmark of six tool-platform combinations using 13 distinct frameshift variants and investigate VNTR length effects on detection.

Availability and implementation: MucOneUp is accessible at no cost under the MIT License at https://github.com/berntpopp/MucOneUp and archived on Zenodo (DOI: 10.5281/zenodo.19740406).

Contact: bernt.popp@charite.de

Supplementary information: Supplementary data are provided with this manuscript.
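To make the phrase "Markov chain-based repeat generation" concrete, the following is a minimal sketch of how a chain of repeat units could be sampled from a first-order Markov chain. The repeat symbols, the transition probabilities, and the function name `simulate_repeat_chain` are hypothetical and chosen only for illustration; they are not taken from MucOneUp and do not reflect its actual configuration or API.

```python
import random

# Hypothetical repeat-unit symbols and transition probabilities.
# These values are illustrative assumptions, not MucOneUp's real model.
TRANSITIONS = {
    "START": {"1": 1.0},
    "1": {"2": 1.0},
    "2": {"3": 1.0},
    "3": {"4": 1.0},
    "4": {"5": 1.0},
    "5": {"X": 0.6, "A": 0.4},
    "X": {"X": 0.7, "A": 0.2, "B": 0.1},
    "A": {"X": 0.5, "B": 0.5},
    "B": {"X": 0.8, "A": 0.2},
}


def simulate_repeat_chain(n_repeats, rng):
    """Sample n_repeats repeat-unit symbols from a first-order Markov chain."""
    state = "START"
    chain = []
    for _ in range(n_repeats):
        options = TRANSITIONS[state]
        # Draw the next repeat unit according to the current state's weights.
        state = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        chain.append(state)
    return chain


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run is reproducible
    # Two haplotypes of different length, as in a diploid simulation.
    haplotype_1 = simulate_repeat_chain(40, rng)
    haplotype_2 = simulate_repeat_chain(60, rng)
    print("-".join(haplotype_1))
    print("-".join(haplotype_2))
```

In this sketch each symbol stands for one 60-bp repeat unit; a full simulator would additionally map symbols to nucleotide sequences and insert the targeted variant into a chosen repeat.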

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 0.1%; 59.5%
(50% of probability mass above)
2. Nature Communications; 4913 papers in training set; Top 24%; 8.2%
3. Genome Medicine; 154 papers in training set; Top 1%; 4.9%
4. Genome Biology; 555 papers in training set; Top 2%; 4.3%
5. Nucleic Acids Research; 1128 papers in training set; Top 5%; 3.7%
6. Bioinformatics Advances; 184 papers in training set; Top 2%; 1.9%
7. BMC Bioinformatics; 383 papers in training set; Top 4%; 1.7%
8. Nature Methods; 336 papers in training set; Top 5%; 1.2%
9. Genome Research; 409 papers in training set; Top 3%; 1.2%
10. NAR Genomics and Bioinformatics; 214 papers in training set; Top 3%; 0.9%
11. Nature Biotechnology; 147 papers in training set; Top 7%; 0.9%
12. Briefings in Bioinformatics; 326 papers in training set; Top 6%; 0.9%
13. Cell Reports Methods; 141 papers in training set; Top 5%; 0.7%
14. BMC Medical Genomics; 36 papers in training set; Top 1%; 0.7%
15. Cell Genomics; 162 papers in training set; Top 7%; 0.7%
16. Nature Machine Intelligence; 61 papers in training set; Top 4%; 0.7%
17. Science; 429 papers in training set; Top 22%; 0.6%
18. Nature Computational Science; 50 papers in training set; Top 2%; 0.6%
19. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 11%; 0.6%
20. The American Journal of Human Genetics; 206 papers in training set; Top 4%; 0.6%
21. npj Genomic Medicine; 33 papers in training set; Top 1%; 0.6%
22. JCO Clinical Cancer Informatics; 18 papers in training set; Top 1%; 0.6%
23. Scientific Reports; 3102 papers in training set; Top 80%; 0.5%