Benchmarking sequence performance on the DNBSEQ-T7 using Genome in a Bottle reference genomes

van Coller, A.; Taukobong, S.; Malima, M.; Ghoor, S.; Nangammbi, N.; Roode, E.; Naicker, M.; Cole, V.; Glanzmann, B.; Kinnear, C.; Carstens, N.

2026-05-26 bioinformatics
10.64898/2026.05.22.727100 bioRxiv

Advances in sequencing technologies have improved the accuracy, throughput, and completeness of human genome characterization, enabling more reliable detection of genetic variation. Well-characterized reference genomes are critical for benchmarking sequencing platforms and bioinformatics analysis pipelines. Here, we present whole genome sequencing datasets generated for the Ashkenazi Jewish trio reference samples from the Genome in a Bottle Consortium. Libraries were prepared using three distinct MGI-based workflows: PCR-free library preparation, FastFS DNA library preparation, and Universal DNA library preparation. Sequencing was performed on the MGI DNBSEQ-T7 platform, generating a minimum of 400 million paired-end reads per sample, corresponding to 30X mean genome coverage. Raw reads were processed using a standardized GATK bioinformatics workflow. Sequencing performance and variant detection accuracy were evaluated using the Genome in a Bottle high-confidence benchmark variant sets. All workflows demonstrated high sequencing quality and concordance with GIAB benchmark truth sets, with PCR-free libraries showing the strongest indel calling performance and lowest Mendelian violation rates across the Ashkenazi trio. This dataset provides a resource for benchmarking DNBSEQ-T7 sequencing and bioinformatics workflows, and for evaluating the impact of library preparation strategies on whole genome variant detection performance.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Genome Medicine | 154 | Top 0.4% | 10.5%
2 | Nucleic Acids Research | 1128 | Top 2% | 8.5%
3 | Scientific Reports | 3102 | Top 17% | 6.4%
4 | Genome Biology | 555 | Top 1% | 6.4%
5 | BMC Bioinformatics | 383 | Top 2% | 4.9%
6 | Nature Communications | 4913 | Top 33% | 4.9%
7 | NAR Genomics and Bioinformatics | 214 | Top 0.5% | 4.0%
8 | Computational and Structural Biotechnology Journal | 216 | Top 2% | 3.6%
9 | Briefings in Bioinformatics | 326 | Top 2% | 3.1%
50% of probability mass above
10 | Frontiers in Genetics | 197 | Top 2% | 3.1%
11 | PLOS ONE | 4510 | Top 43% | 2.8%
12 | Scientific Data | 174 | Top 0.7% | 2.6%
13 | Genomics, Proteomics & Bioinformatics | 171 | Top 2% | 2.4%
14 | GigaScience | 172 | Top 0.9% | 2.1%
15 | Database | 51 | Top 0.3% | 2.1%
16 | Cell Genomics | 162 | Top 2% | 2.1%
17 | Bioinformatics | 1061 | Top 6% | 2.1%
18 | Communications Biology | 886 | Top 7% | 1.8%
19 | BMC Genomics | 328 | Top 2% | 1.7%
20 | Bioinformatics Advances | 184 | Top 3% | 1.5%
21 | The American Journal of Human Genetics | 206 | Top 2% | 1.5%
22 | Human Genomics | 21 | Top 0.2% | 1.3%
23 | International Journal of Molecular Sciences | 453 | Top 11% | 1.0%
24 | Nature Biotechnology | 147 | Top 6% | 1.0%
25 | Alzheimer's & Dementia | 143 | Top 2% | 0.9%
26 | PeerJ | 261 | Top 13% | 0.8%
27 | PLOS Computational Biology | 1633 | Top 23% | 0.8%
28 | Viruses | 318 | Top 5% | 0.8%
29 | Frontiers in Bioinformatics | 45 | Top 0.7% | 0.8%
30 | Frontiers in Molecular Biosciences | 100 | Top 5% | 0.8%
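
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The following is a minimal sketch, not the site's own method: the function name `journals_needed` is hypothetical, and the input values are the rounded percentages shown in the list, so the true cumulative totals may differ slightly.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the 30 listed journals, in rank order.
# These are the rounded values displayed on the page.
probabilities = [
    10.5, 8.5, 6.4, 6.4, 4.9, 4.9, 4.0, 3.6, 3.1, 3.1,
    2.8, 2.6, 2.4, 2.1, 2.1, 2.1, 2.1, 1.8, 1.7, 1.5,
    1.5, 1.3, 1.0, 1.0, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8,
]


def journals_needed(probs, threshold):
    """Return the smallest number of top-ranked entries whose
    cumulative sum reaches `threshold`, or None if it is never reached.

    Hypothetical helper written for illustration only.
    """
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


cutoff = journals_needed(probabilities, 50.0)
print(f"Journals needed to reach 50% of probability mass: {cutoff}")
```

With the rounded values, the first eight journals sum to just under 50%, and adding the ninth crosses the threshold, which agrees with the cutoff marked in the list.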