Quantitative Optimization of Sensitivity and Specificity in Targeted and Whole-Exome Sequencing Using Reference-Standard DNA Mixtures

Moon, Y.-B.; Hong, C. H.; Kim, J.-K.; Kang, E.-K.; Choi, H. W.; Hwang, D.-W.; Ko, J.-H.; Kim, H.-S.; Lee, D.-e.; Park, S.-y.; Wang, C. C.; Kim, Y.-H.; Kim, T.; Heo, S. G.; Han, N.; Hong, K.-M.

2026-01-27 bioinformatics
10.64898/2026.01.25.701479 bioRxiv
Background: We previously developed a benchmarking strategy using mixtures of homozygote and heterozygote DNAs as reference standards to simultaneously assess sensitivity and false positive (FP) error rates in targeted next-generation sequencing (T-NGS) and whole-exome sequencing (WES), revealing substantial variability across commercial platforms. However, optimal analytic conditions for clinical application remain undefined.

Methods: We systematically evaluated multiple sequencing kits and bioinformatics pipelines across various variant allele fraction (VAF) thresholds to identify conditions that maximize both sensitivity and specificity. Recurrent error-prone alleles were defined and filtered to enhance specificity.

Results: Optimal performance was achieved using the DRAGEN pipeline with recurrent FP allele filtering. For T-NGS, a 1% VAF cutoff yielded a 95% detection threshold of 2.99% and 1.21 FPs per megabase (FP/Mb); for WES, a 2% cutoff yielded a 95% threshold of 5.02% and 1.15 FP/Mb. These settings improved sensitivity >3-fold and reduced FP rates >96% versus suboptimal pipelines. Notably, VAF thresholds flattened sensitivity differences across platforms, obscuring key performance disparities and challenging assumptions that T-NGS is inherently more sensitive than WES. In-house and conventional pipelines undercalled up to 10% of true variants. Restricting reporting of 1-4% VAF variants to approximately 1,000 predefined actionable sites enabled recovery of clinically relevant mutations while reducing FP risk >99%.

Conclusions: This study provides a quantitative framework for optimizing NGS performance. Our findings support actionable strategies to improve diagnostic accuracy in clinical genomics through tailored pipeline selection, VAF thresholding, and artifact filtering.
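The reporting rules summarized in the abstract (a VAF cutoff, removal of recurrent FP alleles, and reporting of low-VAF calls only at predefined actionable sites) can be sketched roughly as below. This is a minimal illustration and not the authors' pipeline: the `Call` record, `filter_calls`, and `fp_per_mb` are hypothetical names, the treatment of the 1-4% band as the half-open interval [1%, 4%) is an assumption, and the coordinates in the usage lines are toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One variant call (hypothetical record type for this sketch)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float  # variant allele fraction, expressed as 0..1

    @property
    def key(self):
        # Allele identity used to look the call up in the site lists.
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_calls(calls, recurrent_fp, actionable,
                 vaf_cutoff=0.01, low_vaf_upper=0.04):
    """Apply the three reporting rules in order.

    recurrent_fp and actionable are sets of allele keys.
    Assumption: the low-VAF band is [vaf_cutoff, low_vaf_upper).
    """
    kept = []
    for call in calls:
        if call.vaf < vaf_cutoff:
            continue  # below the VAF cutoff
        if call.key in recurrent_fp:
            continue  # known recurrent error-prone allele
        if call.vaf < low_vaf_upper and call.key not in actionable:
            continue  # low-VAF call outside the actionable site list
        kept.append(call)
    return kept


def fp_per_mb(n_false_positives, target_bp):
    """False positives per megabase of sequenced target."""
    return n_false_positives / (target_bp / 1_000_000)


# Toy usage with made-up coordinates.
calls = [
    Call("chr7", 200, "C", "G", 0.02),  # low VAF, on the actionable list
    Call("chr7", 300, "G", "A", 0.02),  # low VAF, not actionable
    Call("chr7", 500, "A", "G", 0.10),  # above the low-VAF band
]
actionable = {("chr7", 200, "C", "G")}
reported = filter_calls(calls, recurrent_fp=set(), actionable=actionable)
```

Passing the cutoffs as parameters keeps the same function usable for both settings quoted in the abstract (a 1% cutoff for T-NGS, 2% for WES).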

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | BMC Bioinformatics | 383 papers in training set | Top 0.3% | 18.6%
2 | Genome Medicine | 154 papers in training set | Top 0.7% | 7.2%
3 | Scientific Reports | 3102 papers in training set | Top 18% | 6.4%
4 | Bioinformatics | 1061 papers in training set | Top 4% | 6.3%
5 | The Journal of Molecular Diagnostics | 36 papers in training set | Top 0.1% | 4.3%
6 | PLOS ONE | 4510 papers in training set | Top 36% | 4.0%
7 | BMC Genomics | 328 papers in training set | Top 0.6% | 4.0%
50% of probability mass above
8 | Genetics in Medicine | 69 papers in training set | Top 0.4% | 3.6%
9 | Clinical Chemistry | 22 papers in training set | Top 0.1% | 3.6%
10 | Journal of Clinical Microbiology | 120 papers in training set | Top 0.7% | 3.1%
11 | Nature Communications | 4913 papers in training set | Top 44% | 2.6%
12 | The American Journal of Human Genetics | 206 papers in training set | Top 2% | 2.4%
13 | Bioinformatics Advances | 184 papers in training set | Top 2% | 2.1%
14 | PLOS Computational Biology | 1633 papers in training set | Top 17% | 1.7%
15 | BioData Mining | 15 papers in training set | Top 0.4% | 1.5%
16 | Clinical Infectious Diseases | 231 papers in training set | Top 3% | 1.3%
17 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 6% | 1.3%
18 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 1.2%
19 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 0.9%
20 | The CRISPR Journal | 33 papers in training set | Top 0.2% | 0.9%
21 | Alzheimer's & Dementia | 143 papers in training set | Top 2% | 0.9%
22 | npj Genomic Medicine | 33 papers in training set | Top 0.7% | 0.9%
23 | Communications Biology | 886 papers in training set | Top 24% | 0.7%
24 | International Journal of Molecular Sciences | 453 papers in training set | Top 16% | 0.7%
25 | BMC Medical Genomics | 36 papers in training set | Top 1% | 0.7%
26 | Frontiers in Bioinformatics | 45 papers in training set | Top 1% | 0.7%
27 | Molecular Therapy - Nucleic Acids | 24 papers in training set | Top 0.5% | 0.6%