DPGT: A spark based high-performance joint variant calling tool for large cohort sequencing
Gong, C.; Yang, Q.; Wan, R.; Li, S.; Zhang, Y.; Li, Y.
Background: Joint variant calling is a crucial step in population-scale sequencing analysis. While population-scale sequencing is a powerful tool for genetic studies, fast and accurate joint variant calling on large cohorts remains computationally challenging.
Findings: To meet this challenge, we developed the Distributed Population Genetics Tool (DPGT), an efficient and robust computing framework, based on Apache Spark, for joint variant calling on large cohorts. DPGT runs a joint calling task for a large cohort with a single command on a local computer or a computing cluster, eliminating the need for users to create complex parallel workflows. We evaluated DPGT against existing methods using 2,504 samples from the 1000 Genomes Project (1KGP), 6 Genome in a Bottle (GIAB) samples, and 9,158 internal whole-genome sequencing (WGS) samples. DPGT produced results comparable in accuracy to those of existing methods, in less time and with better scalability.
Conclusions: DPGT is a fast, scalable, and accurate tool for joint variant calling. The source code, implemented in Java and C++, is available under a GPLv3 license at https://github.com/BGI-flexlab/DPGT.