gSV: a general structural variant detector using the third-generation sequencing data
Hao, J.; Shi, J.; Lian, S.; Zhang, Z.; Luo, Y.; Hu, T.; Ishibashi, T.; Wang, D.; Wang, S.; Fan, X.; Yu, W.
Structural variants (SVs) are major contributors to genome diversity and disease susceptibility, particularly in cancer. Although third-generation sequencing technologies have substantially improved SV detection sensitivity, accurate detection of complex SVs remains challenging due to fragmented and heterogeneous alignment signals, as well as the dependence of many existing methods on predefined variant models. In this paper, we propose gSV, a general SV detector that integrates alignment-based and assembly-based approaches with the maximum exact match (MEM) strategy, with particular emphasis on resolving SVs with complex or atypical alignment signatures. Without predefined assumptions about SV types, gSV captures diverse variant signals, enabling the detection of SVs that are usually missed by conventional tools. Benchmarking using both simulated datasets and real long-read sequencing data demonstrates that gSV achieves improved sensitivity and overall detection performance compared with current state-of-the-art SV callers, particularly for simple and complex SV events with complex alignment patterns. Unique SV discoveries in four breast cancer cell lines, particularly in cancer-associated genes, demonstrate the potential biological relevance of gSV-enabled discoveries. Furthermore, analysis of a breast cancer cohort from the Chinese population highlights the utility of gSV for population-scale genomic studies. Collectively, gSV provides a unified framework for comprehensive SV discovery in both research and clinical genomics settings.

Key Points
- Existing structural variant (SV) detection tools are limited in resolving SVs with complex alignment patterns due to their reliance on predefined variant models.
- gSV integrates alignment-based and assembly-based evidence using a maximum exact match (MEM) strategy, enabling capture of diverse and complex SV signals.
- Benchmarking on simulated and real long-read sequencing datasets demonstrates that gSV achieves competitive performance on canonical SV classes and improved sensitivity for complex SV patterns.
- Application of gSV to breast cancer cell lines and a population-scale breast cancer cohort reveals previously unresolved SVs in cancer-associated genes, highlighting its utility in genomic and clinical studies.
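
The abstract names a maximum exact match (MEM) strategy but does not describe how gSV implements it. As a rough illustration of the general idea only, the sketch below finds exact matches between a read and a reference that cannot be extended in either direction (often called maximal exact matches). The function name `maximal_exact_matches`, the `min_len` parameter, the brute-force search, and the toy sequences are all assumptions made for this example and are not taken from gSV.

```python
def maximal_exact_matches(read, ref, min_len=3):
    """Return (read_start, ref_start, length) for each exact match between
    `read` and `ref` that cannot be extended left or right and is at least
    `min_len` long. Brute force, O(len(read) * len(ref)) start pairs;
    illustrative only, not gSV's implementation."""
    matches = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Not left-maximal: the previous characters also match, so this
            # pair is already covered by a match that starts earlier.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right as far as the two sequences agree.
            k = 0
            while (i + k < len(read) and j + k < len(ref)
                   and read[i + k] == ref[j + k]):
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches


if __name__ == "__main__":
    # Hypothetical toy data: the read lacks the "GGG" present in the reference.
    ref = "AAACCCGGGTTT"
    read = "AAACCCTTT"
    mems = maximal_exact_matches(read, ref, min_len=3)
    print(mems)  # [(0, 0, 6), (6, 9, 3)]

    # Adjacent anchors that are contiguous on the read but separated on the
    # reference hint at a deletion of that many reference bases.
    (i1, j1, l1), (i2, j2, l2) = mems
    print("read gap:", i2 - (i1 + l1), "ref gap:", j2 - (j1 + l1))
    # read gap: 0 ref gap: 3
```

In this toy case the two anchors are contiguous on the read but 3 bases apart on the reference, which is the kind of signal that can be read off without first assuming a variant type. Practical tools typically use indexed data structures rather than a quadratic scan.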
Matching journals
The top 4 journals account for 50% of the predicted probability mass.