
gSV: a general structural variant detector using the third-generation sequencing data

Hao, J.; Shi, J.; Lian, S.; Zhang, Z.; Luo, Y.; Hu, T.; Ishibashi, T.; Wang, D.; Wang, S.; Fan, X.; Yu, W.

2026-03-04 bioinformatics
10.64898/2026.03.02.703663 bioRxiv

Structural variants (SVs) are major contributors to genome diversity and disease susceptibility, particularly in cancer. Although third-generation sequencing technologies have substantially improved SV detection sensitivity, accurate detection of complex SVs remains challenging due to fragmented and heterogeneous alignment signals, as well as the dependence of many existing methods on predefined variant models. In this paper, we propose gSV, a general SV detector that integrates alignment-based and assembly-based approaches with the maximum exact match (MEM) strategy, with particular emphasis on resolving SVs with complex or atypical alignment signatures. Without predefined assumptions about SV types, gSV captures diverse variant signals, enabling the detection of SVs that are usually missed by conventional tools. Benchmarking using both simulated datasets and real long-read sequencing data demonstrates that gSV achieves improved sensitivity and overall detection performance compared with current state-of-the-art SV callers, particularly for simple and complex SV events with complex alignment patterns. Unique SV discoveries in four breast cancer cell lines, particularly in cancer-associated genes, demonstrate the potential biological relevance of gSV-enabled discoveries. Furthermore, analysis of a breast cancer cohort from the Chinese population highlights the utility of gSV for population-scale genomic studies. Collectively, gSV provides a unified framework for comprehensive SV discovery in both research and clinical genomics settings.

Key Points
- Existing structural variant (SV) detection tools are limited in resolving SVs with complex alignment patterns due to their reliance on predefined variant models.
- gSV integrates alignment-based and assembly-based evidence using a maximum exact match (MEM) strategy, enabling capture of diverse and complex SV signals.
- Benchmarking on simulated and real long-read sequencing datasets demonstrates that gSV achieves competitive performance on canonical SV classes and improved sensitivity for complex SV patterns.
- Application of gSV to breast cancer cell lines and a population-scale breast cancer cohort reveals previously unresolved SVs in cancer-associated genes, highlighting its utility in genomic and clinical studies.
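The abstract does not describe how gSV computes or uses MEMs, so the following is only a generic, brute-force sketch of the underlying concept (such matches are more commonly called maximal exact matches): a shared substring between a read and a reference that cannot be extended by one more matching base on either side. The function name `find_mems` and the `min_len` threshold are hypothetical illustrations, not part of gSV. Real tools find MEMs with index structures such as suffix arrays or FM-indexes rather than this quadratic scan.

```python
from typing import List, Tuple


def find_mems(read: str, ref: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (read_pos, ref_pos, length) for every maximal exact match of at
    least min_len bases between read and ref. Hypothetical brute-force sketch."""
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            # Left-maximal: skip if the match could be extended one base to the left.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of either sequence,
            # which makes the match right-maximal.
            k = 0
            while i + k < len(read) and j + k < len(ref) and read[i + k] == ref[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


# Toy read carrying a 3-bp deletion ("CCG") relative to the reference.
ref = "GATTACACCGTTAGC"
read = "GATTACATTAGC"
for read_pos, ref_pos, length in find_mems(read, ref, min_len=5):
    # ref_pos - read_pos is the diagonal offset of each match.
    print(read_pos, ref_pos, length, "offset:", ref_pos - read_pos)
```

In this toy case the two MEMs are (0, 0, 7) and (7, 10, 5): the diagonal offset jumps from 0 to 3 between consecutive matches, the kind of discontinuity that a MEM-based comparison could read as a 3-bp deletion in the read. How gSV turns such signals into SV calls is not specified in the abstract.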

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set): Top 0.4%, predicted probability 39.8%
2. Bioinformatics Advances (184 papers in training set): Top 0.6%, predicted probability 4.9%
3. Briefings in Bioinformatics (326 papers in training set): Top 1%, predicted probability 4.9%
4. Genome Biology (555 papers in training set): Top 2%, predicted probability 4.0%
(50% of probability mass above this line)
5. BMC Bioinformatics (383 papers in training set): Top 2%, predicted probability 4.0%
6. Genome Medicine (154 papers in training set): Top 2%, predicted probability 3.6%
7. Nucleic Acids Research (1128 papers in training set): Top 6%, predicted probability 3.6%
8. Cell Systems (167 papers in training set): Top 4%, predicted probability 3.6%
9. Genome Research (409 papers in training set): Top 1%, predicted probability 3.1%
10. Genomics, Proteomics & Bioinformatics (171 papers in training set): Top 3%, predicted probability 2.1%
11. Nature Communications (4913 papers in training set): Top 48%, predicted probability 1.9%
12. IEEE Transactions on Computational Biology and Bioinformatics (17 papers in training set): Top 0.2%, predicted probability 1.7%
13. PLOS Computational Biology (1633 papers in training set): Top 16%, predicted probability 1.7%
14. Computational and Structural Biotechnology Journal (216 papers in training set): Top 5%, predicted probability 1.5%
15. PLOS ONE (4510 papers in training set): Top 61%, predicted probability 1.1%
16. iScience (1063 papers in training set): Top 26%, predicted probability 0.9%
17. NAR Genomics and Bioinformatics (214 papers in training set): Top 3%, predicted probability 0.8%
18. Nature Biotechnology (147 papers in training set): Top 7%, predicted probability 0.8%
19. Scientific Reports (3102 papers in training set): Top 73%, predicted probability 0.8%
20. Advanced Science (249 papers in training set): Top 17%, predicted probability 0.8%
21. Nature Methods (336 papers in training set): Top 6%, predicted probability 0.8%
22. Frontiers in Genetics (197 papers in training set): Top 10%, predicted probability 0.8%
23. Cell Genomics (162 papers in training set): Top 6%, predicted probability 0.8%
24. The American Journal of Human Genetics (206 papers in training set): Top 4%, predicted probability 0.7%
25. BioData Mining (15 papers in training set): Top 1%, predicted probability 0.7%
26. IEEE Journal of Biomedical and Health Informatics (34 papers in training set): Top 3%, predicted probability 0.5%
27. Nature Computational Science (50 papers in training set): Top 2%, predicted probability 0.5%
28. BMC Genomics (328 papers in training set): Top 8%, predicted probability 0.5%