
Vision-Based Genomic Model for Copy Number Variant Pathogenicity Prediction

Buralkin, I.; Botas, J.; Chang, K.-L.; Deng, Y.; Papastathopoulos-Katsaros, A.; Liu, Z.; Park, J.

2026-05-26 bioinformatics
10.64898/2026.05.21.726953 bioRxiv

Copy number variants (CNVs) are a major class of structural genomic alterations underlying rare disease, including neurodevelopmental delay and intellectual disability, yet predicting their pathogenicity remains challenging. Existing methods reduce CNVs to region-level numerical features, discarding the positional structure and cross-track patterns that expert clinical reviewers use to interpret genomic evidence. To address this, we introduce Tesseract for CNV, a track-based spatial representation for CNV pathogenicity prediction, which represents each variant as a base-pair-resolution multi-track image and models spatial genomic patterns across annotation tracks while preserving positional structure and cross-track dependencies. Trained on a chromosome-level hold-out split of the ClinVar dataset, Tesseract outperforms prior methods on held-out and curated noncoding benchmarks, improving AUROC by up to 0.10 over the state-of-the-art baseline. On the independent DECIPHER cohort, the model demonstrates generalizability by maintaining the highest AUROC and the highest F1 score across baselines. Furthermore, our model localizes pathogenic signals to clinically meaningful genomic subregions, providing track-annotated evidence that supports practical clinical interpretation.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank  Journal                                                        Papers in training set  Percentile  Probability
1     Genome Medicine                                                154                     Top 0.2%    18.0%
2     Bioinformatics                                                 1061                    Top 3%      9.8%
3     The American Journal of Human Genetics                         206                     Top 0.8%    6.2%
4     Nature Communications                                          4913                    Top 31%     6.2%
5     Briefings in Bioinformatics                                    326                     Top 1.0%    6.1%
6     Nucleic Acids Research                                         1128                    Top 4%      6.1%
      (50% of probability mass above)
7     Genome Biology                                                 555                     Top 2%      4.7%
8     Advanced Science                                               249                     Top 7%      3.0%
9     Nature Machine Intelligence                                    61                      Top 1%      3.0%
10    Cell Systems                                                   167                     Top 5%      2.5%
11    Nature Biotechnology                                           147                     Top 3%      2.5%
12    Genomics, Proteomics & Bioinformatics                          171                     Top 3%      2.0%
13    Genome Research                                                409                     Top 2%      2.0%
14    Cell Genomics                                                  162                     Top 3%      1.8%
15    PLOS Computational Biology                                     1633                    Top 18%     1.4%
16    PLOS Genetics                                                  756                     Top 11%     1.3%
17    Nature                                                         575                     Top 13%     1.3%
18    Nature Methods                                                 336                     Top 5%      1.3%
19    Frontiers in Genetics                                          197                     Top 7%      1.1%
20    Communications Biology                                         886                     Top 16%     1.1%
21    Science                                                        429                     Top 18%     0.9%
22    IEEE Transactions on Computational Biology and Bioinformatics  17                      Top 0.5%    0.9%
23    BMC Bioinformatics                                             383                     Top 7%      0.8%
24    PLOS ONE                                                       4510                    Top 66%     0.8%
25    Nature Genetics                                                240                     Top 7%      0.8%
26    Scientific Reports                                             3102                    Top 76%     0.7%
27    European Journal of Human Genetics                             49                      Top 1%      0.7%
28    Bioinformatics Advances                                        184                     Top 5%      0.7%
29    iScience                                                       1063                    Top 35%     0.7%
30    Cell Reports Medicine                                          140                     Top 10%     0.6%
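
The "top 6 journals account for 50%" statement can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch of that arithmetic; the helper name `journals_to_reach` is hypothetical, and the values are the first ten probabilities copied from the ranking above. It does not describe how the page itself computes the cutoff.

```python
# Predicted probabilities (in percent) for ranks 1-10, copied from the ranking above.
probabilities = [18.0, 9.8, 6.2, 6.2, 6.1, 6.1, 4.7, 3.0, 3.0, 2.5]


def journals_to_reach(mass_percent, probs):
    """Return (count, cumulative_sum) for the smallest number of top-ranked
    entries whose cumulative probability reaches mass_percent.

    Returns (None, total) if the threshold is never reached.
    Hypothetical helper, written only for this illustration.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= mass_percent:
            return count, total
    return None, total


count, total = journals_to_reach(50.0, probabilities)
print(count, round(total, 1))
```

With these values the running sum first crosses 50% at the sixth entry, which matches the cutoff marker placed after Nucleic Acids Research.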