Vision-Based Genomic Model for Copy Number Variant Pathogenicity Prediction
Buralkin, I.; Botas, J.; Chang, K.-L.; Deng, Y.; Papastathopoulos-Katsaros, A.; Liu, Z.; Park, J.
Copy number variants (CNVs) are a major class of structural genomic alterations underlying rare disease, including neurodevelopmental delay and intellectual disability, yet predicting their pathogenicity remains challenging. Existing methods reduce CNVs to region-level numerical features, discarding the positional structure and cross-track patterns that expert clinical reviewers use to interpret genomic evidence. To address this, we introduce Tesseract for CNV, a track-based spatial representation for CNV pathogenicity prediction. It represents each variant as a base-pair-resolution multi-track image and models spatial genomic patterns across annotation tracks while preserving positional structure and cross-track dependencies. Trained on a chromosome-level hold-out split of the ClinVar dataset, Tesseract outperforms prior methods on held-out and curated noncoding benchmarks, improving AUROC by up to 0.10 over the state-of-the-art baseline. On the independent DECIPHER cohort, the model generalizes, achieving the highest AUROC and the highest F1 score among all compared methods. Furthermore, our model localizes pathogenic signals to clinically meaningful genomic subregions, providing track-annotated evidence that supports practical clinical interpretation.
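The two data-handling ideas named in the abstract, a base-pair-resolution multi-track image and a chromosome-level hold-out split, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rasterize_tracks` and `chromosome_holdout`, the binary 0/1 encoding, the track names, and the toy coordinates are all assumptions chosen for illustration.

```python
import numpy as np

def rasterize_tracks(start, end, tracks):
    """Encode the half-open interval [start, end) as a (n_tracks, length) array.

    Hypothetical encoding: one row per annotation track (sorted by name),
    one column per base pair, 1.0 wherever an annotation overlaps the variant.
    """
    length = end - start
    image = np.zeros((len(tracks), length), dtype=np.float32)
    for row, name in enumerate(sorted(tracks)):
        for a_start, a_end in tracks[name]:
            # Clip the annotation to the variant and shift to local coordinates.
            lo = max(a_start, start) - start
            hi = min(a_end, end) - start
            if lo < hi:
                image[row, lo:hi] = 1.0
    return image

def chromosome_holdout(variants, held_out):
    """Split variants so that no chromosome appears in both train and test."""
    train = [v for v in variants if v["chrom"] not in held_out]
    test = [v for v in variants if v["chrom"] in held_out]
    return train, test

# Toy data (illustrative only, not from ClinVar or DECIPHER).
tracks = {"exon": [(95, 105)], "enhancer": [(110, 130), (200, 210)]}
image = rasterize_tracks(100, 120, tracks)

variants = [
    {"chrom": "chr1", "start": 100, "end": 120},
    {"chrom": "chr2", "start": 500, "end": 900},
    {"chrom": "chr21", "start": 10, "end": 40},
]
train, test = chromosome_holdout(variants, held_out={"chr21"})
```

Keeping one column per base pair preserves where each annotation falls inside the variant, which is the positional information that region-level summary features discard. Splitting by chromosome rather than by variant keeps overlapping or neighboring CNVs from appearing on both sides of the split.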