CN-RNN: a Deep Learning Framework for Copy Number Variation Detection with Exome Sequencing Data

Wang, D.; Qin, F.; Bao, W.; Bacher, R.; Chung, D.; Lu, Q.; Efron, P. A.; Cai, G.; Xiao, F.

2026-05-15 genetics
10.64898/2026.05.13.724920 bioRxiv

Copy number variations (CNVs) are major structural genomic variants that contribute to a wide range of human diseases. Accurate detection of CNVs from whole-exome sequencing (WES) data has been a long-sought goal for clinical and population genetic studies. Despite recent progress, existing WES-based CNV callers still suffer from high false-positive rates and reduced recall for short variants, and current deep learning methods do not fully use the complementary information in region-level genomic features. Here we present CN-RNN, a deep learning-based CNV caller for WES data. The model combines two parallel branches: a bidirectional long short-term memory (BiLSTM) branch that captures local depth changes and contextual dependencies across neighboring exons, and a multi-layer perceptron (MLP) branch that encodes region-level metadata such as GC content, mappability, and exon length. CN-RNN was trained on the Autism Sequencing Consortium (ASC) parent-child trio cohort, using the Mendelian rule of inheritance to ensure high-quality training sets. Across three independent evaluation datasets, CN-RNN outperformed existing WES-based CNV callers and deep learning methods. CN-RNN offers a scalable, accurate tool for CNV profiling in WES-based studies and supports broader application of CNV analysis in population and clinical research. CN-RNN is available at https://github.com/FeifeiXiao-lab/CN-RNN.
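
The abstract does not spell out how the Mendelian rule of inheritance is applied to the trio data. As a rough illustration only, one common way to use trios is to keep a child's CNV call when a same-type call in either parent overlaps it substantially. The sketch below assumes that approach; the `CNV` record, the function names, and the 50% reciprocal-overlap threshold are all hypothetical and are not taken from CN-RNN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """Hypothetical CNV record; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if chromosome or type differ."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def inherited_calls(child, father, mother, min_overlap=0.5):
    """Keep child calls supported by a matching call in at least one parent.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    parents = list(father) + list(mother)
    return [
        c for c in child
        if any(reciprocal_overlap(c, p) >= min_overlap for p in parents)
    ]


# Toy trio: the deletion is supported by the father, the duplication by no one.
child = [CNV("chr1", 100, 200, "DEL"), CNV("chr2", 500, 900, "DUP")]
father = [CNV("chr1", 110, 210, "DEL")]
mother = [CNV("chr2", 500, 600, "DEL")]
kept = inherited_calls(child, father, mother)
```

In this toy trio only the chr1 deletion is kept: it shares 90 of 100 bases with the father's deletion, while the chr2 duplication has no same-type match in either parent. A filter of this kind favors calls that are unlikely to be artifacts, at the cost of discarding true de novo events.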

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set; Top 2%; 17.0%)
2. The American Journal of Human Genetics (206 papers in training set; Top 0.3%; 14.0%)
3. Bioinformatics Advances (184 papers in training set; Top 0.3%; 7.0%)
4. Genome Medicine (154 papers in training set; Top 1%; 6.2%)
5. Genome Research (409 papers in training set; Top 0.5%; 6.1%)

50% of probability mass above

6. Briefings in Bioinformatics (326 papers in training set; Top 1%; 4.7%)
7. Nature Communications (4913 papers in training set; Top 38%; 3.9%)
8. Frontiers in Genetics (197 papers in training set; Top 2%; 3.9%)
9. BMC Bioinformatics (383 papers in training set; Top 3%; 3.5%)
10. Genetic Epidemiology (46 papers in training set; Top 0.3%; 2.5%)
11. Nucleic Acids Research (1128 papers in training set; Top 10%; 1.7%)
12. Cell Genomics (162 papers in training set; Top 4%; 1.6%)
13. Nature Computational Science (50 papers in training set; Top 0.8%; 1.4%)
14. Scientific Reports (3102 papers in training set; Top 62%; 1.4%)
15. Journal of Genetics and Genomics (36 papers in training set; Top 1%; 1.4%)
16. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 39%; 1.2%)
17. npj Genomic Medicine (33 papers in training set; Top 0.6%; 1.2%)
18. PLOS Computational Biology (1633 papers in training set; Top 21%; 1.1%)
19. European Journal of Human Genetics (49 papers in training set; Top 1%; 0.9%)
20. Nature Biotechnology (147 papers in training set; Top 7%; 0.9%)
21. PLOS Genetics (756 papers in training set; Top 13%; 0.9%)
22. Nature Genetics (240 papers in training set; Top 7%; 0.8%)
23. PLOS ONE (4510 papers in training set; Top 66%; 0.8%)
24. BMC Genomics (328 papers in training set; Top 6%; 0.7%)
25. G3: Genes, Genomes, Genetics (222 papers in training set; Top 1%; 0.7%)
26. Genetics in Medicine (69 papers in training set; Top 1%; 0.7%)
27. Communications Biology (886 papers in training set; Top 26%; 0.7%)
28. Nature (575 papers in training set; Top 17%; 0.7%)
29. Nature Methods (336 papers in training set; Top 7%; 0.6%)
30. Genetics (225 papers in training set; Top 5%; 0.6%)
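
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below assumes that the last percentage on each entry is that journal's predicted probability; the function name `journals_to_reach` is hypothetical.

```python
# Percentages of the ten top-ranked journals, copied from the list above.
probabilities = [17.0, 14.0, 7.0, 6.2, 6.1, 4.7, 3.9, 3.9, 3.5, 2.5]


def journals_to_reach(probs, target):
    """Count how many top-ranked entries are needed for the running sum
    to reach `target`; return None if the target is never reached."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= target:
            return count
    return None


top_n = journals_to_reach(probabilities, 50.0)
```

The running sum is 44.2 after four journals and about 50.3 after five, so `top_n` is 5, consistent with the stated cutoff.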