GenVarLoader: An accelerated dataloader for applying deep learning to personalized genomics

Laub, D.; Ho, A.; Jaureguy, J.; Klie, A.; Salem, R. M.; McVicker, G.; Carter, H.

2025-01-17 genomics
10.1101/2025.01.15.633240 bioRxiv

Deep learning sequence models trained on personalized genomics can improve variant effect prediction. However, applications of these models are limited by the computational requirements of storing and reading large datasets. We address this with GenVarLoader, which stores personalized genomic data in new memory-mapped formats with optimal data locality to achieve ~1,000x faster throughput and ~2,000x better compression compared to existing alternatives.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Genome Research | 409 papers in training set | Top 0.1% | 22.8%
2 | Bioinformatics | 1061 papers in training set | Top 3% | 9.2%
3 | Nature Methods | 336 papers in training set | Top 1% | 7.3%
4 | Nature Biotechnology | 147 papers in training set | Top 1% | 6.9%
5 | Genome Biology | 555 papers in training set | Top 0.8% | 6.9%
50% of probability mass above
6 | Nature Communications | 4913 papers in training set | Top 35% | 4.4%
7 | Nucleic Acids Research | 1128 papers in training set | Top 5% | 3.6%
8 | Nature Genetics | 240 papers in training set | Top 2% | 3.6%
9 | Genome Medicine | 154 papers in training set | Top 2% | 3.6%
10 | Cell Genomics | 162 papers in training set | Top 2% | 3.1%
11 | Cell Systems | 167 papers in training set | Top 5% | 2.1%
12 | GigaScience | 172 papers in training set | Top 1% | 1.8%
13 | Nature Computational Science | 50 papers in training set | Top 0.5% | 1.7%
14 | The American Journal of Human Genetics | 206 papers in training set | Top 2% | 1.7%
15 | Bioinformatics Advances | 184 papers in training set | Top 3% | 1.7%
16 | Science | 429 papers in training set | Top 15% | 1.5%
17 | BMC Genomics | 328 papers in training set | Top 3% | 1.3%
18 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.2%
19 | Nature | 575 papers in training set | Top 13% | 1.0%
20 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 0.9%
21 | Nature Machine Intelligence | 61 papers in training set | Top 3% | 0.9%
22 | Cell | 370 papers in training set | Top 15% | 0.9%
23 | Frontiers in Genetics | 197 papers in training set | Top 8% | 0.9%
24 | BMC Bioinformatics | 383 papers in training set | Top 7% | 0.8%
25 | IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.6% | 0.8%
26 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 6% | 0.7%
27 | Scientific Reports | 3102 papers in training set | Top 80% | 0.5%
28 | JCO Clinical Cancer Informatics | 18 papers in training set | Top 1% | 0.5%