GenVarLoader: An accelerated dataloader for applying deep learning to personalized genomics
Laub, D.; Ho, A.; Jaureguy, J.; Klie, A.; Salem, R. M.; McVicker, G.; Carter, H.
2025-01-17
genomics
10.1101/2025.01.15.633240
bioRxiv
Deep learning sequence models trained on personalized genomics can improve variant effect prediction; however, applications of these models are limited by the computational requirements for storing and reading large datasets. We address this with GenVarLoader, which stores personalized genomic data in new memory-mapped formats with optimal data locality to achieve ~1,000x faster throughput and ~2,000x better compression compared to existing alternatives.
Matching journals
Publisher type: ● Non-profit, ◐ University press, ○ Commercial
The top 5 journals account for 50% of the predicted probability mass.
| Rank | Journal | Publisher type | Papers in training set | Percentile | Probability |
|---|---|---|---|---|---|
| 1 | Genome Research | ● | 409 | Top 0.1% | 22.8% |
| 2 | Bioinformatics | ◐ | 1061 | Top 3% | 9.2% |
| 3 | Nature Methods | ○ | 336 | Top 1% | 7.3% |
| 4 | Nature Biotechnology | ○ | 147 | Top 1% | 6.9% |
| 5 | Genome Biology | ○ | 555 | Top 0.8% | 6.9% |
50% of probability mass above
| Rank | Journal | Publisher type | Papers in training set | Percentile | Probability |
|---|---|---|---|---|---|
| 6 | Nature Communications | ○ | 4913 | Top 35% | 4.4% |
| 7 | Nucleic Acids Research | ◐ | 1128 | Top 5% | 3.6% |
| 8 | Nature Genetics | ○ | 240 | Top 2% | 3.6% |
| 9 | Genome Medicine | ○ | 154 | Top 2% | 3.6% |
| 10 | Cell Genomics | ○ | 162 | Top 2% | 3.1% |
| 11 | Cell Systems | ○ | 167 | Top 5% | 2.1% |
| 12 | GigaScience | ◐ | 172 | Top 1% | 1.8% |
| 13 | Nature Computational Science | ○ | 50 | Top 0.5% | 1.7% |
| 14 | The American Journal of Human Genetics | ○ | 206 | Top 2% | 1.7% |
| 15 | Bioinformatics Advances | ◐ | 184 | Top 3% | 1.7% |
| 16 | Science | ● | 429 | Top 15% | 1.5% |
| 17 | BMC Genomics | ○ | 328 | Top 3% | 1.3% |
| 18 | Briefings in Bioinformatics | ◐ | 326 | Top 5% | 1.2% |
| 19 | Nature | ○ | 575 | Top 13% | 1.0% |
| 20 | NAR Genomics and Bioinformatics | ◐ | 214 | Top 3% | 0.9% |
| 21 | Nature Machine Intelligence | ○ | 61 | Top 3% | 0.9% |
| 22 | Cell | ○ | 370 | Top 15% | 0.9% |
| 23 | Frontiers in Genetics | ○ | 197 | Top 8% | 0.9% |
| 24 | BMC Bioinformatics | ○ | 383 | Top 7% | 0.8% |
| 25 | IEEE Transactions on Computational Biology and Bioinformatics | ● | 17 | Top 0.6% | 0.8% |
| 26 | Genomics, Proteomics & Bioinformatics | ◐ | 171 | Top 6% | 0.7% |
| 27 | Scientific Reports | ○ | 3102 | Top 80% | 0.5% |
| 28 | JCO Clinical Cancer Informatics | ● | 18 | Top 1% | 0.5% |