Accelerating Identification of Chromatin Accessibility from noisy ATAC-seq Data using Modern CPUs

Chaudhary, N.; Misra, S.; Kalamkar, D.; Heinecke, A.; Georganas, E.; Ziv, B.; Adelman, M.; Kaul, B.

2021-09-30 genomics

10.1101/2021.09.28.462099 bioRxiv


Identifying accessible chromatin regions is a fundamental problem in epigenomics, with ATAC-seq being a commonly used assay. The exponential rise in single-cell ATAC-seq experiments has made it critical to accelerate the processing of ATAC-seq data. ATAC-seq data can have a low signal-to-noise ratio for various reasons, including low coverage or low cell count. To denoise and identify accessible chromatin regions from noisy ATAC-seq data, the use of deep learning on 1D data (using large filter sizes, long tensor widths, and/or dilation) has recently been proposed. Here, we present ways to accelerate the end-to-end training performance of these deep-learning-based methods using CPUs. We evaluate our approach on the recently released AtacWorks toolkit. Compared to an Nvidia DGX-1 box with 8 V100 GPUs, we achieve up to a 2.27x speedup using just 16 CPU sockets. To achieve this, we build an efficient 1D dilated convolution layer and demonstrate reduced-precision (BFloat16) training.

