scProfiterole: Clustering of Single-Cell Proteomic Data Using Graph Contrastive Learning via Spectral Filters
Coskun, M.; Lopes, F. B.; Kubilay Tolunay, P.; Chance, M. R.; Koyuturk, M.
Novel technologies for the acquisition of protein expression data at the single cell level are emerging rapidly. Although there exists a substantial body of computational algorithms and tools for the analysis of single cell gene expression (scRNAseq) data, tools for even basic tasks such as clustering or cell type identification for single cell proteomic (scProteomics) data are relatively scarce. Adapting algorithms developed for scRNAseq to scProteomics is challenged by the larger number of drop-outs, missing data, and noise in single cell proteomic data. Graph contrastive learning (GCL) on cell-to-cell similarity graphs derived from single cell protein expression profiles shows promise in cell type identification. However, missing edges and noise in the cell-to-cell similarity graph require careful design of convolution matrices to overcome the imperfections in these graphs. Here, we introduce scProfiterole (Single Cell Proteomics Clustering via Spectral Filters), a computational framework to facilitate effective use of spectral graph filters in GCL-based clustering of single cell proteomic data. Since clustering assumes a homophilic network topology, we consider three types of homophilic filters: (i) random walks, (ii) heat kernels, and (iii) beta kernels. Direct implementation of these filters is computationally prohibitive; thus, the filters are either truncated or approximated in practice. To overcome this limitation, scProfiterole uses Arnoldi orthonormalization to implement polynomial interpolations of any given spectral graph filter.
Our results on comprehensive single cell proteomic data show that (i) graph contrastive learning with learnable polynomial coefficients that are carefully initialized improves the effectiveness and robustness of cell type identification, (ii) heat kernels and beta kernels improve clustering performance over adjacency matrices or random walks, and (iii) polynomial interpolation of spectral filters outperforms approximation or truncation. The source code for scProfiterole and Supplementary Materials are available at https://github.com/mustafaCoskunAgu/scProfiterole.
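To make the idea of polynomial interpolation of a spectral filter via Arnoldi orthonormalization concrete, the following is a minimal NumPy sketch. It is not taken from the scProfiterole code base. It assumes a symmetric normalized Laplacian with eigenvalues in [0, 2], a heat kernel of the form exp(-t·λ), Chebyshev interpolation nodes, and a Vandermonde-with-Arnoldi style basis construction; the function names `fit_filter`, `apply_filter`, and `demo` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_filter(h, degree, lo=0.0, hi=2.0):
    """Interpolate a scalar filter h on [lo, hi] (assumed spectral range).

    Uses degree+1 Chebyshev nodes and builds an orthonormal polynomial
    basis with the Arnoldi process instead of raw monomials.
    Returns basis coefficients and the Hessenberg recurrence matrix H.
    """
    m = degree + 1
    i = np.arange(m)
    x = 0.5 * (lo + hi) + 0.5 * (hi - lo) * np.cos((2 * i + 1) * np.pi / (2 * m))
    Q = np.zeros((m, m))       # basis polynomials evaluated at the nodes
    H = np.zeros((m, degree))  # recurrence coefficients
    Q[:, 0] = 1.0
    for k in range(degree):
        q = x * Q[:, k]        # multiply the previous basis polynomial by lambda
        for j in range(k + 1): # Gram-Schmidt against the earlier basis columns
            H[j, k] = Q[:, j] @ q / m
            q -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(q) / np.sqrt(m)
        Q[:, k + 1] = q / H[k + 1, k]
    coef = np.linalg.solve(Q, h(x))  # exact interpolation at the nodes
    return coef, H

def apply_filter(L, X, coef, H):
    """Compute p(L) @ X by replaying the Arnoldi recurrence with L in place of lambda."""
    basis = [X]
    out = coef[0] * X
    for k in range(H.shape[1]):
        W = L @ basis[k]
        for j in range(k + 1):
            W = W - H[j, k] * basis[j]
        W = W / H[k + 1, k]
        basis.append(W)
        out = out + coef[k + 1] * W
    return out

def demo(n_cells=30, t=1.0, degree=10, seed=0):
    """Compare the interpolated heat kernel with the exact one on a toy graph."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_cells, n_cells))
    idx = np.arange(n_cells)
    A[idx, (idx + 1) % n_cells] = 1.0          # ring, so no cell is isolated
    for a, b in rng.integers(0, n_cells, size=(20, 2)):
        if a != b:
            A[a, b] = 1.0                      # a few random chords
    A = np.maximum(A, A.T)                     # symmetrize
    d = A.sum(axis=1)
    L = np.eye(n_cells) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
    X = rng.standard_normal((n_cells, 5))      # synthetic feature matrix
    coef, H = fit_filter(lambda lam: np.exp(-t * lam), degree)
    approx = apply_filter(L, X, coef, H)
    evals, U = np.linalg.eigh(L)               # exact filter, small graphs only
    exact = U @ (np.exp(-t * evals)[:, None] * (U.T @ X))
    return np.max(np.abs(approx - exact))

if __name__ == "__main__":
    print("max abs error vs exact heat kernel:", demo())
```

In this sketch the filter is applied using only matrix products with the Laplacian, so the eigendecomposition is needed only for the reference comparison in `demo`. Swapping the lambda passed to `fit_filter` for another scalar function is one way to try a different filter under the same assumptions.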
Matching journals
The top 4 journals account for 50% of the predicted probability mass.