
scProfiterole: Clustering of Single-Cell Proteomic Data Using Graph Contrastive Learning via Spectral Filters

Coskun, M.; Lopes, F. B.; Kubilay Tolunay, P.; Chance, M. R.; Koyuturk, M.

2026-02-28 bioinformatics
10.64898/2026.02.26.708196 bioRxiv

Novel technologies for the acquisition of protein expression data at the single cell level are emerging rapidly. Although there exists a substantial body of computational algorithms and tools for the analysis of single cell gene expression (scRNAseq) data, tools for even basic tasks such as clustering or cell type identification for single cell proteomic (scProteomics) data are relatively scarce. Adapting algorithms developed for scRNAseq to scProteomics is challenged by the larger number of drop-outs, missing data, and noise in single cell proteomic data. Graph contrastive learning (GCL) on cell-to-cell similarity graphs derived from single cell protein expression profiles shows promise in cell type identification. However, missing edges and noise in the cell-to-cell similarity graph require careful design of convolution matrices to overcome the imperfections in these graphs. Here, we introduce scProfiterole (Single Cell Proteomics Clustering via Spectral Filters), a computational framework to facilitate effective use of spectral graph filters in GCL-based clustering of single cell proteomic data. Since clustering assumes a homophilic network topology, we consider three types of homophilic filters: (i) random walks, (ii) heat kernels, and (iii) beta kernels. Direct implementation of these filters is computationally prohibitive, so in practice the filters are either truncated or approximated. To overcome this limitation, scProfiterole uses Arnoldi orthonormalization to implement polynomial interpolations of any given spectral graph filter.
Our results on comprehensive single cell proteomic data show that (i) graph contrastive learning with learnable polynomial coefficients that are carefully initialized improves the effectiveness and robustness of cell type identification, (ii) heat kernels and beta kernels improve clustering performance over adjacency matrices or random walks, and (iii) polynomial interpolation of spectral filters outperforms approximation or truncation. The source code for scProfiterole and Supplementary Materials are available at https://github.com/mustafaCoskunAgu/scProfiterole.
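
The idea of replacing a spectral filter with a polynomial in the graph Laplacian can be illustrated with a small sketch. This is not the authors' implementation: it swaps in plain Chebyshev interpolation with the three-term recurrence rather than Arnoldi orthonormalization, and the function names, the toy graph, and the diffusion time `t` are all assumptions made for illustration. It interpolates the heat kernel h(λ) = exp(-tλ) on the spectrum [0, 2] of the symmetric normalized Laplacian and compares the result with the exact filter obtained by eigendecomposition.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def normalized_laplacian(A):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def heat_filter_coeffs(t, degree):
    """Chebyshev coefficients interpolating exp(-t*lam) for lam in [0, 2].

    The substitution x = lam - 1 maps the spectrum onto [-1, 1].
    """
    return C.chebinterpolate(lambda x: np.exp(-t * (x + 1.0)), degree)


def apply_cheb_filter(L, X, coeffs):
    """Compute p(L) @ X using T_{k+1} = 2 * L_shift @ T_k - T_{k-1}, never forming p(L)."""
    L_shift = L - np.eye(L.shape[0])  # eigenvalues now lie in [-1, 1]
    T_prev = X
    out = coeffs[0] * T_prev
    if len(coeffs) == 1:
        return out
    T_curr = L_shift @ X
    out = out + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_next = 2.0 * (L_shift @ T_curr) - T_prev
        out = out + c * T_next
        T_prev, T_curr = T_curr, T_next
    return out


# Toy cell-to-cell graph (hypothetical): two triangles joined by one edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # stand-in for per-cell feature vectors
t = 1.0                      # assumed diffusion time

L = normalized_laplacian(A)
approx = apply_cheb_filter(L, X, heat_filter_coeffs(t, degree=10))

# Exact heat kernel via eigendecomposition, feasible only for small graphs.
evals, evecs = np.linalg.eigh(L)
exact = evecs @ np.diag(np.exp(-t * evals)) @ evecs.T @ X

print("max abs error:", np.max(np.abs(approx - exact)))
```

The recurrence needs only matrix-vector style products with the Laplacian, which is why polynomial filters scale to large similarity graphs where the eigendecomposition used for the reference value does not.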

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 0.9%; 25.6%
2. Nature Methods: 336 papers in training set; Top 0.4%; 17.3%
3. Molecular & Cellular Proteomics: 158 papers in training set; Top 0.4%; 6.7%
4. Nature Communications: 4913 papers in training set; Top 29%; 6.3%

50% of probability mass above

5. Journal of Proteome Research: 215 papers in training set; Top 0.6%; 4.8%
6. Briefings in Bioinformatics: 326 papers in training set; Top 2%; 3.2%
7. Cell Reports Methods: 141 papers in training set; Top 1%; 3.0%
8. Nature Biotechnology: 147 papers in training set; Top 3%; 2.7%
9. Cell Systems: 167 papers in training set; Top 5%; 2.7%
10. Genome Research: 409 papers in training set; Top 1%; 2.6%
11. Nature Machine Intelligence: 61 papers in training set; Top 2%; 1.7%
12. Genome Biology: 555 papers in training set; Top 4%; 1.7%
13. PLOS Computational Biology: 1633 papers in training set; Top 16%; 1.7%
14. Patterns: 70 papers in training set; Top 1.0%; 1.6%
15. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 5%; 1.2%
16. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 38%; 1.2%
17. Bioinformatics Advances: 184 papers in training set; Top 4%; 0.9%
18. Nucleic Acids Research: 1128 papers in training set; Top 17%; 0.8%
19. BMC Bioinformatics: 383 papers in training set; Top 7%; 0.7%
20. PLOS ONE: 4510 papers in training set; Top 70%; 0.7%
21. iScience: 1063 papers in training set; Top 38%; 0.6%
22. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 3%; 0.6%