
Recovering biological structure in sparse single-cell proteomics with GIRAFI

Zhong, H.; Chi, S.; Wong, R.; Rogalski, J.; Wang, Z.; Chan, S.; Bailey, M. L.; Ebrahimi, A.; Jayme, G.; Yin, J.; Gong, A.; Snutch, T. P.; Maier, C. S.; Marra, M. A.; Foster, L. J.; Tang, X.

2026-05-21 bioinformatics
10.64898/2026.05.18.726081 bioRxiv
Abstract

Single-cell proteomics (SCP) based on liquid chromatography-mass spectrometry resolves protein-level cellular heterogeneity, but its interpretation remains limited by detection-linked sparsity. Because SCP profiles continuous, peptide-derived intensities and has lower throughput than single-cell RNA sequencing, denoising methods developed for large-scale, count-based transcriptomics are difficult to apply. Here we present GIRAFI, a graph-informed statistical learning framework that imputes missing values and reveals reproducible cell states by constraining inference to dataset-aware, prior-knowledge-informed protein neighborhoods. We evaluated GIRAFI across SCP datasets spanning diverse biological and technical contexts. GIRAFI outperformed existing methods in masking-based recovery experiments and in cell-type-specific protein-protein interaction inference; comparisons with matched bulk proteomics corroborated its recovery accuracy, and ablations supported the graph-informed design. Beyond reducing replicate- and source-associated technical structure, GIRAFI recovered ground-truth cell-type annotations, improved cell-state-resolved pathway analysis, and enabled trajectory inference consistent with known time courses. These results establish graph-constrained imputation as an effective strategy for improving the robustness, biological structure, interpretation, and cross-dataset comparability of SCP analyses.

Matching journals

Together, the top 3 journals account for more than 50% of the predicted probability mass (56.3% combined).

1. Nature Methods: 28.1% (Top 0.1%; 336 papers in training set)
2. Nature Communications: 18.9% (Top 5%; 4913 papers in training set)
3. Nature Biotechnology: 9.3% (Top 0.8%; 147 papers in training set)
(50% of probability mass above this line)
4. Cell Systems: 6.4% (Top 2%; 167 papers in training set)
5. Nature Chemical Biology: 4.0% (Top 0.6%; 104 papers in training set)
6. Nature Machine Intelligence: 3.6% (Top 0.9%; 61 papers in training set)
7. Genome Biology: 3.6% (Top 2%; 555 papers in training set)
8. Molecular & Cellular Proteomics: 2.1% (Top 0.9%; 158 papers in training set)
9. Advanced Science: 1.8% (Top 10%; 249 papers in training set)
10. Bioinformatics: 1.7% (Top 7%; 1061 papers in training set)
11. Nature: 1.5% (Top 12%; 575 papers in training set)
12. Proceedings of the National Academy of Sciences: 1.0% (Top 40%; 2130 papers in training set)
13. Science: 0.9% (Top 18%; 429 papers in training set)
14. Nature Microbiology: 0.9% (Top 4%; 133 papers in training set)
15. PLOS Computational Biology: 0.9% (Top 22%; 1633 papers in training set)
16. Genomics, Proteomics & Bioinformatics: 0.8% (Top 5%; 171 papers in training set)
17. Nature Metabolism: 0.8% (Top 3%; 56 papers in training set)
18. Nucleic Acids Research: 0.8% (Top 17%; 1128 papers in training set)
19. Cell Genomics: 0.7% (Top 7%; 162 papers in training set)
20. Cell: 0.7% (Top 19%; 370 papers in training set)
21. Molecular Cell: 0.7% (Top 11%; 308 papers in training set)
22. Genome Medicine: 0.5% (Top 10%; 154 papers in training set)
23. Genome Research: 0.5% (Top 5%; 409 papers in training set)
24. Science Advances: 0.5% (Top 35%; 1098 papers in training set)
25. Cell Reports Methods: 0.5% (Top 7%; 141 papers in training set)
26. PLOS ONE: 0.5% (Top 72%; 4510 papers in training set)
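The 50% cutoff in the list can be checked by accumulating the listed probabilities in rank order until the running total first reaches 50%. The following is a minimal Python sketch using only the percentages shown above; the helper name `journals_to_reach` is hypothetical, and because the displayed values are rounded, the cumulative totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-26, copied from the list above.
PROBS = [28.1, 18.9, 9.3, 6.4, 4.0, 3.6, 3.6, 2.1, 1.8, 1.7,
         1.5, 1.0, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.7, 0.7,
         0.7, 0.5, 0.5, 0.5, 0.5, 0.5]


def journals_to_reach(probs, target=50.0):
    """Return (k, cumulative %) for the smallest k whose top-k mass >= target.

    Hypothetical helper; assumes `probs` is sorted in descending order.
    Returns None if the listed entries never reach the target.
    """
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return k, round(total, 1)
    return None


print(journals_to_reach(PROBS))  # → (3, 56.3)
```

With the listed values, the first two journals reach 47.0% and the third brings the total to 56.3%, which is why the 50% marker falls after rank 3.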