
Generative machine learning unlocks the first proteome-wide image of human cells

Sun, H.; Kahnert, K.; Hansen, J. N.; Leineweber, W. D.; Li, M.; Feng, W.; Ballllosera Navarro, F.; Axelsson, U.; Ouyang, W.; Lundberg, E.

2026-04-02 cell biology
10.64898/2026.03.31.715748 bioRxiv

The spatial organization of proteins within cells governs virtually all cellular functions. Yet, current imaging technologies can simultaneously visualize only tens of proteins, orders of magnitude below the thousands that populate a single human cell. Here, we present ProtiCelli, a deep generative model that simulates microscopy images for 12,800 human proteins from just three cellular landmark stains. Trained on 1.23 million images from the Human Protein Atlas, ProtiCelli outperforms existing methods in reconstruction accuracy and textural fidelity, and generalizes to unseen cell types and drug perturbations absent from training. We demonstrate that ProtiCelli-generated images preserve hierarchical subcellular organization, recapitulate known protein-protein interaction landscapes, and resolve compartment-specific functions of moonlighting proteins at the single-cell level. Remarkably, the model infers drug-induced changes in protein expression and localization from cell morphology alone, predicts cell cycle stage without dedicated cell cycle markers, and enables unsupervised segmentation of subcellular compartments as well as spatial decomposition of gene sets into functional regions. Ultimately, we leverage ProtiCelli to generate Proteome2Cell, an unprecedented dataset of 30.7 million simulated images creating 2,400 "virtual cells" across 12 human cell lines. These proteome-scale images enable the construction of hierarchical single-cell models that distinguish conserved from dynamic protein architectures. Integration of Proteome2Cell into the Human Protein Atlas democratizes the exploration of these "virtual cells". By computationally bridging the experimental scalability gap, ProtiCelli establishes a foundation for spatial virtual cell modeling and paves an avenue for transforming spatial proteomics from cataloging proteins to simulating complete cellular systems.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Nature Methods | 336 | Top 0.2% | 23.3%
2 | Nature | 575 | Top 2% | 12.7%
3 | Nature Communications | 4913 | Top 24% | 7.4%
4 | Nature Machine Intelligence | 61 | Top 0.3% | 7.0%
(50% of probability mass above)
5 | Cell Systems | 167 | Top 2% | 5.0%
6 | Cell | 370 | Top 6% | 3.7%
7 | Nature Cell Biology | 99 | Top 1% | 3.7%
8 | Science | 429 | Top 8% | 3.7%
9 | Nature Medicine | 117 | Top 1.0% | 3.2%
10 | Nature Biotechnology | 147 | Top 3% | 3.2%
11 | Proceedings of the National Academy of Sciences | 2130 | Top 25% | 2.5%
12 | Nature Neuroscience | 216 | Top 4% | 1.9%
13 | Journal of Cell Biology | 333 | Top 2% | 1.5%
14 | Science Advances | 1098 | Top 21% | 1.4%
15 | Advanced Science | 249 | Top 15% | 1.0%
16 | Nature Structural & Molecular Biology | 218 | Top 4% | 1.0%
17 | Molecular Cell | 308 | Top 9% | 1.0%
18 | Genome Biology | 555 | Top 6% | 1.0%
19 | Nature Immunology | 71 | Top 2% | 0.9%
20 | Nature Chemical Biology | 104 | Top 3% | 0.9%
21 | Cell Reports | 1338 | Top 31% | 0.8%
22 | Nature Aging | 51 | Top 2% | 0.8%
23 | eLife | 5422 | Top 56% | 0.8%
24 | Developmental Cell | 168 | Top 11% | 0.8%
25 | Nature Genetics | 240 | Top 7% | 0.7%
26 | Cell Metabolism | 49 | Top 3% | 0.5%
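
The 50% cutoff stated above can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch, not the site's own method: the function name `journals_to_reach` and the variable `ranked` are illustrative, and only the six highest-ranked entries are copied in.

```python
from itertools import accumulate

# Predicted probabilities (%) for the six highest-ranked journals,
# copied from the listing. Names below are illustrative assumptions.
ranked = [
    ("Nature Methods", 23.3),
    ("Nature", 12.7),
    ("Nature Communications", 7.4),
    ("Nature Machine Intelligence", 7.0),
    ("Cell Systems", 5.0),
    ("Cell", 3.7),
]


def journals_to_reach(entries, threshold):
    """Return (count, cumulative %) at the first rank whose running
    total of probabilities reaches the threshold."""
    running_totals = accumulate(p for _, p in entries)
    for count, total in enumerate(running_totals, start=1):
        if total >= threshold:
            return count, total
    raise ValueError("threshold not reached by the given entries")


count, total = journals_to_reach(ranked, 50.0)
print(f"{count} journals cover {total:.1f}% of the probability mass")
```

With the values listed, the running total passes 50% at the fourth journal (23.3 + 12.7 + 7.4 + 7.0 = 50.4), which agrees with the stated cutoff.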