Generative machine learning unlocks the first proteome-wide image of human cells
Sun, H.; Kahnert, K.; Hansen, J. N.; Leineweber, W. D.; Li, M.; Feng, W.; Ballllosera Navarro, F.; Axelsson, U.; Ouyang, W.; Lundberg, E.
The spatial organization of proteins within cells governs virtually all cellular functions. Yet, current imaging technologies can simultaneously visualize only tens of proteins, orders of magnitude below the thousands that populate a single human cell. Here, we present ProtiCelli, a deep generative model that simulates microscopy images for 12,800 human proteins from just three cellular landmark stains. Trained on 1.23 million images from the Human Protein Atlas, ProtiCelli outperforms existing methods in reconstruction accuracy and textural fidelity, and generalizes to unseen cell types and drug perturbations absent from training. We demonstrate that ProtiCelli-generated images preserve hierarchical subcellular organization, recapitulate known protein-protein interaction landscapes, and resolve compartment-specific functions of moonlighting proteins at the single-cell level. Remarkably, the model infers drug-induced changes in protein expression and localization from cell morphology alone, predicts cell cycle stage without dedicated cell cycle markers, and enables unsupervised segmentation of subcellular compartments as well as spatial decomposition of gene sets into functional regions. Ultimately, we leverage ProtiCelli to generate Proteome2Cell, an unprecedented dataset of 30.7 million simulated images creating 2,400 "virtual cells" across 12 human cell lines. These proteome-scale images enable the construction of hierarchical single-cell models that distinguish conserved from dynamic protein architectures. Integration of Proteome2Cell into the Human Protein Atlas democratizes the exploration of these "virtual cells". By computationally bridging the experimental scalability gap, ProtiCelli establishes a foundation for spatial virtual cell modeling and paves the way for transforming spatial proteomics from cataloging proteins to simulating complete cellular systems.
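The dataset size quoted for Proteome2Cell can be reconciled with the other figures in the abstract. The sketch below is a back-of-the-envelope check, not the authors' description of the dataset: it assumes that each virtual cell carries one simulated image per protein (12,800 images per cell) and, purely for illustration, that the 2,400 virtual cells are split evenly across the 12 cell lines. Under those assumptions the 30.7 million figure follows directly.

```python
# Back-of-the-envelope check of the Proteome2Cell dataset size.
# ASSUMPTIONS (not stated in the abstract): one simulated image per protein
# per virtual cell, and virtual cells split evenly across cell lines.
N_PROTEINS = 12_800      # proteins ProtiCelli simulates (from the abstract)
N_VIRTUAL_CELLS = 2_400  # "virtual cells" in Proteome2Cell (from the abstract)
N_CELL_LINES = 12        # human cell lines covered (from the abstract)

# Total images if every virtual cell has one image per protein.
total_images = N_PROTEINS * N_VIRTUAL_CELLS

# Virtual cells per line under the hypothetical even split.
cells_per_line = N_VIRTUAL_CELLS // N_CELL_LINES

print(f"{total_images:,} images")                       # 30,720,000 images
print(f"{total_images / 1e6:.1f} million images")       # 30.7 million images
print(f"{cells_per_line} virtual cells per cell line")  # 200 virtual cells per cell line
```

The product, 30,720,000, rounds to the 30.7 million stated in the abstract, which is consistent with (though does not prove) the one-image-per-protein-per-cell reading.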