Locat: Joint enrichment and depletion testing identifies localized marker genes in single-cell transcriptomics
Lewis, W. R.; Aizenbud, Y.; Strino, F.; Kluger, Y.; Parisi, F.
Show abstract
Several methods have been developed to identify marker genes that delineate cell populations in single-cell transcriptomic data, yet most emphasize enrichment within candidate populations without testing whether expression is significantly reduced outside those populations. We present Locat, a framework for identifying highly specific localized genes by testing whether expression is concentrated within compact regions of the cellular embedding and depleted elsewhere. For each gene, Locat fits weighted Gaussian mixture models to gene-specific and background densities, computes test statistics for concentration within compact regions and depletion outside those regions, and integrates the results into a unified localization score. Across synthetic benchmarks with controlled ground truth, Locat detects localized genes spanning uni-modal, multi-modal, and sparse expression patterns, and appropriately loses significance when simulated expression becomes indistinguishable from background structure. In biological datasets spanning developmental, perturbation, and differentiation contexts, Locat identifies compact marker sets that capture lineage organization, condition-specific programs, and temporal regulatory dynamics. Localized gene sets are often smaller than conventional feature selections such as highly variable genes, and embeddings constructed from localized gene sets tend to preserve separation of major cell populations and developmental programs. In murine dermis, embeddings computed using localized genes preserve differentiation and cell-cycle trajectories observed in the full dataset. In interferon-{beta}-treated PBMCs, independent localization analysis of control and stimulated samples reveals stimulus-responsive programs and markers of shared immune populations without requiring batch correction or data integration. In retinoic acid-induced embryonic stem cell differentiation, localized genes exhibit reproducible stage-specific patterns across time points. Together, these results demonstrate that jointly assessing concentration and depletion yields specific, interpretable marker genes that enable direct cross-condition and multi-sample comparisons of marker genes across diverse biological settings.
Matching journals
The top 5 journals account for 50% of the predicted probability mass.