
Locat: Joint enrichment and depletion testing identifies localized marker genes in single-cell transcriptomics

Lewis, W. R.; Aizenbud, Y.; Strino, F.; Kluger, Y.; Parisi, F.

2026-04-07 bioinformatics
10.64898/2026.04.03.716370 bioRxiv

Several methods have been developed to identify marker genes that delineate cell populations in single-cell transcriptomic data, yet most emphasize enrichment within candidate populations without testing whether expression is significantly reduced outside those populations. We present Locat, a framework for identifying highly specific localized genes by testing whether expression is concentrated within compact regions of the cellular embedding and depleted elsewhere. For each gene, Locat fits weighted Gaussian mixture models to gene-specific and background densities, computes test statistics for concentration within compact regions and depletion outside those regions, and integrates the results into a unified localization score. Across synthetic benchmarks with controlled ground truth, Locat detects localized genes spanning uni-modal, multi-modal, and sparse expression patterns, and appropriately loses significance when simulated expression becomes indistinguishable from background structure. In biological datasets spanning developmental, perturbation, and differentiation contexts, Locat identifies compact marker sets that capture lineage organization, condition-specific programs, and temporal regulatory dynamics. Localized gene sets are often smaller than conventional feature selections such as highly variable genes, and embeddings constructed from localized gene sets tend to preserve separation of major cell populations and developmental programs. In murine dermis, embeddings computed using localized genes preserve differentiation and cell-cycle trajectories observed in the full dataset. In interferon-β-treated PBMCs, independent localization analysis of control and stimulated samples reveals stimulus-responsive programs and markers of shared immune populations without requiring batch correction or data integration. In retinoic acid-induced embryonic stem cell differentiation, localized genes exhibit reproducible stage-specific patterns across time points. Together, these results demonstrate that jointly assessing concentration and depletion yields specific, interpretable marker genes that enable direct cross-condition and multi-sample comparisons across diverse biological settings.
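
The idea of scoring concentration and depletion jointly can be illustrated with a toy sketch. It is not Locat's implementation: it assumes a one-dimensional embedding, replaces the weighted Gaussian mixture models with a single expression-weighted Gaussian, and reports plain ratios instead of test statistics. The function names and the `n_sd` parameter are hypothetical.

```python
import math

def weighted_gaussian(coords, weights):
    """Weighted mean and variance of 1-D coordinates.

    A one-component stand-in (assumption) for a weighted Gaussian mixture.
    """
    total = sum(weights)
    mean = sum(w * x for x, w in zip(coords, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(coords, weights)) / total
    return mean, max(var, 1e-9)

def localization_sketch(coords, expr, n_sd=2.0):
    """Return (concentration, depletion) ratios for one gene.

    concentration: share of expression inside the region divided by the
                   share of cells inside it (values above 1 mean enrichment).
    depletion:     1 minus the ratio of mean expression outside the region
                   to the overall mean (values near 1 mean strong depletion).
    """
    mean, var = weighted_gaussian(coords, expr)
    radius = n_sd * math.sqrt(var)
    inside = [abs(x - mean) <= radius for x in coords]

    total_expr = sum(expr)
    expr_in = sum(e for e, is_in in zip(expr, inside) if is_in)
    cells_in = sum(inside)
    cells_out = len(coords) - cells_in

    concentration = (expr_in / total_expr) / (cells_in / len(coords))
    if cells_out == 0:
        # The region covers every cell, so nothing can be depleted outside it.
        depletion = 0.0
    else:
        mean_out = (total_expr - expr_in) / cells_out
        depletion = 1.0 - mean_out / (total_expr / len(coords))
    return concentration, depletion

# Toy data: 100 cells evenly spaced along a 1-D embedding.
coords = [float(i) for i in range(100)]
localized = [5.0 if 40 <= i < 50 else 0.0 for i in range(100)]
uniform = [1.0] * 100

print(localization_sketch(coords, localized))
print(localization_sketch(coords, uniform))
```

In this toy setting the gene expressed only in cells 40 to 49 scores high on both ratios, while the uniformly expressed gene scores a concentration of 1 and a depletion of 0, mirroring the abstract's point that a gene should count as localized only when both conditions hold.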

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cell Systems (167 papers in training set, Top 0.4%): 18.4%
2. Bioinformatics (1061 papers in training set, Top 3%): 10.3%
3. Nature Methods (336 papers in training set, Top 1.0%): 10.0%
4. Nature Biotechnology (147 papers in training set, Top 1%): 6.3%
5. Genome Biology (555 papers in training set, Top 1%): 6.3%
50% of probability mass above
6. PLOS Computational Biology (1633 papers in training set, Top 6%): 6.3%
7. Cell Reports Methods (141 papers in training set, Top 0.6%): 4.3%
8. Nucleic Acids Research (1128 papers in training set, Top 5%): 4.3%
9. Nature Communications (4913 papers in training set, Top 40%): 3.6%
10. Genome Research (409 papers in training set, Top 2%): 1.9%
11. Briefings in Bioinformatics (326 papers in training set, Top 3%): 1.9%
12. Genome Medicine (154 papers in training set, Top 4%): 1.9%
13. PLOS ONE (4510 papers in training set, Top 52%): 1.8%
14. BMC Bioinformatics (383 papers in training set, Top 4%): 1.8%
15. Bioinformatics Advances (184 papers in training set, Top 3%): 1.7%
16. NAR Genomics and Bioinformatics (214 papers in training set, Top 2%): 1.5%
17. Scientific Reports (3102 papers in training set, Top 64%): 1.3%
18. The American Journal of Human Genetics (206 papers in training set, Top 3%): 0.9%
19. Proceedings of the National Academy of Sciences (2130 papers in training set, Top 41%): 0.9%
20. Cell Reports (1338 papers in training set, Top 32%): 0.8%
21. Development (440 papers in training set, Top 3%): 0.8%
22. Frontiers in Genetics (197 papers in training set, Top 10%): 0.7%
23. Nature Genetics (240 papers in training set, Top 8%): 0.6%
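
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below copies the six highest listed probabilities from this page; the helper name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (percent) for the six highest-ranked journals,
# copied from the listing on this page.
ranked = [
    ("Cell Systems", 18.4),
    ("Bioinformatics", 10.3),
    ("Nature Methods", 10.0),
    ("Nature Biotechnology", 6.3),
    ("Genome Biology", 6.3),
    ("PLOS Computational Biology", 6.3),
]

def journals_to_reach(threshold, ranked_probs):
    """Return (count, cumulative) for the first rank at which the running
    total of probabilities reaches the threshold, or None if it never does."""
    cumulative = 0.0
    for count, (_, prob) in enumerate(ranked_probs, start=1):
        cumulative += prob
        if cumulative >= threshold:
            return count, cumulative
    return None

count, cumulative = journals_to_reach(50.0, ranked)
print(f"{count} journals reach {cumulative:.1f}% of the probability mass")
```

The running total stays below 50% after four journals (45.0%) and crosses it at the fifth (51.3%), which matches where the listing places the cutoff.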