Linking Genetic Risk to Disease-Relevant Cellular States via Metacell-Informed Modeling with ICePop

Yuan, H.; Mandava, A.; Sarmart, K.; Ganz, J.; Krishnan, A.

2026-04-03 genomics
10.64898/2026.04.01.715877 bioRxiv

Genome-wide association studies (GWAS) have implicated thousands of loci in complex diseases, but translating these population-level signals into specific cellular contexts remains a central challenge. Integrating GWAS with single-cell transcriptomics data has enabled systematic identification of disease-relevant cell types, yet existing methods face a fundamental tradeoff: approaches like seismic that are optimized for statistical power operate at the annotated cell-type level and miss heterogeneous disease signals concentrated in specific cellular states, while single-cell-resolution approaches like scDRS that capture such heterogeneity often lack sufficient power to detect subtle associations. Here we present ICePop (Informative Cell Populations), a framework that resolves this tradeoff by performing disease-cell type association at metacell resolution, thus achieving statistical power comparable to cell-type-level methods while detecting heterogeneous disease signals within cell types. In simulations against seismic and scDRS, ICePop maintains appropriate false positive rates and demonstrates superior power when disease effects are concentrated in cellular subpopulations. Applied to Tabula Muris across 81 traits and 120 cell types, ICePop identifies 2,178 disease-cell type associations, including the preferential vulnerability of differentiated gut epithelial cells in ulcerative colitis and loss of cell identity in immune-stressed lung capillary endothelial cells underlying their association with lung function. Clustering diseases by metacell association profiles reveals groupings that diverge from genetic risk-based clustering, including separation of blood cell count traits from immune diseases despite shared genetic architecture, reflecting differences in cellular rather than genetic etiology. In autism spectrum disorder, ICePop identifies preferential enrichment of genetic risk in specific enteric neuron subtypes, implicating dysfunction of the enteric nervous system in gastrointestinal comorbidities. ICePop's resolution of disease-relevant cell states within annotated cell types enables generation of testable, cell-state-specific hypotheses about disease mechanisms and therapeutic targets.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Genetics: 240 papers in training set, Top 0.3%, 18.0%
2. Cell Genomics: 162 papers in training set, Top 0.4%, 7.0%
3. Science: 429 papers in training set, Top 5%, 6.6%
4. Nature Biotechnology: 147 papers in training set, Top 1%, 6.6%
5. Nature: 575 papers in training set, Top 5%, 6.1%
6. Nature Communications: 4913 papers in training set, Top 31%, 6.1%

50% of probability mass above

7. Nature Neuroscience: 216 papers in training set, Top 2%, 4.7%
8. Cell: 370 papers in training set, Top 7%, 3.5%
9. The American Journal of Human Genetics: 206 papers in training set, Top 1%, 3.5%
10. Genome Biology: 555 papers in training set, Top 3%, 3.5%
11. Cell Systems: 167 papers in training set, Top 5%, 3.0%
12. Science Translational Medicine: 111 papers in training set, Top 1%, 3.0%
13. Nature Medicine: 117 papers in training set, Top 1%, 2.6%
14. Nature Methods: 336 papers in training set, Top 3%, 2.5%
15. Genome Medicine: 154 papers in training set, Top 4%, 1.8%
16. Cell Reports: 1338 papers in training set, Top 26%, 1.4%
17. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 35%, 1.4%
18. Nature Machine Intelligence: 61 papers in training set, Top 2%, 1.3%
19. Nature Cell Biology: 99 papers in training set, Top 4%, 1.2%
20. Nucleic Acids Research: 1128 papers in training set, Top 16%, 0.9%
21. Nature Computational Science: 50 papers in training set, Top 1%, 0.9%
22. Nature Biomedical Engineering: 42 papers in training set, Top 2%, 0.8%
23. eLife: 5422 papers in training set, Top 57%, 0.8%
24. Science Advances: 1098 papers in training set, Top 31%, 0.7%
25. PLOS Computational Biology: 1633 papers in training set, Top 26%, 0.7%
26. Nature Ecology & Evolution: 113 papers in training set, Top 5%, 0.7%
27. Genome Research: 409 papers in training set, Top 5%, 0.7%
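
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration of that arithmetic: the values are copied from the list above (which appear to be rounded to one decimal place), and the helper name `journals_needed` is hypothetical, not part of any tool used by the page.

```python
# Predicted probabilities (percent), copied in rank order from the list above.
# These are the displayed, rounded values, so sums are approximate.
probabilities = [
    18.0, 7.0, 6.6, 6.6, 6.1, 6.1, 4.7, 3.5, 3.5,
    3.5, 3.0, 3.0, 2.6, 2.5, 1.8, 1.4, 1.4, 1.3,
    1.2, 0.9, 0.9, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7,
]


def journals_needed(probs, threshold):
    """Return (count, cumulative_mass) for the shortest ranked prefix
    whose summed probability reaches the threshold.

    Hypothetical helper for illustration; returns (None, total) if the
    threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, mass = journals_needed(probabilities, 50.0)
print(count, round(mass, 1))
```

With the listed values, the first five journals sum to about 44.3%, so the sixth entry is the one that pushes the cumulative mass past the 50% mark.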