
MiCBuS: Marker Gene Mining for Unknown Cell Types Using Bulk and Single Cell RNA-Seq Data

Zhang, S.; Lu, Y.; Luo, Q.; An, L.

2026-03-24 bioinformatics
10.64898/2026.03.20.711946 bioRxiv

Identifying genes expressed specifically in a given cell type (marker genes) is essential for understanding the roles and interactions of cell populations within tissues. Traditional differential analysis approaches are typically applied to bulk RNA-seq of individual cell types or to single-cell RNA-seq (scRNA-seq) data. However, real-world datasets often pose two challenges: heterogeneous bulk RNA-seq and incomplete scRNA-seq. Heterogeneous bulk RNA-seq amalgamates the gene expression profiles of multiple cell types, which lowers resolution, while incomplete scRNA-seq fails to capture some cell types in the tissue, leaving them as unknown cell types. Traditional methods cannot identify marker genes for such unknown cell types. MiCBuS addresses this limitation by generating Dirichlet-pseudo-bulk RNA-seq samples from bulk and incomplete scRNA-seq data. By performing differential gene expression analysis between the bulk and Dirichlet-pseudo-bulk RNA-seq samples, MiCBuS identifies marker genes of unknown cell types, enabling these elusive cellular components to be identified and characterized. Simulation studies and real data analyses demonstrate that MiCBuS reliably and robustly identifies marker genes specific to unknown cell types, a capability that traditional differential analysis methods lack.

Availability and implementation: MiCBuS is implemented in R and freely available at https://github.com/Shanshan-Zhang/MiCBuS.
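The general idea in the abstract can be illustrated with a toy sketch. This is not the MiCBuS implementation (which is in R) and the abstract does not give the algorithm's details. The sketch assumes that a pseudo-bulk sample is a mixture of known cell-type profiles weighted by Dirichlet-distributed proportions, and that a simple log2 fold change stands in for the differential analysis. All function names, the toy profiles, the 30% unknown fraction, and the pseudocount are hypothetical.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one proportion vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def make_pseudo_bulk(profiles, alpha, n_samples, rng):
    """Mix known cell-type profiles using Dirichlet-distributed proportions.

    profiles: dict mapping cell type -> per-gene mean expression (assumed input,
    e.g. averaged from the cell types that scRNA-seq did capture).
    """
    cell_types = sorted(profiles)
    n_genes = len(profiles[cell_types[0]])
    samples = []
    for _ in range(n_samples):
        props = sample_dirichlet(alpha, rng)
        samples.append([
            sum(p * profiles[ct][g] for p, ct in zip(props, cell_types))
            for g in range(n_genes)
        ])
    return samples


def log2_fold_change(bulk, pseudo_bulk, pseudocount=1.0):
    """Per-gene log2 ratio of mean bulk to mean pseudo-bulk expression."""
    lfc = []
    for g in range(len(bulk[0])):
        mean_bulk = sum(s[g] for s in bulk) / len(bulk)
        mean_pseudo = sum(s[g] for s in pseudo_bulk) / len(pseudo_bulk)
        lfc.append(math.log2((mean_bulk + pseudocount) / (mean_pseudo + pseudocount)))
    return lfc


rng = random.Random(0)

# Toy data: two cell types seen in scRNA-seq, one missing ("unknown") type.
known_profiles = {"A": [100.0, 5.0, 0.0], "B": [5.0, 100.0, 0.0]}
unknown_profile = [0.0, 0.0, 100.0]  # only gene index 2 is expressed

pseudo = make_pseudo_bulk(known_profiles, alpha=[1.0, 1.0], n_samples=50, rng=rng)

# Toy heterogeneous bulk: 70% known mixture plus 30% unknown cell type.
bulk = []
for _ in range(20):
    known_mix = make_pseudo_bulk(known_profiles, [1.0, 1.0], 1, rng)[0]
    bulk.append([0.7 * k + 0.3 * u for k, u in zip(known_mix, unknown_profile)])

lfc = log2_fold_change(bulk, pseudo)
candidate = max(range(len(lfc)), key=lambda g: lfc[g])
print("log2 fold changes:", [round(x, 2) for x in lfc])
print("candidate marker gene index for the unknown cell type:", candidate)
```

In this toy setup the gene expressed only by the missing cell type is enriched in bulk relative to pseudo-bulk, because the pseudo-bulk can only reproduce signal from the captured cell types. A real analysis would use a proper differential expression test on count data instead of a raw fold change.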

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Bioinformatics | 1061 papers in training set | Top 1.0% | 23.3%
2 | Genome Biology | 555 papers in training set | Top 0.1% | 19.3%
3 | Genome Research | 409 papers in training set | Top 0.3% | 7.4%
50% of probability mass above
4 | BMC Bioinformatics | 383 papers in training set | Top 1% | 7.1%
5 | Briefings in Bioinformatics | 326 papers in training set | Top 1% | 4.5%
6 | Cell Systems | 167 papers in training set | Top 3% | 3.7%
7 | Nature Biotechnology | 147 papers in training set | Top 3% | 2.7%
8 | Nature Methods | 336 papers in training set | Top 3% | 2.4%
9 | PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.4%
10 | iScience | 1063 papers in training set | Top 11% | 2.0%
11 | PLOS ONE | 4510 papers in training set | Top 52% | 1.8%
12 | Nature Communications | 4913 papers in training set | Top 50% | 1.8%
13 | Bioinformatics Advances | 184 papers in training set | Top 3% | 1.4%
14 | PLOS Genetics | 756 papers in training set | Top 11% | 1.3%
15 | Cell Reports Methods | 141 papers in training set | Top 4% | 1.1%
16 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 1.0%
17 | Frontiers in Genetics | 197 papers in training set | Top 8% | 0.9%
18 | Nucleic Acids Research | 1128 papers in training set | Top 15% | 0.9%
19 | Genome Medicine | 154 papers in training set | Top 7% | 0.8%
20 | Scientific Reports | 3102 papers in training set | Top 72% | 0.8%
21 | BMC Genomics | 328 papers in training set | Top 5% | 0.8%
22 | Communications Biology | 886 papers in training set | Top 22% | 0.8%
23 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.5%
24 | Life Science Alliance | 263 papers in training set | Top 3% | 0.5%
25 | Nature Computational Science | 50 papers in training set | Top 2% | 0.5%
26 | Development | 440 papers in training set | Top 4% | 0.5%