
Single-cell hit calling in high-content imaging screens with Buscar

Serrano, E.; Li, W.-s.; Way, G. P.

2026-04-19 bioinformatics
10.64898/2026.04.15.718737 bioRxiv

High-content screening (HCS) enables the systematic quantification of single-cell morphology features across thousands of perturbations, capturing rich phenotypic heterogeneity. Image-based profiling is a critical bioinformatics processing step in this pipeline, as researchers use it to predict mechanisms of action, assess toxicity, perform hit calling, and more. However, current image-based profiling workflows rely on aggregate statistics, such as calculating mean or median feature values per well, implicitly assuming cell homogeneity. This limitation obscures subpopulation effects, reducing sensitivity to subtle or heterogeneous effects of perturbations. Here we present Buscar, a method that leverages the full heterogeneity of single-cell image-based profiles to call hits. Buscar requires two reference single-cell populations that define distinct morphology states: a reference state (e.g., disease cells) and a target state (e.g., healthy cells). Buscar compares these two groups to define on- and off-morphology signatures, which it then uses to score every perturbation in a given screen. The scores quantify perturbation efficacy and off-target effects, or specificity, in an interpretable manner, clarifying which morphologies are appropriately altered and which may arise from off-target activity. We apply Buscar to three datasets. First, as a proof of concept, we apply Buscar to a Cell Painting dataset of cardiac fibroblasts from patients with heart failure. Buscar quantifies both morphology rescue and off-target morphology activity in these cells treated with a TGFβ receptor inhibitor. Second, we show that Buscar recovers biologically coherent gene-phenotype associations across 16 manually labeled phenotypes in the MitoCheck dataset. Lastly, applied to CPJUMP1, we show that Buscar is robust to technical replicates collected across plates in both small-molecule and CRISPR-Cas9 perturbations.
Together, these results establish Buscar as a reproducible and interpretable hit-calling method that overcomes aggregation bias, enabling the simultaneous quantification of compound efficacy and specificity to enhance hit calling in HCS. We release Buscar as an open-source Python package.
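
The abstract describes the workflow only at a high level: compare a reference and a target single-cell population, derive on- and off-morphology signatures, then score each perturbation. The sketch below illustrates that general idea and is not the Buscar implementation or its API. Everything in it is an assumption made for illustration: the function names (`ks_statistic`, `split_signatures`, `score_perturbation`), the use of a two-sample Kolmogorov-Smirnov statistic to compare per-feature single-cell distributions, the 0.5 threshold, and the toy data.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in set(a + b):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def column(cells, j):
    """Values of feature j across all cells (each cell is a list of features)."""
    return [cell[j] for cell in cells]


def split_signatures(reference, target, threshold=0.5):
    """Hypothetical rule: features whose distributions differ between the two
    states form the on-signature; the remaining features form the off-signature."""
    on, off = [], []
    for j in range(len(reference[0])):
        d = ks_statistic(column(reference, j), column(target, j))
        (on if d >= threshold else off).append(j)
    return on, off


def mean_ks(cells_a, cells_b, features):
    """Average per-feature KS statistic over a set of feature indices."""
    if not features:
        return 0.0
    total = sum(ks_statistic(column(cells_a, j), column(cells_b, j)) for j in features)
    return total / len(features)


def score_perturbation(treated, reference, target, on, off):
    """Assumed scoring: on_score is the distance to the target state on the
    on-signature (lower = closer to target); off_score is the distance from the
    reference state on the off-signature (higher = more off-target change)."""
    return {
        "on_score": mean_ks(treated, target, on),
        "off_score": mean_ks(treated, reference, off),
    }


# Toy data: 4 cells x 2 features. Feature 0 separates the states; feature 1 does not.
reference = [[0, 5], [1, 6], [2, 7], [3, 8]]
target = [[10, 5], [11, 6], [12, 7], [13, 8]]
on, off = split_signatures(reference, target)

rescued = [[10, 5], [11, 6], [12, 7], [13, 8]]          # matches the target state
off_target = [[10, 20], [11, 21], [12, 22], [13, 23]]   # also shifts feature 1

print(on, off)
print(score_perturbation(rescued, reference, target, on, off))
print(score_perturbation(off_target, reference, target, on, off))
```

Because the comparison works on whole per-feature distributions rather than per-well means, a perturbation that shifts only a subpopulation of cells still changes the score, which is the motivation the abstract gives for avoiding aggregation.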

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Cell Systems | 167 papers in training set | Top 0.4% | 18.5%
2 | Nature Methods | 336 papers in training set | Top 0.4% | 17.3%
3 | Bioinformatics | 1061 papers in training set | Top 2% | 12.2%
4 | Nature Communications | 4913 papers in training set | Top 33% | 4.8%
50% of probability mass above
5 | Nucleic Acids Research | 1128 papers in training set | Top 5% | 4.1%
6 | PLOS Computational Biology | 1633 papers in training set | Top 8% | 4.1%
7 | Cell Reports Methods | 141 papers in training set | Top 0.9% | 3.6%
8 | Nature Biotechnology | 147 papers in training set | Top 3% | 3.2%
9 | Genome Medicine | 154 papers in training set | Top 3% | 2.3%
10 | Genome Biology | 555 papers in training set | Top 3% | 2.1%
11 | Molecular Systems Biology | 142 papers in training set | Top 0.5% | 1.9%
12 | BMC Bioinformatics | 383 papers in training set | Top 5% | 1.5%
13 | PLOS ONE | 4510 papers in training set | Top 59% | 1.3%
14 | Scientific Reports | 3102 papers in training set | Top 64% | 1.3%
15 | iScience | 1063 papers in training set | Top 22% | 1.2%
16 | eLife | 5422 papers in training set | Top 50% | 1.1%
17 | Nature Biomedical Engineering | 42 papers in training set | Top 1% | 0.9%
18 | Science | 429 papers in training set | Top 18% | 0.9%
19 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 0.9%
20 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 0.9%
21 | Bioinformatics Advances | 184 papers in training set | Top 4% | 0.9%
22 | Cell Reports Medicine | 140 papers in training set | Top 7% | 0.9%
23 | Cell Genomics | 162 papers in training set | Top 5% | 0.9%
24 | Communications Biology | 886 papers in training set | Top 25% | 0.7%
25 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 45% | 0.7%
26 | Patterns | 70 papers in training set | Top 3% | 0.7%
27 | Advanced Science | 249 papers in training set | Top 22% | 0.6%