
Cell-Level Virtual Screening

Ellington, C. N.; Addagudi, S.; Wang, J.; Lengerich, B. J.; Xing, E. P.

2026-05-13 bioinformatics
10.64898/2026.05.11.724149 bioRxiv

Virtual screening methods prioritize therapeutic candidates by predicting molecular properties and interactions. However, molecular models are insufficient to predict higher-order effects that arise in real biological systems, leading to late-stage failures in drug discovery. Virtual cells have been posed as a solution to this problem by predicting gene expression responses to drugs, but they remain weakly validated as screening tools; gene expression is only an intermediate in understanding drug success or failure. Despite burgeoning progress in virtual cells, some basic questions remain. Is expression even a good representation of higher-order drug effects? How can expression and other cell-level representations be applied to prioritize therapeutic candidates? Can cell-level methods be fairly compared against traditional molecular-level screens? We address these questions with a two-pronged approach. First, we curate two benchmarks, Drug-Disease Retrieval Bench (DDR-Bench) and Drug-Target Retrieval Bench (DTR-Bench), which directly compare cell-level methods against traditional molecular methods on canonical drug discovery tasks. DDR-Bench evaluates a method's ability to prioritize disease indications for drugs with novel target profiles. DTR-Bench evaluates a method's ability to reconstruct drug-target interactions from separate perturbation modalities that act on shared mechanisms, bridging the gap between cell-level methods and classic molecular screens. We identify shortcomings of existing screening methods on these benchmarks, and propose an alternative representation of drug effects: perturbed gene networks. Inferring post-perturbation gene networks on demand for unseen drugs requires methods that generalize beyond traditional plug-in network estimators.
We develop a scalable differentiable surrogate loss for multivariate Gaussians, which we apply to train a context-adaptive amortized estimator that maps perturbation metadata to gene-gene dependency network parameters. The resulting model, CellVS-Net, achieves SOTA on predicting how gene networks restructure under a variety of complex multivariate experimental conditions, including different cell types, small molecule therapeutics, signaling molecules, gene knockdowns, and gene over-expressions. When compared to other molecular and cell-level representations of drugs, we find that CellVS-Net achieves SOTA on both virtual screening benchmarks. Overall, CellVS-Net demonstrates that cell-level virtual screening methods are a viable alternative to molecular screening, and associated benchmarks enable hill-climbing on relevant drug discovery tasks.

Matching journals

The top 4 journals together account for at least 50% of the predicted probability mass (53.0% by the listed values).

Rank | Journal | Papers in training set | Percentile | Probability
1 | Cell Systems | 167 | Top 0.2% | 23.3%
2 | Bioinformatics | 1061 | Top 1% | 19.2%
3 | Nature Methods | 336 | Top 2% | 6.5%
4 | PLOS Computational Biology | 1633 | Top 8% | 4.0%
(50% of probability mass above)
5 | Proceedings of the National Academy of Sciences | 2130 | Top 19% | 3.7%
6 | Nature Machine Intelligence | 61 | Top 1% | 3.2%
7 | Nature Communications | 4913 | Top 43% | 3.0%
8 | Briefings in Bioinformatics | 326 | Top 2% | 2.7%
9 | BMC Bioinformatics | 383 | Top 3% | 2.4%
10 | Bioinformatics Advances | 184 | Top 2% | 2.1%
11 | Patterns | 70 | Top 0.5% | 2.1%
12 | PLOS ONE | 4510 | Top 49% | 1.9%
13 | Scientific Reports | 3102 | Top 52% | 1.9%
14 | Nature Biotechnology | 147 | Top 4% | 1.7%
15 | Genome Research | 409 | Top 3% | 1.5%
16 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.3% | 1.3%
17 | iScience | 1063 | Top 24% | 1.0%
18 | eLife | 5422 | Top 55% | 0.8%
19 | Genome Biology | 555 | Top 8% | 0.7%
20 | npj Systems Biology and Applications | 99 | Top 3% | 0.7%
21 | Cell Genomics | 162 | Top 7% | 0.7%
22 | Journal of Cheminformatics | 25 | Top 0.6% | 0.7%
23 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.5%
24 | npj Digital Medicine | 97 | Top 4% | 0.5%
25 | Cell Reports Methods | 141 | Top 7% | 0.5%
26 | Nature Computational Science | 50 | Top 2% | 0.5%
27 | BioData Mining | 15 | Top 1% | 0.5%
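
The 50% cutoff in the ranking can be checked by accumulating the listed probabilities from the top. The following is a minimal sketch in Python; the function name `journals_to_reach` is hypothetical, and the input values are the rounded percentages shown in the list, so the total is only as precise as those figures.

```python
# Predicted probabilities (percent) for the five top-ranked journals,
# copied from the ranking above (rounded values as displayed).
probabilities = [
    ("Cell Systems", 23.3),
    ("Bioinformatics", 19.2),
    ("Nature Methods", 6.5),
    ("PLOS Computational Biology", 4.0),
    ("Proceedings of the National Academy of Sciences", 3.7),
]


def journals_to_reach(entries, threshold):
    """Count how many top-ranked entries are needed to reach `threshold` percent.

    Returns (count, cumulative_total). If the threshold is never reached,
    count is None and cumulative_total is the sum over all entries.
    """
    total = 0.0
    for count, (_, probability) in enumerate(entries, start=1):
        total += probability
        if total >= threshold:
            return count, total
    return None, total


count, total = journals_to_reach(probabilities, 50.0)
print(count, round(total, 1))  # prints: 4 53.0
```

The top three journals sum to 49.0%, just under the cutoff, so the fourth entry is the one that crosses 50%.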