
Are Current AI Virtual Cell Models Useful for Scientific Discovery?

Bereket, M. D.; Leskovec, J.

2026-04-25 bioinformatics
10.64898/2026.04.23.719015 bioRxiv

AI models are increasingly developed to predict the effect of perturbations on gene expression, but current benchmarks fail to reliably measure model performance. Here, we argue that new benchmarks that directly measure the value of model predictions for specific scientific discovery outcomes are needed to address this gap. We present PerturbHD, an evaluation framework for AI-enabled hit discovery, to demonstrate the benefits of our proposed approach.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1 | Bioinformatics | 1061 papers in training set | Top 3% | 10.1%
2 | BMC Bioinformatics | 383 papers in training set | Top 1% | 8.4%
3 | Cell Systems | 167 papers in training set | Top 2% | 6.8%
4 | Nucleic Acids Research | 1128 papers in training set | Top 3% | 6.3%
5 | GigaScience | 172 papers in training set | Top 0.3% | 4.8%
6 | Bioinformatics Advances | 184 papers in training set | Top 0.7% | 4.8%
7 | Briefings in Bioinformatics | 326 papers in training set | Top 1% | 4.3%
8 | npj Systems Biology and Applications | 99 papers in training set | Top 0.5% | 3.6%
9 | iScience | 1063 papers in training set | Top 5% | 3.6%
50% of probability mass above
10 | PLOS Computational Biology | 1633 papers in training set | Top 11% | 3.1%
11 | PLOS ONE | 4510 papers in training set | Top 42% | 3.1%
12 | Scientific Reports | 3102 papers in training set | Top 41% | 3.1%
13 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 2.3%
14 | Genome Biology | 555 papers in training set | Top 4% | 2.1%
15 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 3% | 2.1%
16 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.9%
17 | Nature Communications | 4913 papers in training set | Top 52% | 1.7%
18 | Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.7%
19 | Patterns | 70 papers in training set | Top 0.9% | 1.7%
20 | IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.3% | 1.5%
21 | BioData Mining | 15 papers in training set | Top 0.4% | 1.5%
22 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 38% | 1.2%
23 | Frontiers in Genetics | 197 papers in training set | Top 7% | 0.9%
24 | Nature Methods | 336 papers in training set | Top 6% | 0.9%
25 | Cell Genomics | 162 papers in training set | Top 6% | 0.9%
26 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.9%
27 | Cell Reports Methods | 141 papers in training set | Top 5% | 0.7%
28 | Journal of Molecular Biology | 217 papers in training set | Top 4% | 0.7%
29 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.6% | 0.7%
30 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 2% | 0.6%
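
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The snippet below is a minimal sketch of that arithmetic, not the site's actual method; the function name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages shown in the listing, so the running total is approximate.

```python
# Predicted probabilities (percent) for ranks 1-30, copied from the listing.
# These are rounded display values, so sums are approximate.
probs = [
    10.1, 8.4, 6.8, 6.3, 4.8, 4.8, 4.3, 3.6, 3.6, 3.1,
    3.1, 3.1, 2.3, 2.1, 2.1, 1.9, 1.7, 1.7, 1.7, 1.5,
    1.5, 1.2, 0.9, 0.9, 0.9, 0.9, 0.7, 0.7, 0.7, 0.6,
]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative_total) at the first rank whose running
    sum meets or exceeds `threshold`, or None if it is never reached.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


rank, total = journals_to_reach(probs, 50.0)
print(f"Threshold reached at rank {rank} with {total:.1f}% cumulative")
```

With the listed values, the running total is about 49.1% after rank 8 and about 52.7% after rank 9, which is consistent with the cutoff being drawn below the ninth journal.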