Are Current AI Virtual Cell Models Useful for Scientific Discovery?
Bereket, M. D.; Leskovec, J.
2026-04-25
bioinformatics
10.64898/2026.04.23.719015
bioRxiv
AI models are increasingly developed to predict the effect of perturbations on gene expression, but current benchmarks fail to reliably measure model performance. Here, we argue that new benchmarks that directly measure the value of model predictions for specific scientific discovery outcomes are needed to address this gap. We present PerturbHD, an evaluation framework for AI-enabled hit discovery, to demonstrate the benefits of our proposed approach.
Matching journals
● Non-profit
◐ University press
○ Commercial
The top 9 journals account for 50% of the predicted probability mass.
| Rank | Journal | Publisher type | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|---|
| 1 | Bioinformatics | ◐ | 1061 | Top 3% | 10.1% |
| 2 | BMC Bioinformatics | ○ | 383 | Top 1% | 8.4% |
| 3 | Cell Systems | ○ | 167 | Top 2% | 6.8% |
| 4 | Nucleic Acids Research | ◐ | 1128 | Top 3% | 6.3% |
| 5 | GigaScience | ◐ | 172 | Top 0.3% | 4.8% |
| 6 | Bioinformatics Advances | ◐ | 184 | Top 0.7% | 4.8% |
| 7 | Briefings in Bioinformatics | ◐ | 326 | Top 1% | 4.3% |
| 8 | npj Systems Biology and Applications | ○ | 99 | Top 0.5% | 3.6% |
| 9 | iScience | ○ | 1063 | Top 5% | 3.6% |
| | 50% of probability mass above | | | | |
| 10 | PLOS Computational Biology | ● | 1633 | Top 11% | 3.1% |
| 11 | PLOS ONE | ● | 4510 | Top 42% | 3.1% |
| 12 | Scientific Reports | ○ | 3102 | Top 41% | 3.1% |
| 13 | Genomics, Proteomics & Bioinformatics | ◐ | 171 | Top 2% | 2.3% |
| 14 | Genome Biology | ○ | 555 | Top 4% | 2.1% |
| 15 | Computational and Structural Biotechnology Journal | ● | 216 | Top 3% | 2.1% |
| 16 | NAR Genomics and Bioinformatics | ◐ | 214 | Top 2% | 1.9% |
| 17 | Nature Communications | ○ | 4913 | Top 52% | 1.7% |
| 18 | Nature Machine Intelligence | ○ | 61 | Top 2% | 1.7% |
| 19 | Patterns | ○ | 70 | Top 0.9% | 1.7% |
| 20 | IEEE Transactions on Computational Biology and Bioinformatics | ● | 17 | Top 0.3% | 1.5% |
| 21 | BioData Mining | ○ | 15 | Top 0.4% | 1.5% |
| 22 | Proceedings of the National Academy of Sciences | ● | 2130 | Top 38% | 1.2% |
| 23 | Frontiers in Genetics | ○ | 197 | Top 7% | 0.9% |
| 24 | Nature Methods | ○ | 336 | Top 6% | 0.9% |
| 25 | Cell Genomics | ○ | 162 | Top 6% | 0.9% |
| 26 | Journal of the American Medical Informatics Association | ◐ | 61 | Top 2% | 0.9% |
| 27 | Cell Reports Methods | ○ | 141 | Top 5% | 0.7% |
| 28 | Journal of Molecular Biology | ○ | 217 | Top 4% | 0.7% |
| 29 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | ● | 32 | Top 0.6% | 0.7% |
| 30 | IEEE Journal of Biomedical and Health Informatics | ● | 34 | Top 2% | 0.6% |
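
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The snippet below is a minimal sketch, not part of the original page: the function name `cutoff_rank` is hypothetical, and the inputs are the rounded percentages as displayed, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the listing.
# These are rounded display values, so running totals are approximate.
probs = [10.1, 8.4, 6.8, 6.3, 4.8, 4.8, 4.3, 3.6, 3.6, 3.1]


def cutoff_rank(probabilities, threshold=50.0):
    """Return the 1-based rank at which the running total first reaches
    `threshold`, or None if it is never reached (hypothetical helper)."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None


print(cutoff_rank(probs))  # prints 9
```

With the displayed values, the running total is about 49.1% after rank 8 and about 52.7% after rank 9, which agrees with the cutoff marked in the listing.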