
Agentic systems are adept at solving well-scoped, verifiable problems in computational biology

Nair, S.; Gunsalus, L.; Orcutt-Jahns, B.; Rossen, J.; Lal, A.; Donno, C. D.; Celik, M. H.; Fletez-Brant, K.; Xie, X.; Bravo, H. C.; Eraslan, G.

2026-04-09 bioinformatics
10.64898/2026.04.06.716850 bioRxiv

We introduce CompBioBench, a benchmark of 100 diverse tasks for evaluating agentic systems in computational biology. Unlike mathematics and programming, which more readily admit systematic verification, biological data are inherently noisy and open to interpretation. To enable objective evaluation without reducing tasks to prescriptive checklists, we propose a new benchmark construction strategy based on synthetic/augmented data and metadata scrambling/scrubbing of real datasets to create challenging problems with a single ground-truth answer that require multi-step reasoning, tool use, bespoke code, and interaction with real-world external resources. The benchmark spans genomics, transcriptomics, epigenomics, single-cell analysis, human genetics, and machine learning workflows. Questions are curated by domain experts to cover a broad range of skills with varying difficulty. We evaluate leading general-purpose agentic systems starting from a bare-minimum environment, requiring them to fetch data and tools as needed to solve each problem. We find strong end-to-end performance, with Codex CLI (GPT 5.4) reaching 83% accuracy and Claude Code (Opus 4.6) reaching 81%. On the hardest questions, Codex CLI (GPT 5.4) reaches 59%, while Claude Code (Opus 4.6) reaches 69%. CompBioBench provides a practical testbed for measuring the progress of agentic systems in computational biology and for guiding future benchmark design.

Matching journals

The top 8 journals together cross the 50% mark of the predicted probability mass (their listed probabilities sum to 52.0%).

1. Cell Systems | 167 papers in training set | Top 1% | 10.1%
2. Nature Methods | 336 papers in training set | Top 1% | 8.4%
3. Nature Communications | 4913 papers in training set | Top 23% | 8.2%
4. Bioinformatics | 1061 papers in training set | Top 3% | 7.2%
5. BMC Bioinformatics | 383 papers in training set | Top 2% | 4.9%
6. PLOS Computational Biology | 1633 papers in training set | Top 7% | 4.9%
7. Nucleic Acids Research | 1128 papers in training set | Top 4% | 4.3%
8. Briefings in Bioinformatics | 326 papers in training set | Top 2% | 4.0%
50% of probability mass above
9. Genome Biology | 555 papers in training set | Top 3% | 3.3%
10. Nature Biotechnology | 147 papers in training set | Top 3% | 2.6%
11. Bioinformatics Advances | 184 papers in training set | Top 2% | 2.6%
12. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 25% | 2.6%
13. Nature | 575 papers in training set | Top 9% | 2.4%
14. Nature Machine Intelligence | 61 papers in training set | Top 1% | 2.1%
15. PLOS ONE | 4510 papers in training set | Top 48% | 2.1%
16. Scientific Reports | 3102 papers in training set | Top 53% | 1.9%
17. Genome Research | 409 papers in training set | Top 2% | 1.7%
18. Science | 429 papers in training set | Top 16% | 1.2%
19. Nature Genetics | 240 papers in training set | Top 6% | 0.9%
20. Patterns | 70 papers in training set | Top 2% | 0.9%
21. Development | 440 papers in training set | Top 3% | 0.9%
22. NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 0.9%
23. Communications Biology | 886 papers in training set | Top 19% | 0.9%
24. ACS Synthetic Biology | 256 papers in training set | Top 3% | 0.8%
25. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.8% | 0.8%
26. iScience | 1063 papers in training set | Top 29% | 0.8%
27. Molecular Systems Biology | 142 papers in training set | Top 2% | 0.7%
28. GigaScience | 172 papers in training set | Top 3% | 0.7%
29. eLife | 5422 papers in training set | Top 59% | 0.7%
30. Biophysical Journal | 545 papers in training set | Top 6% | 0.6%
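
The position of the "50% of probability mass" marker can be checked with a short cumulative sum. The sketch below is only illustrative: it assumes the marker sits after the first rank at which the running total of predicted probabilities reaches 50%, and it uses the rounded percentages of the ten top-ranked journals as listed. The function name `journals_to_reach` is hypothetical and not part of the site.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the ten top-ranked journals,
# copied from the ranking as displayed (rounded values).
probs = [10.1, 8.4, 8.2, 7.2, 4.9, 4.9, 4.3, 4.0, 3.3, 2.6]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached.

    Assumption: the threshold is considered reached at the first rank
    whose cumulative sum is greater than or equal to it.
    """
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count
    return None


n = journals_to_reach(probs, 50.0)
print(n)
```

With the listed values, the running total is about 48.0% after seven journals and about 52.0% after eight, which is consistent with the marker being placed after rank 8.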