Evaluating open LLMs for agentic analysis orchestration in a typical biomedical lab

Nekrutenko, A.

2026-05-18 bioinformatics
10.64898/2026.05.13.724985 bioRxiv

Agentic tools -- software environments where a large language model plans, calls external tools, executes code, and iterates with minimal human intervention -- will run a substantial share of routine biomedical data analysis within the next few years. However, the per-call inference cost of frontier models is the bottleneck and adds up quickly. Here, we tested whether a free, locally runnable open-weight model could take over the repetitive execution steps at frontier accuracy. We used Claude Opus to author plans of increasing detail for per-sample variant calling, and ran six 2026-release open-weight implementer LLMs against those plans on a set of desktop GPUs. qwen3.6:27b reproduced frontier accuracy on every plan and matched Opus cell-for-cell on a 36-cell error-injection matrix. A sub-$2,000 Jetson or Apple Mac Mini sufficed for the implementer side. The open-weight model landscape evolves on the order of months, so the specific implementer recommended here will be superseded; we provide the plans, harness, scoring code, and per-cell artifacts at https://github.com/nekrut/LLM-eval-paper as a framework for re-evaluating future models.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Nature Methods | 336 papers in training set | Top 0.3% | 18.7%
2. GigaScience | 172 papers in training set | Top 0.1% | 7.2%
3. Genome Biology | 555 papers in training set | Top 1% | 6.3%
4. Cell Systems | 167 papers in training set | Top 3% | 4.3%
5. Genome Research | 409 papers in training set | Top 0.8% | 4.0%
6. Nature Communications | 4913 papers in training set | Top 37% | 4.0%
7. Genome Medicine | 154 papers in training set | Top 2% | 4.0%
8. PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.6%

50% of probability mass above

9. Nature Biotechnology | 147 papers in training set | Top 3% | 3.6%
10. Bioinformatics | 1061 papers in training set | Top 5% | 3.6%
11. Nature Machine Intelligence | 61 papers in training set | Top 0.9% | 3.6%
12. Patterns | 70 papers in training set | Top 0.2% | 3.6%
13. eLife | 5422 papers in training set | Top 29% | 3.1%
14. Bioinformatics Advances | 184 papers in training set | Top 2% | 1.9%
15. Cell Genomics | 162 papers in training set | Top 3% | 1.9%
16. Nucleic Acids Research | 1128 papers in training set | Top 11% | 1.7%
17. BMC Bioinformatics | 383 papers in training set | Top 5% | 1.2%
18. Cell Reports Methods | 141 papers in training set | Top 4% | 1.1%
19. Nature | 575 papers in training set | Top 14% | 1.0%
20. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 1.0%
21. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.9%
22. Molecular Systems Biology | 142 papers in training set | Top 1% | 0.8%
23. iScience | 1063 papers in training set | Top 29% | 0.8%
24. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 45% | 0.7%
25. PLOS ONE | 4510 papers in training set | Top 67% | 0.7%
26. npj Digital Medicine | 97 papers in training set | Top 4% | 0.7%
27. Nature Genetics | 240 papers in training set | Top 8% | 0.7%
28. Cell Reports Medicine | 140 papers in training set | Top 9% | 0.7%
29. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.9% | 0.7%
30. The American Journal of Human Genetics | 206 papers in training set | Top 4% | 0.6%
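
The 50% cutoff marked in the list above can be checked with a short cumulative sum. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and the input is the predicted probabilities of the ten top-ranked journals, copied from the list. It is not the site's own ranking code.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten top-ranked journals, copied from the list above.
probabilities = [18.7, 7.2, 6.3, 4.3, 4.0, 4.0, 4.0, 3.6, 3.6, 3.6]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running total
    to reach `threshold`, or None if the total never gets there."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities))
```

The running total is 48.5% after seven journals and 52.1% after eight, which agrees with the statement that the top 8 journals account for 50% of the predicted probability mass.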