Evaluating open LLMs for agentic analysis orchestration in a typical biomedical lab

Nekrutenko, A.

2026-05-18 bioinformatics

10.64898/2026.05.13.724985 bioRxiv

Agentic tools -- software environments where a large language model plans, calls external tools, executes code, and iterates with minimal human intervention -- will run a substantial share of routine biomedical data analysis within the next few years. However, per-call inference cost on frontier models is the bottleneck and can add up quickly. Here, we tested whether a free, locally-runnable open-weight model could take over the repetitive execution steps at frontier accuracy. We used Claude Opus to author plans of increasing detail for per-sample variant calling, and ran six 2026-release open-weight implementer LLMs against those plans on a set of desktop GPUs. qwen3.6:27b reproduced frontier accuracy on every plan and matched Opus cell-for-cell on a 36-cell error-injection matrix. A sub-$2,000 Jetson or Apple Mac Mini sufficed for the implementer side. The open-weight model landscape evolves on the order of months, so the specific implementer recommended here will be superseded; we provide the plans, harness, scoring code, and per-cell artifacts at https://github.com/nekrut/LLM-eval-paper as a framework for re-evaluating future models.

Matching journals

● Non-profit ◐ University press ○ Commercial

The top 8 journals account for 50% of the predicted probability mass.

○ 336 papers in training set

◐ 172 papers in training set

○ 555 papers in training set

○ 167 papers in training set

Genome Research

● 409 papers in training set

Nature Communications

○ 4913 papers in training set

Genome Medicine

○ 154 papers in training set

PLOS Computational Biology

● 1633 papers in training set

50% of probability mass above

Nature Biotechnology

○ 147 papers in training set

◐ 1061 papers in training set

Nature Machine Intelligence

○ 61 papers in training set

○ 70 papers in training set

● 5422 papers in training set

Bioinformatics Advances

◐ 184 papers in training set

○ 162 papers in training set

Nucleic Acids Research

◐ 1128 papers in training set

BMC Bioinformatics

○ 383 papers in training set

Cell Reports Methods

○ 141 papers in training set

○ 575 papers in training set

Journal of the American Medical Informatics Association

◐ 61 papers in training set

Briefings in Bioinformatics

◐ 326 papers in training set

Molecular Systems Biology

○ 142 papers in training set

○ 1063 papers in training set

Proceedings of the National Academy of Sciences

● 2130 papers in training set

● 4510 papers in training set

npj Digital Medicine

○ 97 papers in training set

Nature Genetics

○ 240 papers in training set

Cell Reports Medicine

○ 140 papers in training set

JCO Clinical Cancer Informatics

● 18 papers in training set

The American Journal of Human Genetics

○ 206 papers in training set