Claw4Science: A Dataset and Platform for the OpenClaw Scientific Agent Ecosystem

Xu, M.; Chen, J.; Zhang, Z.

2026-04-01 bioinformatics
10.64898/2026.03.30.715118 bioRxiv

Large language models have enabled a new class of scientific software in the form of AI agents that can execute research workflows across bioinformatics, drug discovery, and related domains. Among these systems, OpenClaw introduced a skill-based design that allows workflows to be expressed as structured Markdown files, lowering the barrier to contribution and enabling rapid ecosystem growth. However, this growth has led to fragmentation: projects are distributed across independent repositories, skills vary widely in quality, naming is inconsistent, and there is no unified way to discover or compare available tools. In this work, we construct the first curated dataset of the OpenClaw scientific ecosystem. The dataset includes 91 projects organized by functional role and 2,230 skills spanning 34 scientific categories. Based on this dataset, we perform a systematic analysis of the structure, distribution, and emerging patterns of scientific agent development. To make this ecosystem accessible in practice, we further build Claw4Science, a public platform at https://claw4science.org, on top of this dataset. The platform organizes projects and aggregates distributed skill repositories into a unified interface, with a focus on bioinformatics and scientific workflows, providing a practical entry point for navigating the ecosystem. Our results show that the OpenClaw ecosystem reflects a shift from isolated systems to a more modular and shareable model of scientific computation. At the same time, challenges in evaluation, reproducibility, and governance remain open. We argue that our dataset provides a foundation for future benchmark development and standardized infrastructure for scientific AI agents.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 1%; 22.0%
2. GigaScience: 172 papers in training set; Top 0.1%; 14.0%
3. BMC Bioinformatics: 383 papers in training set; Top 1%; 8.2%
4. PLOS Computational Biology: 1633 papers in training set; Top 5%; 6.7%

50% of probability mass above

5. Patterns: 70 papers in training set; Top 0.3%; 3.5%
6. Bioinformatics Advances: 184 papers in training set; Top 1%; 3.5%
7. PLOS ONE: 4510 papers in training set; Top 43%; 2.8%
8. Journal of the American Medical Informatics Association: 61 papers in training set; Top 1%; 2.0%
9. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 3%; 2.0%
10. Nucleic Acids Research: 1128 papers in training set; Top 9%; 2.0%
11. NAR Genomics and Bioinformatics: 214 papers in training set; Top 2%; 1.8%
12. Nature Communications: 4913 papers in training set; Top 52%; 1.7%
13. Journal of Molecular Biology: 217 papers in training set; Top 2%; 1.7%
14. Cell Systems: 167 papers in training set; Top 8%; 1.6%
15. Scientific Reports: 3102 papers in training set; Top 63%; 1.5%
16. Briefings in Bioinformatics: 326 papers in training set; Top 5%; 1.3%
17. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 4%; 1.3%
18. Frontiers in Genetics: 197 papers in training set; Top 6%; 1.3%
19. Genome Biology: 555 papers in training set; Top 6%; 1.2%
20. iScience: 1063 papers in training set; Top 22%; 1.2%
21. PeerJ: 261 papers in training set; Top 12%; 0.9%
22. Nature Methods: 336 papers in training set; Top 7%; 0.6%
23. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 48%; 0.6%
24. Genome Research: 409 papers in training set; Top 5%; 0.6%
25. eLife: 5422 papers in training set; Top 62%; 0.6%
26. Database: 51 papers in training set; Top 1%; 0.6%
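
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The following is a minimal sketch, not the site's own method: the function name `journals_covering_mass` and the 50.0 threshold parameter are hypothetical, and the input values are simply the first six percentages from the list.

```python
def journals_covering_mass(probabilities, threshold=50.0):
    """Return (count, cumulative) for the shortest ranked prefix whose
    probabilities sum to at least `threshold` percent.

    `probabilities` must already be sorted from highest to lowest.
    Returns (None, cumulative) if the threshold is never reached.
    """
    cumulative = 0.0
    for count, p in enumerate(probabilities, start=1):
        cumulative += p
        if cumulative >= threshold:
            return count, round(cumulative, 1)
    return None, round(cumulative, 1)


# Percentages of the six highest-ranked journals, copied from the list.
top_probabilities = [22.0, 14.0, 8.2, 6.7, 3.5, 3.5]

count, cumulative = journals_covering_mass(top_probabilities)
print(count, cumulative)
```

With these inputs the running sum is 22.0, 36.0, 44.2, then 50.9, so the threshold is first crossed at the fourth journal, consistent with the cutoff shown in the list.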