
Systematic Survey of Public Datasets for Behavioral Research in Invertebrate Models: Toward FAIR and Standardized Data Sharing

Piorkowska, N. J.; Mazurek, R.; Adamek, D.; Lopianiak, M.; Kubs, M.; Kulka, K.; Luszczek, P.; Mijal, R.

2025-12-15 animal behavior and cognition
10.64898/2025.12.12.693879 bioRxiv

Behavioral datasets for invertebrate model organisms are expanding rapidly alongside automated imaging, tracking, and artificial intelligence (AI)-based phenotyping, yet their technical structure and compliance with the Findable, Accessible, Interoperable and Reusable (FAIR) principles remain heterogeneous. We present a two-stage survey of openly available behavioural datasets for the major invertebrate models Caenorhabditis elegans (C. elegans), Drosophila melanogaster (D. melanogaster), Galleria mellonella (G. mellonella), and the planarian Schmidtea mediterranea (S. mediterranea), with larval zebrafish (Danio rerio) included as a vertebrate comparator. Stage 1 comprised a PRISMA-guided literature review (2015 to 2025) of indexed databases and complementary non-indexed sources, yielding 12 eligible publications describing 12 open behavioural datasets. Stage 2 independently screened and technically evaluated repository deposits (June 2022 to July 2025), producing a final corpus of 20 datasets scored on a four-dimension ordinal rubric covering usability, annotation richness, technical quality, and AI-readiness. All extracted descriptors, repository search logs, and scoring sheets are released as public data records, enabling full regeneration of the figures and summary statistics. Across Stage 2 deposits, multimodality and open file formats were common, whereas interoperability and AI-readiness were most constrained by limited machine-readable metadata, weak provenance linking raw to derived data, and sparse adoption of formal standards or ontologies. This Data Descriptor provides a reproducible, dataset-centred overview of behavioural resources for invertebrate models and practical guidance for FAIR-aligned publication, secondary biological analyses, and AI benchmarking.

Matching journals

The top-ranked journal alone accounts for at least 50% of the predicted probability mass.

1. Scientific Data: 174 papers in training set; top 0.1%; 53.1%
(50% of probability mass above)
2. PLOS ONE: 4510 papers in training set; top 26%; 6.5%
3. eLife: 5422 papers in training set; top 13%; 6.4%
4. BMC Biology: 248 papers in training set; top 0.2%; 3.7%
5. Methods in Ecology and Evolution: 160 papers in training set; top 1.0%; 2.9%
6. GigaScience: 172 papers in training set; top 0.9%; 2.1%
7. Scientific Reports: 3102 papers in training set; top 49%; 2.1%
8. PLOS Biology: 408 papers in training set; top 9%; 1.7%
9. SoftwareX: 15 papers in training set; top 0.1%; 1.7%
10. Ecological Informatics: 29 papers in training set; top 0.4%; 1.7%
11. iScience: 1063 papers in training set; top 19%; 1.4%
12. Nature Communications: 4913 papers in training set; top 56%; 1.3%
13. Royal Society Open Science: 193 papers in training set; top 3%; 1.1%
14. Nature Methods: 336 papers in training set; top 5%; 1.0%
15. PLOS Computational Biology: 1633 papers in training set; top 22%; 0.9%
16. Behavior Research Methods: 25 papers in training set; top 0.2%; 0.8%
17. Nature: 575 papers in training set; top 15%; 0.8%
18. eNeuro: 389 papers in training set; top 9%; 0.8%
19. Cell Reports Methods: 141 papers in training set; top 5%; 0.7%
20. Neuropsychopharmacology: 134 papers in training set; top 3%; 0.7%
21. Nature Neuroscience: 216 papers in training set; top 7%; 0.7%
22. Biology Open: 130 papers in training set; top 3%; 0.7%
23. G3: Genes, Genomes, Genetics: 222 papers in training set; top 1%; 0.7%
24. Biological Imaging: 15 papers in training set; top 0.3%; 0.7%
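The statement that the top-ranked journal accounts for 50% of the predicted probability mass follows from accumulating the listed probabilities in rank order until a threshold is reached. The sketch below illustrates that arithmetic using the percentages copied from the list above. It assumes the percentages are per-journal predicted probabilities sorted in descending order; the helper name `journals_to_reach` is hypothetical and is not part of any tool used by the listing site. Because only 24 journals are shown, the listed values sum to about 94.2% rather than 100%.

```python
# Predicted probabilities (%) per journal, in rank order, copied from the list above.
# Assumption: these are per-journal probabilities sorted in descending order.
PREDICTED = [
    ("Scientific Data", 53.1), ("PLOS ONE", 6.5), ("eLife", 6.4),
    ("BMC Biology", 3.7), ("Methods in Ecology and Evolution", 2.9),
    ("GigaScience", 2.1), ("Scientific Reports", 2.1), ("PLOS Biology", 1.7),
    ("SoftwareX", 1.7), ("Ecological Informatics", 1.7), ("iScience", 1.4),
    ("Nature Communications", 1.3), ("Royal Society Open Science", 1.1),
    ("Nature Methods", 1.0), ("PLOS Computational Biology", 0.9),
    ("Behavior Research Methods", 0.8), ("Nature", 0.8), ("eNeuro", 0.8),
    ("Cell Reports Methods", 0.7), ("Neuropsychopharmacology", 0.7),
    ("Nature Neuroscience", 0.7), ("Biology Open", 0.7),
    ("G3: Genes, Genomes, Genetics", 0.7), ("Biological Imaging", 0.7),
]


def journals_to_reach(mass_pct, ranked):
    """Hypothetical helper: return the smallest number of top-ranked journals
    whose cumulative predicted probability reaches mass_pct percent, or None
    if the listed journals never reach it."""
    cumulative = 0.0
    for k, (_, prob) in enumerate(ranked, start=1):
        cumulative += prob
        if cumulative >= mass_pct:
            return k
    return None  # the truncated list does not cover the requested mass


print(journals_to_reach(50, PREDICTED))  # → 1
print(journals_to_reach(75, PREDICTED))  # → 7
print(journals_to_reach(99, PREDICTED))  # → None
```

Returning `None` rather than the list length makes it explicit when a threshold exceeds the mass covered by the truncated list.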