FishWIO: a labeled image dataset of Western Indian Ocean reef fishes for training and testing classification algorithms

Fleure, V.; Villeger, S.; Claverie, T.

2026-03-04 ecology
10.64898/2026.03.03.709272 bioRxiv
Monitoring fish communities is essential for understanding biodiversity dynamics and coral reef ecosystem health. Underwater imaging provides a non-invasive and repeatable approach to such monitoring, yet the analysis of large volumes of video data remains extremely time-consuming for experts. Automated fish identification could remove this bottleneck, but training and testing reliable deep learning models requires large, high-quality labelled image datasets. To date, no such dataset exists for the Western Indian Ocean (WIO), a global biodiversity hotspot that hosts more than 300 common non-cryptobenthic fish species and faces increasing anthropogenic pressures. This paper presents a novel, publicly available dataset of 114,664 images annotated from 186 videos recorded with fixed underwater cameras on shallow reef habitats in the Mayotte archipelago. All images were labelled and validated by trained marine biologists following a standardized protocol. Each image includes detailed metadata describing recording conditions. The dataset comprises 124 reef fish species (including 110 with >200 images) and 8 background classes, and will allow training and testing of automated fish classification models.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

Rank  Journal                                                 Papers in training set  Percentile  Probability
1     PLOS ONE                                                4510                    Top 4%      27.3%
2     Scientific Data                                         174                     Top 0.1%    23.8%
----- 50% of probability mass above -----
3     Ecological Informatics                                  29                      Top 0.1%    11.0%
4     Scientific Reports                                      3102                    Top 5%      10.7%
5     Sensors                                                 39                      Top 0.3%    4.4%
6     Remote Sensing in Ecology and Conservation              10                      Top 0.1%    2.2%
7     Limnology and Oceanography: Methods                     11                      Top 0.2%    1.8%
8     Communications Biology                                  886                     Top 15%     1.2%
9     Ecological Indicators                                   20                      Top 0.4%    0.9%
10    PLOS Computational Biology                              1633                    Top 21%     0.9%
11    Royal Society Open Science                              193                     Top 4%      0.8%
12    Applied Sciences                                        24                      Top 0.8%    0.8%
13    Communications Earth & Environment                      14                      Top 0.8%    0.8%
14    Animals                                                 20                      Top 0.8%    0.8%
15    Aquatic Conservation: Marine and Freshwater Ecosystems  12                      Top 0.3%    0.8%
16    Data in Brief                                           13                      Top 0.4%    0.8%
17    Frontiers in Marine Science                             55                      Top 1%      0.8%
18    Nature Communications                                   4913                    Top 62%     0.8%
19    Frontiers in Ecology and Evolution                      60                      Top 4%      0.5%
20    Methods in Ecology and Evolution                        160                     Top 3%      0.5%
21    eLife                                                   5422                    Top 62%     0.5%
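
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities: the first two entries add up to 51.1%. The sketch below is only an illustration of that arithmetic; the function name `journals_needed` and the variable names are hypothetical and are not part of the page or of any tool behind it.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 21 ranked journals, copied from the list above.
probabilities = [
    27.3, 23.8, 11.0, 10.7, 4.4, 2.2, 1.8, 1.2, 0.9, 0.9, 0.8,
    0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5, 0.5,
]


def journals_needed(probs, threshold):
    """Return how many top-ranked entries are needed to reach `threshold` percent.

    Hypothetical helper: walks the running total in rank order and stops at
    the first rank where the cumulative probability meets the threshold.
    """
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None  # threshold not reached by the listed entries


top_k = journals_needed(probabilities, 50.0)
covered = sum(probabilities[:top_k])
print(f"{top_k} journals cover {covered:.1f}% of the probability mass")
```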