Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species

Leipzig, J.; Bakis, Y.; Wang, X.; Elhamod, M.; Diamond, K.; Dahdul, W.; Karpatne, A.; Maga, M.; Mabee, P.; Bart, H. L.; Greenberg, J.

2021-01-29 evolutionary biology
10.1101/2021.01.28.428644 bioRxiv
Biodiversity image repositories are crucial sources of training data for machine learning approaches to biological research. Metadata, specifically metadata about object quality, is putatively an important prerequisite to selecting sample subsets for these experiments. This study demonstrates the importance of image quality metadata to a species classification experiment involving a corpus of 1935 fish specimen images that were annotated with 22 metadata quality properties. When used by a convolutional neural network approach to species identification, a small subset of high-quality images produced an F1 score of 0.41, compared to 0.35 for a taxonomically matched subset of low-quality images. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found that the visibility of all anatomical features was the most important quality feature for classification accuracy. We suggest that biodiversity image repositories consider adopting a minimal set of image quality metadata to support future machine learning projects.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

Rank  Journal                             Papers in training set  Percentile  Probability
1     Ecological Informatics              29                      Top 0.1%    33.0%
2     PLOS ONE                            4510                    Top 11%     17.5%
      (50% of probability mass above)
3     Biology Methods and Protocols       53                      Top 0.2%    4.0%
4     Scientific Reports                  3102                    Top 37%     3.6%
5     Applications in Plant Sciences      21                      Top 0.1%    2.9%
6     Methods in Ecology and Evolution    160                     Top 1%      2.6%
7     Scientific Data                     174                     Top 0.8%    2.1%
8     Systematic Entomology               11                      Top 0.1%    1.8%
9     PLOS Computational Biology          1633                    Top 15%     1.8%
10    Ecology and Evolution               232                     Top 2%      1.8%
11    GigaScience                         172                     Top 1%      1.7%
12    Developmental Biology               134                     Top 1%      1.7%
13    Molecular Ecology Resources         161                     Top 0.7%    1.5%
14    Biology Open                        130                     Top 2%      1.2%
15    PeerJ                               261                     Top 10%     1.2%
16    Royal Society Open Science          193                     Top 3%      1.2%
17    BMC Ecology and Evolution           49                      Top 1%      0.9%
18    Frontiers in Ecology and Evolution  60                      Top 3%      0.9%
19    New Phytologist                     309                     Top 4%      0.9%
20    Peer Community Journal              254                     Top 3%      0.9%
21    Data in Brief                       13                      Top 0.3%    0.9%
22    BMC Biology                         248                     Top 3%      0.9%
23    Frontiers in Marine Science         55                      Top 1%      0.8%
24    Frontiers in Plant Science          240                     Top 5%      0.7%
25    Systematic Biology                  121                     Top 0.4%    0.7%
26    Malaria Journal                     48                      Top 2%      0.7%
27    Sensors                             39                      Top 2%      0.7%
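
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration: the function name journals_to_reach and the 50% threshold parameter are assumptions introduced here, not part of the prediction tool, and only the first five listed probabilities are copied in.

```python
# Predicted probabilities (in percent) for the five top-ranked journals,
# copied from the listing. The remaining rows are omitted for brevity.
probabilities = [
    ("Ecological Informatics", 33.0),
    ("PLOS ONE", 17.5),
    ("Biology Methods and Protocols", 4.0),
    ("Scientific Reports", 3.6),
    ("Applications in Plant Sciences", 2.9),
]


def journals_to_reach(ranked, threshold=50.0):
    """Count how many top-ranked entries are needed for the running
    total of probabilities to reach `threshold` (hypothetical helper).

    Returns (count, cumulative). If the threshold is never reached,
    count is None and cumulative is the total of all entries.
    """
    cumulative = 0.0
    for count, (_, prob) in enumerate(ranked, start=1):
        cumulative += prob
        if cumulative >= threshold:
            return count, cumulative
    return None, cumulative


count, mass = journals_to_reach(probabilities)
print(count, mass)  # 2 50.5
```

With the listed values, the running total first reaches 50% at the second journal (33.0% + 17.5% = 50.5%), which matches the position of the "50% of probability mass above" marker.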