Learning from Synthetic Dataset for Crop Seed Instance Segmentation

Toda, Y.; Okura, F.; Ito, J.; Okada, S.; Kinoshita, T.; Tsuji, H.; Saisho, D.

2019-12-06 bioinformatics
10.1101/866921 bioRxiv

Incorporating deep learning into the image analysis pipeline has opened the possibility of introducing precision phenotyping in agriculture. However, training a neural network requires a sufficient amount of training data, and the time-consuming manual annotation needed to prepare it often becomes the limiting step. Here, we show that an instance segmentation neural network (Mask R-CNN) aimed at phenotyping the barley seed morphology of various cultivars can be sufficiently trained purely on a synthetically generated dataset. Our approach is based on the concept of domain randomization, where a large number of images are generated by randomly orienting seed objects on a virtual canvas. After training with such a dataset, recall and average precision on the real-world test dataset reached 96% and 95%, respectively. Applying our pipeline enables extraction of morphological parameters at a large scale, enabling precise characterization of the natural variation of barley from a multivariate perspective. Importantly, we show that our approach is effective not only for barley seeds but also for various other crops including rice, lettuce, oat, and wheat, supporting the view that the performance benefits of this technique are generic. We propose that constructing and utilizing such synthetic data can be a powerful method to alleviate the human labor costs of preparing training datasets for deep learning in the agricultural domain.
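
The domain randomization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: rotated ellipses stand in for real seed images, the background is a plain canvas, and the names `ellipse_mask` and `synthesize`, along with every size and intensity range, are assumptions chosen for illustration. The point is only that each synthetic image comes with a pixel-accurate instance label map for free, so no manual annotation is needed.

```python
import numpy as np


def ellipse_mask(height, width, center, axes, angle):
    """Boolean mask of a rotated ellipse (a hypothetical stand-in for one seed)."""
    yy, xx = np.mgrid[0:height, 0:width]
    dy, dx = yy - center[0], xx - center[1]
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # Rotate pixel offsets into the ellipse's own coordinate frame.
    u = dx * cos_a + dy * sin_a
    v = -dx * sin_a + dy * cos_a
    return (u / axes[0]) ** 2 + (v / axes[1]) ** 2 <= 1.0


def synthesize(num_seeds=10, size=(256, 256), seed=0):
    """Place seeds at random positions and orientations on a blank canvas.

    Returns the grayscale image and an instance label map
    (0 = background, k = k-th seed placed).
    """
    rng = np.random.default_rng(seed)
    h, w = size
    image = np.zeros((h, w), dtype=np.float32)
    labels = np.zeros((h, w), dtype=np.int32)
    for instance_id in range(1, num_seeds + 1):
        center = (rng.uniform(0, h), rng.uniform(0, w))   # random position
        axes = (rng.uniform(12, 20), rng.uniform(5, 9))   # assumed seed size range
        angle = rng.uniform(0, np.pi)                     # random orientation
        mask = ellipse_mask(h, w, center, axes, angle)
        image[mask] = rng.uniform(0.5, 1.0)               # assumed seed brightness
        labels[mask] = instance_id                        # later seeds occlude earlier ones
    return image, labels


image, labels = synthesize(num_seeds=5, size=(128, 128), seed=1)
```

In a real setting the ellipses would be replaced by cropped seed photographs and the label map converted to whatever annotation format the segmentation framework expects; both steps are outside this sketch.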

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Plant Phenomics: 17 papers in training set, Top 0.1%, 22.5%
2. Scientific Reports: 3102 papers in training set, Top 5%, 10.4%
3. Frontiers in Plant Science: 240 papers in training set, Top 1%, 6.8%
4. Plant Physiology: 217 papers in training set, Top 0.7%, 6.3%
5. PLOS ONE: 4510 papers in training set, Top 32%, 4.8%
50% of probability mass above
6. New Phytologist: 309 papers in training set, Top 1%, 4.3%
7. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 2%, 3.6%
8. Plant Methods: 39 papers in training set, Top 0.2%, 2.9%
9. PLOS Computational Biology: 1633 papers in training set, Top 13%, 2.4%
10. Nature Communications: 4913 papers in training set, Top 51%, 1.7%
11. Biomedical Optics Express: 84 papers in training set, Top 0.7%, 1.7%
12. BMC Bioinformatics: 383 papers in training set, Top 5%, 1.7%
13. Bioinformatics: 1061 papers in training set, Top 8%, 1.5%
14. Briefings in Bioinformatics: 326 papers in training set, Top 5%, 1.2%
15. iScience: 1063 papers in training set, Top 22%, 1.2%
16. Frontiers in Bioengineering and Biotechnology: 88 papers in training set, Top 2%, 1.2%
17. GigaScience: 172 papers in training set, Top 2%, 1.1%
18. Horticulture Research: 43 papers in training set, Top 1%, 0.9%
19. Frontiers in Genetics: 197 papers in training set, Top 8%, 0.9%
20. Communications Biology: 886 papers in training set, Top 19%, 0.9%
21. Advanced Science: 249 papers in training set, Top 16%, 0.9%
22. Biology Methods and Protocols: 53 papers in training set, Top 2%, 0.8%
23. Ecological Informatics: 29 papers in training set, Top 0.8%, 0.7%
24. in silico Plants: 24 papers in training set, Top 0.3%, 0.7%
25. Bioinformatics Advances: 184 papers in training set, Top 5%, 0.7%
26. Journal of Vision: 92 papers in training set, Top 0.5%, 0.6%
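
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The short sketch below uses the values for ranks 1 to 7 as they appear in the list; the variable names are illustrative only.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1-7, copied from the list.
probabilities = [22.5, 10.4, 6.8, 6.3, 4.8, 4.3, 3.6]

# Running total after each rank.
cumulative = list(accumulate(probabilities))

# First rank at which the running total reaches 50%.
cutoff_rank = next(
    rank for rank, total in enumerate(cumulative, start=1) if total >= 50.0
)
print(cutoff_rank, round(cumulative[cutoff_rank - 1], 1))
```

The running total first reaches 50% at rank 5 (about 50.8%), which matches the position of the "50% of probability mass above" marker.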