
Enhancing tree seed germination prediction with image-driven machine learning models.

Barreiro, P. G.; Richard-Bollans, A.; Mattana, E.; Seal, C. E.; Chapman, T.; Dayrell, R.

2025-01-24 plant biology
10.1101/2025.01.21.634084 bioRxiv

1. Tree planting is crucial for reversing deforestation and meeting net zero targets, requiring a reliable supply of high-quality seeds. Efficient use of limited native seeds can be promoted by sorting methods, but traditional techniques commonly used for agricultural species are often unsuitable for tree seeds due to their high trait variability.

2. Here, we explored the potential of combining image analysis with machine learning models to improve tree seed sorting outcomes. We selected five UK native tree species of interest for tree production and afforestation projects and applied machine learning XGBoost and Convolutional Neural Networks algorithms to predict seed germination using colour and X-ray images as well as features extracted from these images.

3. The machine learning models achieved good accuracy and F1-scores, but their specificity was limited, particularly when relying solely on colour images or related features. This poses a problem, as wild seeds are often scarce, and falsely classifying seeds that germinate as non-germinable would result in a waste of valuable resources. X-ray images and features were highly effective in identifying empty seeds but did not perform well when differentiating filled seeds into germinable and non-viable. Consequently, the models performed best for species with a high proportion of empty seeds.

4. For three of the five species, model performance varied significantly by mother tree, with some trees showing markedly poorer results. This aspect had not been previously investigated and raises concerns that biased seed sorting will disadvantage certain mother trees, leading to the loss of valuable genetic diversity and woodland resilience.

Synthesis and applications: The performance of image-based machine learning models in predicting seed germination ultimately depended on whether most non-germinated seeds were empty, non-viable, or dormant. X-ray models showed strong performance in detecting empty seeds, but colour image models exhibited poor results due to the high variability in seed external features, the subtle differences between germinated and non-germinated seeds, and the variability among individual mother trees. Developing open, accessible training databases and more adaptable models is crucial for addressing these limitations and enabling technologies to further support large-scale tree production.
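The abstract reports good accuracy and F1-scores alongside limited specificity. The sketch below illustrates how those three metrics can diverge when one class dominates. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not data from the paper, and the abstract does not state which class the authors treated as positive.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, F1 and specificity from confusion-matrix counts.

    Assumes every denominator is non-zero (each class and each
    prediction occurs at least once).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)  # share of true negatives recovered
    return {"accuracy": accuracy, "f1": f1, "specificity": specificity}


# Hypothetical counts, NOT from the paper: 100 seeds, 80 in the
# positive class and 20 in the negative class.
m = binary_metrics(tp=76, fp=12, tn=8, fn=4)

for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, accuracy and F1 both look strong while specificity is only 0.40, because the 12 misclassified negatives are a small share of all seeds but a large share of the minority class.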

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | PLANTS, PEOPLE, PLANET | 21 papers in training set | Top 0.1% | 22.7%
2 | New Phytologist | 309 papers in training set | Top 0.5% | 10.2%
3 | PLOS ONE | 4510 papers in training set | Top 20% | 9.2%
4 | The Plant Phenome Journal | 14 papers in training set | Top 0.1% | 8.5%
50% of probability mass above
5 | Frontiers in Plant Science | 240 papers in training set | Top 1% | 6.9%
6 | Methods in Ecology and Evolution | 160 papers in training set | Top 0.5% | 6.4%
7 | Plant Phenomics | 17 papers in training set | Top 0.1% | 6.4%
8 | Applications in Plant Sciences | 21 papers in training set | Top 0.1% | 3.6%
9 | Scientific Reports | 3102 papers in training set | Top 43% | 2.9%
10 | Remote Sensing in Ecology and Conservation | 10 papers in training set | Top 0.1% | 1.9%
11 | Plant Methods | 39 papers in training set | Top 0.5% | 1.3%
12 | Forest Ecology and Management | 25 papers in training set | Top 0.3% | 1.3%
13 | Plant Direct | 81 papers in training set | Top 1% | 1.3%
14 | Plant Physiology | 217 papers in training set | Top 2% | 1.2%
15 | Crop Science | 18 papers in training set | Top 0.3% | 1.2%
16 | Phytopathology® | 28 papers in training set | Top 0.5% | 1.0%
17 | The Plant Journal | 197 papers in training set | Top 3% | 1.0%
18 | eLife | 5422 papers in training set | Top 53% | 0.9%
19 | Horticulture Research | 43 papers in training set | Top 1% | 0.8%
20 | G3: Genes, Genomes, Genetics | 222 papers in training set | Top 1% | 0.7%
21 | PLOS Computational Biology | 1633 papers in training set | Top 27% | 0.6%
22 | in silico Plants | 24 papers in training set | Top 0.4% | 0.5%
23 | BMC Genomics | 328 papers in training set | Top 8% | 0.5%
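The statement that the top 4 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below assumes the cutoff is the smallest number of top-ranked journals whose cumulative probability reaches 50%; that rule and the helper name `journals_to_reach` are assumptions, not something the page specifies.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 23 listed journals, in rank order.
probs = [22.7, 10.2, 9.2, 8.5, 6.9, 6.4, 6.4, 3.6, 2.9, 1.9, 1.3, 1.3,
         1.3, 1.2, 1.2, 1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5]


def journals_to_reach(probs, threshold):
    """Smallest k such that the top-k probabilities sum to >= threshold.

    Returns None if the threshold is never reached.
    """
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k
    return None


k = journals_to_reach(probs, 50.0)
print(k)  # → 4
```

The first three journals sum to about 42.1% and the first four to about 50.6%, which is consistent with the cutoff shown after rank 4.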