Distributional Data Analysis Uncovers Hundreds of Novel and Heritable Phenomic Features from Temporal Cotton and Maize Drone Imagery

DeSalvio, A. J.; Matabuena, M.; Adak, A.; Arik, M. A.; DeSalvio, S. M.; Murray, S. C.; Wong, R. K. W.; Edwards, J.; de Leon, N.; Kaeppler, S. M.; Lima, D. C.; Hirsch, C. N.; Thompson, A.; Stelly, D. M.

2025-09-07 plant biology
10.1101/2025.09.05.674557 bioRxiv
Genomic and phenomic analyses suggest that additional heritable phenomic features can improve modeling of important end traits such as senescence or yield. Field phenotyping generally uses trait values averaged across individual experimental units (single plants or many plants within a plot), ignoring the full distributional pattern of the collected measures. Images of plants or plots captured by drones (unoccupied aerial vehicles, UAVs) can be viewed as individual distribution functions that capture biological information. This study introduces and validates distributional data analysis in two crops and experiment types: cotton (Gossypium hirsutum L.) single-plant vegetation index (VI) analysis and maize (Zea mays L.) plot-level yield prediction. In both crops, the concept of within-day variance decomposition was demonstrated. In cotton, genotypes exerted significant influences on temporal quantile functions of VIs. Maize yield prediction using distributional data with elastic-net regression indicated improvements in yield prediction of 12.7% to 21.6%, with quantiles outside the conventionally used median responsible for the added predictive power. A novel data visualization method for per-pixel heritability allowed distributional features to be explainable and interpretable. These results have implications for future plant phenomic studies, indicating that distributional data analysis applied across temporal imagery captures novel, heritable, and interpretable biological signal that is lost when working with conventional measures of central tendency such as mean or median summary values of experimental units.

Significance

Repeated aerial imaging of agricultural experiments produces image data sets that capture plant development at high spatial and temporal resolution. Frequently, images are summarized by measures of central tendency, such as mean or median values. Here, functional data distributional methods were applied to cotton (Gossypium hirsutum L.) and maize (Zea mays L.) image data, capturing more information than standard approaches. Cotton genotypes significantly impacted distributional spectral data, while in maize, distributional data enabled more accurate predictions of grain yield than models trained with median data alone. Distributional data were more explainable by genetics, with novel data visualization techniques able to highlight specific parts of plant imagery with high and low genetic variance.
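The contrast between a median summary and a distributional summary can be illustrated with a minimal sketch. This is not the authors' pipeline: the pixel values are synthetic, the helper name `quantile_features` is hypothetical, the choice of deciles is an assumption, and the elastic-net modeling step is not shown. It only illustrates how two plots with similar medians can differ clearly in their other quantiles.

```python
import random
import statistics


def quantile_features(pixel_values, n_bins=10):
    """Summarize one plot's per-pixel VI values by its quantile cut points.

    Hypothetical helper for illustration only. With n_bins=10 it returns
    the nine deciles (10th, 20th, ..., 90th percentiles).
    """
    # "inclusive" treats the pixels as the whole population for this plot.
    return statistics.quantiles(pixel_values, n=n_bins, method="inclusive")


# Synthetic stand-ins for per-pixel vegetation index values in two plots:
# both are centered on 0.5, but plot_b is far more variable than plot_a.
rng = random.Random(0)
plot_a = [0.5 + rng.uniform(-0.05, 0.05) for _ in range(1000)]
plot_b = [0.5 + rng.uniform(-0.30, 0.30) for _ in range(1000)]

feats_a = quantile_features(plot_a)
feats_b = quantile_features(plot_b)

# A median-only summary sees two nearly identical plots ...
median_a = statistics.median(plot_a)
median_b = statistics.median(plot_b)

# ... while the lower and upper deciles separate them.
print("median:", round(median_a, 3), round(median_b, 3))
print("10th percentile:", round(feats_a[0], 3), round(feats_b[0], 3))
print("90th percentile:", round(feats_a[-1], 3), round(feats_b[-1], 3))
```

In a setting like the one the abstract describes, a vector such as `feats_a` computed per plot and per flight date would serve as the feature set in place of a single median value.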

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. The Plant Phenome Journal | 14 papers in training set | Top 0.1% | 22.2%
2. New Phytologist | 309 papers in training set | Top 0.2% | 18.4%
3. Plant Phenomics | 17 papers in training set | Top 0.1% | 10.0%
50% of probability mass above
4. The Plant Journal | 197 papers in training set | Top 0.7% | 6.7%
5. Frontiers in Plant Science | 240 papers in training set | Top 2% | 4.8%
6. Plant Physiology | 217 papers in training set | Top 1% | 4.2%
7. Plant Direct | 81 papers in training set | Top 0.6% | 3.5%
8. PLANTS, PEOPLE, PLANET | 21 papers in training set | Top 0.2% | 3.5%
9. Applications in Plant Sciences | 21 papers in training set | Top 0.1% | 3.0%
10. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 28% | 2.0%
11. Plant Methods | 39 papers in training set | Top 0.3% | 2.0%
12. PLOS ONE | 4510 papers in training set | Top 49% | 2.0%
13. Nature Communications | 4913 papers in training set | Top 52% | 1.7%
14. PLOS Computational Biology | 1633 papers in training set | Top 17% | 1.7%
15. Scientific Reports | 3102 papers in training set | Top 62% | 1.5%
16. eLife | 5422 papers in training set | Top 45% | 1.5%
17. Remote Sensing in Ecology and Conservation | 10 papers in training set | Top 0.2% | 1.3%
18. in silico Plants | 24 papers in training set | Top 0.2% | 0.9%
19. Phytopathology® | 28 papers in training set | Top 0.5% | 0.9%
20. GigaScience | 172 papers in training set | Top 3% | 0.9%
21. Methods in Ecology and Evolution | 160 papers in training set | Top 2% | 0.9%
22. Global Change Biology | 69 papers in training set | Top 2% | 0.6%