Back

Increasing Phenomic Prediction Efficiency Using A Principal Component Analysis Based Pre-Processing Of Near Infrared Spectra

Bienvenu, C.; Roger, J.-M.; Sene, M.; Castro Pacheco, S. A.; Singer, M.; Felaniaina, B. L.; Terrier, N.; De Bellis, F.; Pot, D.; DE VERDAL, H.; Segura, V.

2026-05-13 genetics

10.64898/2026.05.10.724118 bioRxiv

Show abstract

Phenomic prediction (PP) is a breeding value prediction method using near infrared spectroscopy (NIRS). Spectra pre-processing is a key step in the analysis pipeline of PP and generally involves chemometrics methods. However, there is still little understanding in the genetics community of what pre-processing does and why it increases performances. Consequently, the choice of pre-processing is done either arbitrarily or through a search of the optimal set of methods and associated parameters. In this study, we propose a PCA-based pre-processing method where genetic values of spectra are estimated on a set of principal components instead of individual wavelengths. This way, estimations are based on a few informative and orthogonal features of spectra instead of many correlated, uninformative wavelengths. We tested this new pre-processing method on five data sets representing four plant species (maize, rice, sorghum and grapevine). Results show that it performs as good, or better than the best classical chemometric pre-processing methods in almost all cases. Combining PCA-based and classical chemometric pre-processing methods maximizes predictive ability. Moreover, this pre-processing method opens up possibilities of better understanding and selecting parts of the spectral information that are relevant for the prediction of breeding values. Indeed, components representing together about 1% of spectral variability were found to be responsible for most of PP predictive ability. Plain language summaryCultivated plants are the result of a breeding process during which their genetic values are used to select those to breed. Estimation of breeding values requires heavy experimental means and is time consuming. Phenomic prediction is a low cost and high throughput genetic value estimation method that is increasingly being used. It often uses near infrared spectroscopy measurements as predictors of genetic values that are easy to collect and thus routinely used in many species. However, near infrared spectra generally require pre-processing before being used in prediction. Currently used pre-processing methods arise from the chemometrics community, and still deserve a better in-depth appropriation by geneticists. In this study, we propose a new pre-processing approach that performs as good as or better than the best chemometric pre-processing generally used, reduces computation time, and allows for a better understanding of what parts of spectral information are relevant for prediction. Core IdeasO_LIWorking on principal components of spectra instead of wavelengths increases predictive ability of phenomic prediction and performs as good as or better than classical chemometrics pre-processing C_LIO_LIWorking on principal components of spectra requires less optimization of parameters than chemometrics pre-processing C_LIO_LIAbout 1% of spectral variance is responsible for most of the predictive power of phenomic prediction C_LIO_LIWorking on principal components of spectra pre-processed with classical chemometrics pre-processing can increase predictive ability even more C_LIO_LIPCA-based methods are valuable to optimize predictive ability of phenomic prediction and could be used more widely in the quantitative genetics field C_LI

Increasing Phenomic Prediction Efficiency Using A Principal Component Analysis Based Pre-Processing Of Near Infrared Spectra

Matching journals