Generalizable Cysteine Quantification in Pea Cultivars from SERS Spectra Using AI
Gorgannejad, E.; Liu, Q.; Findlay, C.; Nadimi, M.; Chun-Te Ko, A.; Bhowmik, P.; Paliwal, J.
Rapid quantification of sulfur-containing amino acids in legumes, particularly cysteine, is critical for assessing nutritional quality, screening breeding lines, and ensuring consistency in quality control. However, conventional methods such as high-performance liquid chromatography (HPLC) are time-consuming and resource-intensive for high-throughput applications. This study evaluated artificial intelligence models for predicting cysteine concentration from surface-enhanced Raman spectroscopy (SERS) spectra of pea extracts. SERS spectra were acquired from 20 cultivars grown at three geographically distinct locations, with HPLC-measured cysteine concentrations serving as the ground-truth reference. Linear regression, partial least squares regression, support vector regression, random forest regression, and a one-dimensional convolutional neural network (1D-CNN) were compared using within-cultivar splits and leave-one-cultivar-out (LOCO) evaluation. The 1D-CNN achieved a root-mean-square error (RMSE) of 0.008 g/100 g within cultivars and maintained its performance under LOCO evaluation, whereas the other models showed limited generalization to unseen cultivars. SHapley Additive exPlanations (SHAP) highlighted informative bands in the 630–760 cm⁻¹ range, and noise modeling was used to select the number of scans to acquire.
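The leave-one-cultivar-out protocol can be illustrated with a short sketch. It is not the authors' code: the function names (loco_splits, loco_rmse, mean_baseline), the cultivar labels and the cysteine values are hypothetical, and a mean-of-training-targets baseline stands in for the regression models and 1D-CNN compared in the study. The point it shows is that every sample from the held-out cultivar is excluded from training, so the reported RMSE reflects performance on a cultivar the model has never seen. scikit-learn's LeaveOneGroupOut produces the same kind of split when cultivar labels are passed as groups.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean


def loco_splits(cultivars):
    """Yield (held_out, train_idx, test_idx) for leave-one-cultivar-out evaluation.

    All samples of one cultivar form the test fold; every other sample is used
    for training, so no cultivar appears on both sides of a split.
    """
    groups = defaultdict(list)
    for i, c in enumerate(cultivars):
        groups[c].append(i)
    for held_out in sorted(groups):
        test_idx = groups[held_out]
        train_idx = [i for i, c in enumerate(cultivars) if c != held_out]
        yield held_out, train_idx, test_idx


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def loco_rmse(cultivars, y, fit_predict):
    """Return the RMSE for each held-out cultivar.

    fit_predict(train_idx, test_idx) is a hypothetical callback that fits a model
    on the training indices and returns predictions for the test indices.
    """
    scores = {}
    for held_out, train_idx, test_idx in loco_splits(cultivars):
        preds = fit_predict(train_idx, test_idx)
        scores[held_out] = rmse([y[i] for i in test_idx], preds)
    return scores


# Hypothetical example data: three cultivars, two extracts each (g/100 g).
cultivars = ["A", "A", "B", "B", "C", "C"]
y = [0.30, 0.32, 0.28, 0.30, 0.34, 0.36]


def mean_baseline(train_idx, test_idx):
    """Placeholder model: predict the mean training target for every test sample."""
    m = fmean(y[i] for i in train_idx)
    return [m] * len(test_idx)


scores = loco_rmse(cultivars, y, mean_baseline)
for c, s in scores.items():
    print(f"held-out cultivar {c}: RMSE = {s:.4f} g/100 g")
```

Swapping mean_baseline for a callback that trains a real regressor on the spectra of train_idx leaves the splitting and scoring logic unchanged. Reporting the per-cultivar RMSE, not just the average, exposes cultivars on which a model fails to generalize.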
Matching journals
The top 8 journals account for 50% of the predicted probability mass.