Evaluating Limits of Machine Learning-Assisted Raman Spectroscopy in Classification of Biological Samples

Yadav, A.; Birkby, A.; Armstrong, N.; Arnob, A.; Chou, M.-H.; Fernandez, A.; Verhoef, A. J.; Yi, Z.; Gulati, S.; Kotnis, S.; Sun, Q.; Kao, K. C.; Wu, H.-J.

2026-03-01 bioinformatics

10.64898/2026.02.26.708284 bioRxiv

Machine learning (ML)-assisted Raman spectroscopy has become a powerful analytical tool for classifying and identifying analytes; however, the technical challenges that affect its detection accuracy have not been systematically investigated. This study examines the experimental factors that affect classification performance. Among the evaluated models, the choice of ML algorithm had minimal impact on classification accuracy. Instead, experimental factors, including the spectral similarity between tested samples and data quality, dominated detection performance: increases in spectral noise and in spectral similarity significantly reduced classification accuracy. In well-controlled samples with low experimental noise, ML-assisted Raman spectroscopy discriminated lipid mixtures differing in composition by 1.85 mol%. To assess the effect of biological heterogeneity, we analyzed single-cell Raman spectra from Saccharomyces cerevisiae strains carrying single, double, or triple gene mutations. Intrinsic cell-to-cell variability introduced substantial spectral differences, severely reducing the accuracy of multiclass classification of these genetically similar strains at the single-cell level. Averaging Raman spectra across multiple cells improved classification accuracy by reducing this variability. We also assessed transfer learning across Raman spectrometers by applying an ML model trained on one instrument to spectra acquired on another. Transfer performance improved with proper instrument calibration, highlighting the importance of instrument standardization. Overall, our results show that data quality and spectral similarity are the primary bottlenecks in ML-assisted Raman spectroscopy, and that careful attention to sample preparation, data acquisition, measurement conditions, and instrument calibration is critical for robust and reliable classification.
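
Why averaging across cells reduces spectral variability can be illustrated with a small simulation. The sketch below is not the authors' pipeline; it assumes, as an idealization, that each single-cell spectrum is a shared template plus independent zero-mean Gaussian per-cell variation with standard deviation sigma. Under that assumption, averaging N cells shrinks the spread at each wavenumber channel to roughly sigma / sqrt(N). The function names, the single Gaussian band used as a template, and all parameter values are hypothetical. Real cell-to-cell differences are partly systematic, so actual gains from averaging may be smaller.

```python
import math
import random
import statistics


def cell_spectrum(template, cell_sd, rng):
    """Hypothetical single-cell spectrum: shared template plus independent
    Gaussian per-cell variation (an idealizing assumption)."""
    return [value + rng.gauss(0.0, cell_sd) for value in template]


def average_spectra(spectra):
    """Point-wise mean over a group of equal-length spectra."""
    return [statistics.fmean(channel) for channel in zip(*spectra)]


def spread_after_averaging(template, cell_sd, cells_per_average, trials, seed=0):
    """Simulate `trials` independent averaged spectra (each the mean of
    `cells_per_average` cells) and return the per-channel standard deviation
    across trials, averaged over all channels."""
    rng = random.Random(seed)
    averaged = [
        average_spectra(
            [cell_spectrum(template, cell_sd, rng) for _ in range(cells_per_average)]
        )
        for _ in range(trials)
    ]
    per_channel_sd = [statistics.stdev(channel) for channel in zip(*averaged)]
    return statistics.fmean(per_channel_sd)


if __name__ == "__main__":
    # Hypothetical template: one Gaussian band on a 100-channel axis.
    template = [math.exp(-(((i - 50) / 5) ** 2) / 2) for i in range(100)]
    for n in (1, 4, 16):
        observed = spread_after_averaging(template, cell_sd=0.2,
                                          cells_per_average=n, trials=200)
        expected = 0.2 / math.sqrt(n)
        print(f"N={n:2d}  observed spread={observed:.3f}  sigma/sqrt(N)={expected:.3f}")
```

The simulated spread tracks sigma / sqrt(N), which is the mechanism behind the improvement in the abstract: less within-class variability makes genetically similar classes easier to separate. The cost is that classification then applies to groups of cells rather than to individual cells.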
