Quantifying the impact of sample, instrument, and data processing on biological signatures detected with Raman spectroscopy

Wiemann, J.; Heck, P. R.

2023-06-05 evolutionary biology
10.1101/2023.06.01.543279 bioRxiv
Raman spectroscopy is a popular tool for characterizing complex biological materials and their geological remains [1-10]. Ordination methods, such as Principal Component Analysis (PCA), rely on spectral variance to create a compositional space [1], the ChemoSpace, grouping samples based on spectroscopic manifestations that reflect different biological properties or geological processes [1-7]. PCA reduces the dimensionality of complex spectroscopic data and facilitates the extraction of informative features into data formats suitable for downstream statistical analyses, and thus represents an essential first step in the development of diagnostic biosignatures. However, there is presently no systematic survey of the impact of sample, instrument, and spectral processing on the occupation of the ChemoSpace. Here, the influence of sample count, signal-to-noise ratios, spectrometer decalibration, baseline subtraction routines, and spectral normalization on ChemoSpace grouping is investigated using synthetic spectra. Increasing sample size improves the dissociation of sample groups in the ChemoSpace; however, a stable pattern of occupation can be achieved with fewer than 10 samples per group. Systemic noise of different amplitude and frequency, which can be introduced by instrument or sample [11,12], is eliminated by PCA even when spectra of differing signal-to-noise ratios are compared. Routine offsets (±1 cm⁻¹) in spectrometer calibration contribute less than 0.1% of the total spectral variance captured in the ChemoSpace and do not obscure biological information. Standard adaptive baselining, together with normalization, increases spectral comparability and facilitates the extraction of informative features. The ChemoSpace approach to biosignatures represents a powerful tool for exploring, denoising, and integrating molecular biological information from modern and ancient organismal samples.
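
The workflow summarized in the abstract (synthetic spectra, baseline subtraction, normalization, PCA) can be illustrated with a minimal sketch. This is not the authors' code: the function names, band positions, noise level, and group compositions are all hypothetical, and a crude straight-line baseline stands in for the adaptive baselining mentioned in the abstract. It only shows how two groups of noisy, slightly decalibrated spectra might be projected into a PCA score space.

```python
import numpy as np

def gaussian_band(x, center, width, height):
    """Single Gaussian band on the wavenumber axis x."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

def synthetic_spectrum(x, band_heights, rng, noise_sd=0.02):
    """Three-band spectrum with a linear background, random noise,
    and a random calibration offset within +/- 1 cm^-1 (all values assumed)."""
    shift = rng.uniform(-1.0, 1.0)
    centers = (1000.0, 1450.0, 1650.0)  # hypothetical band positions
    signal = sum(gaussian_band(x, c + shift, 15.0, h)
                 for c, h in zip(centers, band_heights))
    background = 0.0002 * (x - x[0])
    return signal + background + rng.normal(0.0, noise_sd, x.size)

def preprocess(spectra):
    """Subtract a straight line between the averaged first and last 20
    points (a simplification, not adaptive baselining), then scale each
    spectrum to unit Euclidean norm."""
    left = spectra[:, :20].mean(axis=1, keepdims=True)
    right = spectra[:, -20:].mean(axis=1, keepdims=True)
    ramp = np.linspace(0.0, 1.0, spectra.shape[1])
    corrected = spectra - (left + (right - left) * ramp)
    return corrected / np.linalg.norm(corrected, axis=1, keepdims=True)

def pca(data):
    """PCA via SVD of the column-centered data matrix.
    Returns sample scores and the explained-variance ratio per component."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u * s, s ** 2 / np.sum(s ** 2)

def run_demo(n_per_group=8, seed=0):
    rng = np.random.default_rng(seed)
    x = np.arange(800.0, 1801.0)  # wavenumber axis in cm^-1
    group_a = [synthetic_spectrum(x, (1.0, 0.6, 0.3), rng) for _ in range(n_per_group)]
    group_b = [synthetic_spectrum(x, (1.0, 0.3, 0.6), rng) for _ in range(n_per_group)]
    return pca(preprocess(np.array(group_a + group_b)))

scores, explained = run_demo()
print("PC1 scores, group A:", np.round(scores[:8, 0], 3))
print("PC1 scores, group B:", np.round(scores[8:, 0], 3))
print("Variance explained by PC1:", round(float(explained[0]), 3))
```

In this toy setup the two groups differ only in relative band heights, so they would be expected to fall on opposite sides of the first principal component, while the random noise spreads across the remaining components.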

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 0.7% | 18.9%
2 | Bioinformatics | 1061 papers in training set | Top 2% | 12.6%
3 | The Analyst | 15 papers in training set | Top 0.1% | 9.3%
4 | Data in Brief | 13 papers in training set | Top 0.1% | 8.5%
5 | PLOS ONE | 4510 papers in training set | Top 43% | 2.9%
50% of probability mass above
6 | Advanced Biology | 29 papers in training set | Top 0.2% | 1.9%
7 | Cancers | 200 papers in training set | Top 2% | 1.9%
8 | Analytical Chemistry | 205 papers in training set | Top 1% | 1.8%
9 | iScience | 1063 papers in training set | Top 14% | 1.7%
10 | Communications Chemistry | 39 papers in training set | Top 0.2% | 1.7%
11 | Scientific Data | 174 papers in training set | Top 1% | 1.7%
12 | Frontiers in Cell and Developmental Biology | 218 papers in training set | Top 4% | 1.7%
13 | Frontiers in Plant Science | 240 papers in training set | Top 4% | 1.5%
14 | Communications Biology | 886 papers in training set | Top 12% | 1.4%
15 | FEBS Open Bio | 29 papers in training set | Top 0.3% | 1.1%
16 | Journal of Biomedical Optics | 25 papers in training set | Top 0.5% | 1.0%
17 | New Phytologist | 309 papers in training set | Top 4% | 0.9%
18 | Protein Science | 221 papers in training set | Top 2% | 0.8%
19 | Journal of Microscopy | 18 papers in training set | Top 0.4% | 0.8%
20 | Medical Research Archives | 11 papers in training set | Top 0.5% | 0.8%
21 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 44% | 0.8%
22 | ISME Communications | 103 papers in training set | Top 2% | 0.8%
23 | International Journal of Molecular Sciences | 453 papers in training set | Top 15% | 0.8%
24 | Journal of Biophotonics | 16 papers in training set | Top 0.7% | 0.8%
25 | BMC Ecology and Evolution | 49 papers in training set | Top 2% | 0.8%
26 | Optica | 25 papers in training set | Top 0.8% | 0.7%
27 | Open Biology | 95 papers in training set | Top 2% | 0.7%
28 | Nano Letters | 63 papers in training set | Top 3% | 0.7%
29 | Journal of Visualized Experiments | 30 papers in training set | Top 0.9% | 0.7%
30 | Journal of Medicinal Chemistry | 68 papers in training set | Top 1% | 0.7%