
Decoupling Accuracy and Explainability: Machine Learning Strategies for HbA1c Prediction and Biomarker Discovery in Blood FTIR Spectroscopy

Melnychenko, M.; Makhnii, T.; Midlovets, K.; Dmyterchuk, B.; Krasnienkov, D.

2026-01-28 health informatics
10.64898/2026.01.26.26344831 medRxiv

Glycated hemoglobin (HbA1c) is a central biomarker for long-term glycemic control and diabetes management, traditionally quantified using laboratory-intensive chromatographic or immunochemical assays. As the global burden of diabetes continues to rise, there is growing interest in alternative, scalable approaches capable of rapid biochemical assessment. Fourier-transform infrared (FTIR) spectroscopy offers a reagent-free method that captures molecular signatures of protein glycation, but translating complex spectra into clinically interpretable HbA1c values requires robust analytical frameworks. Here, we present a complementary multi-model strategy for predicting HbA1c from FTIR spectra of whole blood. Using 685 blood samples with matched reference HbA1c measurements, we evaluated three analytically distinct yet synergistic approaches: partial least squares regression (PLSR), peak-resolved curve fitting based on pseudo-Voigt functions combined with H2O AutoML, and a convolutional neural network (CNN). PLSR and CNN models were trained on biologically informative spectral regions (800-1800 cm⁻¹ and 2800-3400 cm⁻¹), while curve fitting focused on the fingerprint region (1000-1720 cm⁻¹) to extract interpretable biochemical parameters. PLSR achieved the highest predictive accuracy (R² = 0.76), closely followed by the CNN (R² = 0.73), reflecting their ability to capture global linear and nonlinear spectral relationships. Although curve fitting yielded lower predictive performance (R² = 0.59), its peak-level decomposition enabled mechanistic interpretation of glycation-related changes. Explainable AI analysis using SHAP identified lipid- and protein-associated vibrations, carbohydrate-linked glycation bands, and amide-region structural features as key contributors to HbA1c prediction.
Rather than treating these approaches as competing alternatives, our results demonstrate that their integration provides a more informative framework than any single model alone. By combining predictive performance with biochemical interpretability, this multi-model FTIR strategy highlights a scalable and mechanistically grounded pathway toward non-invasive HbA1c assessment and broader metabolic screening in diabetes monitoring. The code for this study is freely available at https://github.com/MelnychenkoM/ftir-hba1c-prediction.
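The abstract names two concrete ingredients that can be illustrated briefly: the pseudo-Voigt peak shape used for curve fitting and the spectral windows fed to each model. The following is a minimal sketch, not the authors' implementation (which is in the linked repository). It assumes one common height-normalized pseudo-Voigt form (a weighted sum of a Lorentzian and a Gaussian sharing one full width at half maximum). The function names `pseudo_voigt` and `select_regions`, the parameter names, and the example peak position are hypothetical; only the wavenumber windows come from the abstract.

```python
import math

# Spectral windows quoted in the abstract, in cm^-1.
PLSR_CNN_REGIONS = [(800.0, 1800.0), (2800.0, 3400.0)]
CURVE_FIT_REGIONS = [(1000.0, 1720.0)]


def pseudo_voigt(x, center, fwhm, eta, amplitude=1.0):
    """Height-normalized pseudo-Voigt profile (assumed parameterization).

    eta is the Lorentzian fraction in [0, 1]; both components share the
    same full width at half maximum, so the peak height is `amplitude`.
    """
    if fwhm <= 0:
        raise ValueError("fwhm must be positive")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    scaled = 4.0 * (x - center) ** 2 / fwhm ** 2
    lorentzian = 1.0 / (1.0 + scaled)
    gaussian = math.exp(-math.log(2.0) * scaled)
    return amplitude * (eta * lorentzian + (1.0 - eta) * gaussian)


def select_regions(wavenumbers, absorbances, regions):
    """Keep only the (wavenumber, absorbance) pairs inside any window."""
    return [
        (w, a)
        for w, a in zip(wavenumbers, absorbances)
        if any(lo <= w <= hi for lo, hi in regions)
    ]


# Hypothetical peak near the amide I band, for illustration only.
peak_height = pseudo_voigt(1650.0, center=1650.0, fwhm=20.0, eta=0.5, amplitude=2.0)
half_height = pseudo_voigt(1660.0, center=1650.0, fwhm=20.0, eta=0.5, amplitude=2.0)
```

In this form the profile equals `amplitude` at the center and half of that at a distance of `fwhm / 2`, regardless of `eta`, which is what makes the fitted width directly interpretable as a band width.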

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Scientific Reports; 3102 papers in training set; Top 2%; 15.0%
2. Neurophotonics; 37 papers in training set; Top 0.1%; 10.9%
3. Analytical Chemistry; 205 papers in training set; Top 0.2%; 10.5%
4. eBioMedicine; 130 papers in training set; Top 0.1%; 6.6%
5. Nature Communications; 4913 papers in training set; Top 32%; 5.1%
6. Advanced Science; 249 papers in training set; Top 3%; 5.1%

50% of probability mass above

7. PLOS ONE; 4510 papers in training set; Top 47%; 2.2%
8. Advanced Biology; 29 papers in training set; Top 0.3%; 1.8%
9. PNAS Nexus; 147 papers in training set; Top 0.2%; 1.7%
10. ACS Sensors; 45 papers in training set; Top 0.7%; 1.7%
11. Bioinformatics; 1061 papers in training set; Top 7%; 1.6%
12. Communications Biology; 886 papers in training set; Top 12%; 1.4%
13. Metabolites; 50 papers in training set; Top 0.6%; 1.4%
14. Frontiers in Microbiology; 375 papers in training set; Top 6%; 1.3%
15. iScience; 1063 papers in training set; Top 23%; 1.2%
16. Clinical and Translational Science; 21 papers in training set; Top 0.7%; 0.9%
17. Computers in Biology and Medicine; 120 papers in training set; Top 4%; 0.9%
18. Frontiers in Immunology; 586 papers in training set; Top 6%; 0.9%
19. Frontiers in Digital Health; 20 papers in training set; Top 1%; 0.9%
20. Journal of Biomedical Optics; 25 papers in training set; Top 0.6%; 0.8%
21. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 9%; 0.8%
22. Analytical and Bioanalytical Chemistry; 17 papers in training set; Top 0.4%; 0.8%
23. eLife; 5422 papers in training set; Top 56%; 0.8%
24. EMBO Molecular Medicine; 85 papers in training set; Top 6%; 0.5%
25. Journal of Chemical Information and Modeling; 207 papers in training set; Top 3%; 0.5%
26. Frontiers in Molecular Biosciences; 100 papers in training set; Top 6%; 0.5%
27. JMIR Public Health and Surveillance; 45 papers in training set; Top 4%; 0.5%
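
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. This is only an illustrative sketch: the helper name `journals_to_reach` is hypothetical, and the percentages are copied from the first seven entries of the listing.

```python
def journals_to_reach(probabilities, threshold):
    """Return how many leading entries are needed for the running total
    to reach `threshold`, or None if the total never gets there."""
    running_total = 0.0
    for count, probability in enumerate(probabilities, start=1):
        running_total += probability
        if running_total >= threshold:
            return count
    return None


# Predicted probabilities (in percent) for ranks 1 to 7 of the listing.
top_probabilities = [15.0, 10.9, 10.5, 6.6, 5.1, 5.1, 2.2]

needed = journals_to_reach(top_probabilities, 50.0)
```

With these values the running total is about 48.1% after five journals and about 53.2% after six, so the threshold is first crossed at rank 6.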