Enhancing Prediabetes Diagnosis from Continuous Glucose Monitoring Data via Iterative Label Cleaning and Deep Learning
Arethiya, N. J.; Krammer, L.; David, J.; Bakshi, V.; BasuChoudhary, A.; Bhuiyan, U.; Sen, S.; Mazumder, R.; McNeely, P.
As of early 2026, over 115 million US adults (more than 1 in 3) have prediabetes, a condition with an annual conversion rate of 5%-10% to type 2 diabetes. Total diabetes (diagnosed and undiagnosed) affects approximately 40.1 million Americans, or 12% of the population, with roughly 1.5 million new cases diagnosed annually. Continuous Glucose Monitoring (CGM) provides real-time, around-the-clock insight into glycemic variability, detecting dangerous highs, lows, and trends that HbA1c (a 3-month average) misses. For instance, it enables identification of nocturnal hypoglycemia and postprandial spikes, supporting personalized, actionable treatment decisions and improving safety. The Artificial Intelligence Ready and Exploratory Atlas for Diabetes Insights (AI-READI) dataset was produced by the National Institutes of Health (NIH) Common Fund Data Ecosystem (CFDE) Bridge2AI program. This dataset offers a rich resource for diabetes research, providing comprehensive biosensor data from over 1,067 participants. However, like many medical datasets, AI-READI contains label inaccuracies arising from self-reported health surveys and static HbA1c indicators, which can undermine model effectiveness. We developed a robust classification framework using a Convolutional-Bidirectional Long Short-Term Memory (Conv+BiLSTM) network to classify glycemic health states from CGM time-series data. Our aim was to identify and correct misclassified labels through hybrid unsupervised-supervised learning methods and to validate the results with expert-in-the-loop clinical review. We analyzed 784 participants from the AI-READI dataset, representing four health states: healthy, lifestyle-controlled prediabetes, oral medication, and insulin-dependent.
Based on recommendations from the literature and our own expertise, we compared the self-reported "healthy" labels with a cluster-agnostic, CGM-defined healthy (CGM-H) reference. We derived this reference by applying K-means clustering (K=6) to standardized CGM summary features to identify CGM-H participants, and then applied XGBoost-based iterative label refinement. We identified a misclassification rate of 56.9% (161/283) in the initially labeled "healthy" group. After eight iterations of XGBoost refinement with dual-criterion relabeling (≥80% probability plus unanimous out-of-fold voting), the number of CGM-H participants in the cleaned dataset increased from 122 to 195 for binary classification. Next, we developed a Conv+BiLSTM model combining convolutional layers (32 and 64 filters) for local temporal feature extraction with bidirectional LSTM layers (64 and 32 units) for sequence modeling, using engineered time-series features including rolling statistics, glucose derivatives, and circadian rhythm encoding. Class imbalance was addressed with per-class weighting. Generalization performance was estimated with 5-fold stratified cross-validation, and a global decision threshold (0.374) was computed by maximizing Youden's J statistic on the concatenated out-of-fold predictions. Additionally, we analyzed heart rate, activity level, stress, and sleep data and validated them against the CGM data. The Conv+BiLSTM model achieved ROC-AUC ≈ 0.932 on the held-out test set and 0.907 ± 0.026 in cross-validation, with well-calibrated predictions (Expected Calibration Error = 0.075, temperature scaling T = 1.00). A 3-tier confidence-based decision system achieved an 82% detection rate with only a 6% OGTT burden, enabling actionable clinical recommendations. This hybrid approach addressed label noise while achieving high discrimination, and the framework demonstrates potential for real-time glycemic state monitoring and early intervention in diabetes progression.
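The global-threshold step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `youden_threshold`, the two toy folds, and the choice of which class is encoded as 1 are all assumptions made for illustration. The idea is to pool the out-of-fold probabilities from every cross-validation fold and pick the single cutoff that maximizes Youden's J = sensitivity + specificity - 1, which equals TPR - FPR.

```python
import numpy as np

def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = TPR - FPR.

    Candidate thresholds are the distinct predicted probabilities;
    a sample is called positive when its probability is >= threshold.
    Ties in J are resolved in favor of the lowest threshold.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best_t, best_j = None, -np.inf
    for t in np.unique(y_prob):          # sorted ascending
        pred = y_prob >= t
        tpr = (pred & y_true).sum() / n_pos
        fpr = (pred & ~y_true).sum() / n_neg
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j

# Hypothetical out-of-fold labels and probabilities from two folds
# (toy values, not data from the study).
fold_labels = [np.array([0, 0, 1, 1]), np.array([0, 1, 1, 0])]
fold_probs = [np.array([0.10, 0.40, 0.35, 0.80]),
              np.array([0.20, 0.70, 0.60, 0.30])]

# Concatenate across folds so that one global threshold is chosen,
# rather than a separate threshold per fold.
oof_labels = np.concatenate(fold_labels)
oof_probs = np.concatenate(fold_probs)

threshold, j_stat = youden_threshold(oof_labels, oof_probs)
print(f"threshold={threshold:.3f}, J={j_stat:.3f}")
```

Choosing the threshold on pooled out-of-fold predictions, rather than on training-fold predictions, keeps the cutoff from being tuned on data the model has already fit.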
Matching journals
The top 4 journals account for 50% of the predicted probability mass.