Optimising supervised machine learning algorithms predicting cigarette cravings and lapses for a smoking cessation just-in-time adaptive intervention (JITAI)
Leppin, C.; Brown, J.; Garnett, C.; Kale, D.; Okpako, T.; Simons, D.; Perski, O.
This study aimed to optimise the balance between participant burden and algorithm performance for predicting high-risk moments in a smoking cessation just-in-time adaptive intervention (JITAI) by systematically varying ecological momentary assessment (EMA) prompt frequency, predictor count, and training data source. Thirty-seven participants completed 16 EMAs per day for the first 10 days of their smoking cessation attempt, reporting mood, context, behaviour, cravings, and smoking lapses. Random forest algorithms predicting lapses and cravings were evaluated in terms of F1-score and ROC-AUC via mixed effects models accounting for clustering within individuals. Performance across out-of-sample individuals ranged from excellent to poor but was, on average, modest. Lapse prediction outperformed craving prediction, particularly for ROC-AUC (Median F1-score: Lapses 0.436 [IQR 0.180-0.625], Cravings 0.400 [IQR 0.048-0.649]; Median ROC-AUC: Lapses 0.659 [IQR 0.514-0.809], Cravings 0.628 [IQR 0.510-0.729]). A substantial proportion of configurations fell below commonly used minimum performance thresholds, particularly for F1-score. Reducing EMA frequency had outcome- and metric-dependent effects. Lapse F1-scores improved with fewer prompts (16 EMAs: 0.254 [IQR 0.081-0.500], 3 EMAs: 0.588 [IQR 0.353-0.667]), while ROC-AUC showed a slight, inconsistent decline (16 EMAs: 0.661 [IQR 0.520-0.876], 4 EMAs: 0.613 [IQR 0.494-0.786], 3 EMAs: 0.704 [IQR 0.567-0.809]). For cravings, both metrics declined with fewer prompts (F1-score: 16 EMAs 0.470 [IQR 0.141-0.745], 3 EMAs 0.333 [IQR 0.000-0.600]; ROC-AUC: 16 EMAs 0.700 [IQR 0.582-0.811], 3 EMAs 0.544 [IQR 0.421-0.676]). Feature reduction had negligible impact on lapse prediction (F1-score: all features 0.435, selected features 0.441; ROC-AUC: all 0.660, selected 0.657), but slightly reduced craving performance (F1-score: all 0.410 [IQR 0.117-0.646], selected 0.400 [IQR 0.000-0.650]; ROC-AUC: all 0.632, selected 0.622).
Including participant-specific training data improved lapse F1-scores (None 0.286 [IQR 0.000-0.571], 30% 0.542 [IQR 0.329-0.667]) but not lapse ROC-AUC (None 0.655 [IQR 0.512-0.786], 30% 0.694 [IQR 0.513-0.852]), and impaired craving ROC-AUC (None 0.650 [IQR 0.544-0.734], 30% 0.614 [IQR 0.493-0.730]; F1-score: None 0.424 [IQR 0.143-0.649], 30% 0.400 [IQR 0.000-0.703]). Overall, EMA-based machine learning detected lapse risk but showed modest performance and substantial inter-individual variability. Using higher EMA density, larger predictor sets, and participant-specific training data did not consistently outperform more parsimonious approaches. However, machine learning prediction alone is unlikely to be sufficient for real-world JITAI implementation, and may be best combined with complementary rules-based approaches.
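The abstract reports each metric as a median with an interquartile range across out-of-sample individuals. A minimal sketch of how such a summary could be computed is given below. It is not the authors' pipeline: the function names, the 0.5 decision threshold, the quartile method, and the toy data are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import median, quantiles


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties count as 0.5. Returns None when only one class is present,
    because the metric is undefined in that case.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def median_iqr(values):
    """Return (median, Q1, Q3); the 'inclusive' quartile method is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Hypothetical toy data: per participant, observed lapses (1/0) and
# predicted lapse probabilities. These numbers are invented for the example.
participants = {
    "p01": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "p02": ([0, 1, 0, 1], [0.20, 0.70, 0.60, 0.90]),
    "p03": ([1, 0, 0, 0], [0.30, 0.55, 0.10, 0.20]),
}

THRESHOLD = 0.5  # assumed cut-off for turning probabilities into predictions

f1_values, auc_values = [], []
for y_true, scores in participants.values():
    y_pred = [1 if s >= THRESHOLD else 0 for s in scores]
    f1_values.append(f1_score(y_true, y_pred))
    auc = roc_auc(y_true, scores)
    if auc is not None:  # skip participants with a single observed class
        auc_values.append(auc)

print("F1 median, Q1, Q3:", median_iqr(f1_values))
print("ROC-AUC median, Q1, Q3:", median_iqr(auc_values))
```

Summarising per individual rather than pooling all observations keeps the between-person spread visible, which is the variability the abstract emphasises. An F1-score of 0.000 at the lower quartile, as reported for several configurations, corresponds here to participants for whom no event was correctly predicted.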
Matching journals
The top 2 journals account for 50% of the predicted probability mass.