
The Sleep-Wake Classification Performance of Pediatric-Trained Machine Learning Algorithms for Raw Accelerometer Data

Chen, P.-W.; Cielo, C.; Walsh, O.; Mcdonald, M.; Song, P. X.; Goldstein, C.; Moreno, J. P.; Jansen, E.; Mitchell, J. A.

2026-06-01 pediatrics
10.64898/2026.05.28.26354364 medRxiv

Introduction: Actigraphy sleep-wake classification methods increasingly seek to leverage raw acceleration data and machine-learning-based classification, but performance evaluation in pediatrics is limited. We trained machine-learning models using pediatric data and compared their sleep-wake classification performance with existing algorithms for children. Methods: Sixty-five children (46% female, ages 5.3 to 17.7 years) completed in-lab overnight polysomnography and wore a GENEActiv device on their non-dominant wrist. The acceleration data were converted into 30-second epochs and aligned with physician-scored sleep-wake data from electroencephalography. Seven machine-learning models were trained using leave-one-subject-out cross-validation. Epoch-by-epoch analyses generated performance metrics (e.g., balanced accuracy [BA]), and discrepancy analyses provided overall sleep duration bias estimates. Models were ranked by a Euclidean distance score that combined performance and bias, where a lower score represents performance closer to perfect and bias closer to zero. For benchmarking, we included GGIR sleep scoring algorithms and an adult-trained random forest classifier. Results: Overall, 560.1 hours of polysomnography and actigraphy data were collected (74.4% of epochs were scored as sleep). The pediatric-trained local-global long short-term memory (LSTM) classifier had the best epoch-by-epoch performance (e.g., BA=0.85, sensitivity=0.88, specificity=0.83, ROC-AUC=0.95, and Cohen kappa=0.67). These metrics exceeded those of an adult-trained random forest classifier and GGIR-based algorithms. Discrepancy analyses revealed that the LSTM classifier underestimated overall sleep duration by an average of 25 minutes, with no proportional bias. Conclusion: We trained seven pediatric sleep-wake classifiers that had strong ability to detect sleep and wake, with the LSTM classifier performing best.
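The epoch-by-epoch metrics and the distance-based ranking described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: sleep is treated as the positive class, the function names `epoch_metrics` and `distance_score` are hypothetical, and the way bias is scaled before being combined with balanced accuracy (`bias_scale_min`) is an illustrative choice, since the abstract does not give the exact formula.

```python
import math

EPOCH_MINUTES = 0.5  # 30-second epochs, as stated in the abstract


def epoch_metrics(y_true, y_pred):
    """Epoch-by-epoch metrics, assuming sleep = 1 (positive) and wake = 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # share of true sleep epochs detected
    specificity = tn / (tn + fp)  # share of true wake epochs detected
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # Predicted minus reference sleep duration; negative = underestimate.
        "bias_min": (sum(y_pred) - sum(y_true)) * EPOCH_MINUTES,
    }


def distance_score(balanced_accuracy, bias_min, bias_scale_min=60.0):
    """Hypothetical ranking score: distance from the ideal point (BA = 1, bias = 0).

    bias_scale_min is an assumed normalisation, not taken from the paper.
    """
    return math.hypot(1.0 - balanced_accuracy, bias_min / bias_scale_min)


# Toy reference and predicted labels for ten epochs (made-up data).
reference = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
metrics = epoch_metrics(reference, predicted)
score = distance_score(metrics["balanced_accuracy"], metrics["bias_min"])
```

Under this sketch a classifier with perfect balanced accuracy and zero bias scores 0, and any loss of accuracy or any over- or underestimate of sleep duration moves the score away from 0.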

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 0.1% | 12.3%
2 | Sleep Medicine | 18 papers in training set | Top 0.1% | 12.3%
3 | SLEEP | 28 papers in training set | Top 0.1% | 10.1%
4 | Scientific Reports | 3102 papers in training set | Top 14% | 6.8%
5 | Journal of Sleep Research | 31 papers in training set | Top 0.1% | 6.8%
6 | PLOS ONE | 4510 papers in training set | Top 25% | 6.8%
50% of probability mass above
7 | Physiological Measurement | 12 papers in training set | Top 0.1% | 4.0%
8 | Annals of Neurology | 57 papers in training set | Top 0.5% | 4.0%
9 | Journal of Medical Internet Research | 85 papers in training set | Top 1% | 3.6%
10 | Sleep | 26 papers in training set | Top 0.2% | 3.6%
11 | Ear & Hearing | 15 papers in training set | Top 0.1% | 1.9%
12 | JMIR mHealth and uHealth | 10 papers in training set | Top 0.2% | 1.8%
13 | Journal of Proteome Research | 215 papers in training set | Top 1% | 1.7%
14 | PLOS Digital Health | 91 papers in training set | Top 2% | 1.3%
15 | Frontiers in Neuroscience | 223 papers in training set | Top 5% | 1.3%
16 | The Journal of Pediatrics | 15 papers in training set | Top 0.6% | 0.8%
17 | BMC Cancer | 52 papers in training set | Top 2% | 0.8%
18 | Neurology | 44 papers in training set | Top 2% | 0.7%
19 | Epilepsy & Behavior | 12 papers in training set | Top 0.4% | 0.7%
20 | BMC Medicine | 163 papers in training set | Top 8% | 0.6%
21 | npj Digital Medicine | 97 papers in training set | Top 4% | 0.6%
22 | European Journal of Neuroscience | 168 papers in training set | Top 2% | 0.6%
23 | BioData Mining | 15 papers in training set | Top 1% | 0.6%
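
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The following is a minimal Python sketch; the function name `journals_to_reach` is hypothetical, and only the first ten listed probabilities are included, which is enough to cross the threshold.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1 to 10, copied from the list.
probs_pct = [12.3, 12.3, 10.1, 6.8, 6.8, 6.8, 4.0, 4.0, 3.6, 3.6]


def journals_to_reach(probs, threshold_pct=50.0):
    """Return the smallest rank whose cumulative probability meets the threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold_pct:
            return rank
    return None  # threshold not reached with the probabilities given


cutoff_rank = journals_to_reach(probs_pct)
print(cutoff_rank)  # 6
```

The cumulative total is about 48.3% after five journals and about 55.1% after six, so the cutoff falls at rank 6.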