
Automated Sleep Stage and Event Detection Algorithms Using Quality-Controlled PSG Annotations

Kaneda, M.; Ogaki, S.; Nohara, T.; Fujita, S.; Osako, N.; Yagi, T.; Tomita, Y.; Ogata, T.

2025-12-17 neurology
10.64898/2025.12.15.25342334

Study Objectives: To develop machine-learning models for sleep stage classification, arousal detection, and respiratory event detection from overnight polysomnography, and to evaluate their performance relative to expert scorers.

Methods: Overnight polysomnography recordings were obtained from healthy participants and participants referred for suspected sleep-disordered breathing. Four certified scorers completed calibration sessions and generated reference annotations for sleep stages, arousals, and respiratory events. A subset of recordings was independently annotated by all scorers to support consensus analyses, enabling direct comparison between model outputs and human inter-scorer agreement. Gradient-boosted decision tree models were trained using hand-crafted features derived from standard physiological signals.

Results: Sleep stage classification achieved an accuracy of 0.840, a Cohen's kappa of 0.791, and an F1-score of 0.841, with limits of agreement for total sleep time of approximately ±0.5 h. Arousal detection achieved an F1-score of 0.733, with limits of agreement for the arousal index of approximately ±15 events/h. Respiratory event detection achieved an F1-score of 0.818, with limits of agreement for the apnea-hypopnea index also within approximately ±15 events/h. In consensus analyses, model performance was comparable to human inter-scorer agreement for sleep stages and arousals, but remained below human inter-scorer agreement for respiratory events, despite high absolute performance relative to prior studies.

Conclusions: The proposed models achieved performance approaching human-level agreement across major sleep scoring tasks. These findings indicate that high consistency in expert annotations is a key factor underlying robust model performance and support the use of quality-controlled annotations for developing reliable automated sleep analysis systems.

Statement of Significance: Manual scoring of overnight sleep studies remains a major bottleneck in sleep medicine, limiting efficiency, consistency, and large-scale research. This study demonstrates that interpretable automated analysis can achieve performance approaching human-level agreement for core sleep scoring tasks when reference annotations are highly consistent. By directly comparing model outputs with calibrated inter-scorer agreement, the results show that annotation quality is a key determinant of attainable accuracy, rather than model complexity alone. Such systems may provide stable and reproducible reference outputs that support clinical decision making, scorer training, and standardization across centers. Important remaining challenges include validation across institutions and populations, robustness to real-world signal artifacts, and extension to clinically meaningful subtypes of respiratory events.
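
The agreement statistics quoted in the abstract can be illustrated with a short sketch. It assumes the standard definition of Cohen's kappa and the conventional Bland-Altman 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences); the abstract does not say which variant the authors used. The function names and toy inputs are hypothetical, not taken from the paper.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def limits_of_agreement(values_a, values_b):
    """Bland-Altman 95% limits (assumed convention): bias +/- 1.96 * SD."""
    diffs = [a - b for a, b in zip(values_a, values_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation
    return bias - half_width, bias + half_width


# Toy example: two raters labelling four epochs (0 = wake, 1 = sleep).
kappa = cohens_kappa([0, 0, 1, 1], [0, 1, 1, 1])
# Toy example: per-recording indices (events/h) from a model and a scorer.
lower, upper = limits_of_agreement([2.0, 1.0, 3.0], [1.0, 2.0, 3.0])
```

In the toy case the raters agree on three of four epochs while chance agreement is 0.5, so kappa comes out at 0.5. Limits of agreement "of approximately ±15 events/h" would correspond to `lower` and `upper` near -15 and +15 under this convention.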

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Scientific Reports; based on 701 papers; Top 9%; 12.5%
2. SLEEP; based on 11 papers; Top 0.1%; 11.0%
3. npj Digital Medicine; based on 85 papers; Top 2%; 10.1%
4. PLOS ONE; based on 1737 papers; Top 59%; 7.5%
5. Frontiers in Neurology; based on 74 papers; Top 3%; 6.3%
6. Journal of Sleep Research; based on 14 papers; Top 0.3%; 5.0%

50% of probability mass above

7. npj Parkinson's Disease; based on 35 papers; Top 1%; 2.9%
8. Sleep Medicine; based on 11 papers; Top 0.5%; 2.9%
9. Journal of Neural Engineering; based on 19 papers; Top 1%; 1.7%
10. Brain Communications; based on 79 papers; Top 5%; 1.6%
11. Nature Communications; based on 483 papers; Top 31%; 1.6%
12. Neurology; based on 38 papers; Top 5%; 1.6%
13. eBioMedicine; based on 82 papers; Top 3%; 1.6%
14. PLOS Digital Health; based on 88 papers; Top 10%; 1.3%
15. Clinical Neurophysiology; based on 19 papers; Top 2%; 1.3%
16. Annals of Neurology; based on 43 papers; Top 4%; 1.3%
17. Movement Disorders; based on 49 papers; Top 3%; 1.2%
18. Frontiers in Neuroscience; based on 29 papers; Top 3%; 1.2%
19. Annals of Clinical and Translational Neurology; based on 22 papers; Top 3%; 1.2%
20. Critical Care Explorations; based on 15 papers; Top 2%; 0.8%
21. Epilepsia; based on 27 papers; Top 2%; 0.8%
22. BMC Medicine; based on 155 papers; Top 22%; 0.8%
23. Journal of Medical Internet Research; based on 81 papers; Top 16%; 0.7%
24. Computers in Biology and Medicine; based on 39 papers; Top 8%; 0.7%
25. Nature Medicine; based on 88 papers; Top 20%; 0.7%
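
The 50% cutoff stated above the list can be checked from the displayed probabilities. The sketch below assumes the cutoff is the first rank at which the running total reaches 50%; the page does not describe its exact calculation, and the displayed percentages are rounded. The helper name `ranks_to_reach` is hypothetical.

```python
def ranks_to_reach(probabilities, threshold):
    """Return (rank, running_total) at the first rank whose total >= threshold."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None  # threshold never reached


# Displayed predicted probabilities (%) for the six highest-ranked journals.
top_probabilities = [12.5, 11.0, 10.1, 7.5, 6.3, 5.0]
rank, mass = ranks_to_reach(top_probabilities, 50.0)
```

With these values the running total is 47.4% after five journals and about 52.4% after six, consistent with the statement that the top 6 journals account for 50% of the predicted probability mass.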