
Lack of Consensus for Manual Mouse Sleep Scoring Limits Implementation of Automatic Deep Learning Models

Rose, L.; Zahid, A. N.; Ciudad, J. G.; Egebjerg, C.; Piilgaard, L.; Soerensen, F. L.; Andersen, M.; Radovanovic, T.; Tsopanidou, A.; Nedergaard, M.; Arthaud, S.; Maciel, R.; Peyron, C.; Berteotti, C.; Martiere, V. L.; Silvani, A.; Zoccoli, G.; Borsa, M.; Adamantidis, A.; Moerup, M.; Kornum, B. R.

2026-03-30 · neuroscience
bioRxiv · DOI: 10.64898/2026.03.27.714381

Scientists have for decades attempted to automate the manual sleep staging problem, not only for human polysomnography data but also for rodent data. No model, however, has succeeded in fully replacing the manual procedure across clinics and laboratories. We hypothesize that this is due to the models' limited ability to generalize to data from unseen laboratories. Our findings show that despite the high performance of four state-of-the-art models reported in their initial publications, the published models struggle to generalize to other laboratories. We further show a significant improvement in model performance across labs by re-training them on a diverse dataset from five different sites. To assess the contribution of variability in manual scoring, ten experts from five laboratories all labelled the same nine mouse sleep recordings. The results revealed substantial scoring variability, particularly for rapid eye movement (REM) sleep, both within and between labs. In conclusion, our study demonstrates that the key challenges for the generalizability of state-of-the-art sleep scoring models are signal variability and label noise. Our study highlights the need for a standardized set of mouse sleep scoring guidelines to enable consistency and collaboration across the field. Until such a consensus is reached, we present four sufficiently robust models trained on diverse datasets that can serve as standardized tools across labs.
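Inter-rater scoring variability of the kind the abstract describes is commonly quantified with Cohen's kappa, which corrects raw epoch-by-epoch agreement for chance. The sketch below is illustrative only (it is not the authors' code, and the example hypnograms are invented); it assumes each scorer labels every epoch as Wake (W), NREM (N), or REM (R).

```python
# Illustrative sketch: Cohen's kappa between two manual scorers'
# epoch-by-epoch sleep stage labels (not from the paper).
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # Observed fraction of epochs where the scorers agree
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each scorer labelled independently,
    # based on their individual stage frequencies
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical 10-epoch hypnograms; the scorers disagree only on
# the single REM epoch -- the stage the study found most contentious.
scorer1 = list("WWNNNRNNWW")
scorer2 = list("WWNNNNNNWW")
print(round(cohens_kappa(scorer1, scorer2), 3))  # → 0.815
```

Note that a single disagreement on a rare stage such as REM pulls kappa well below the 90% raw agreement, which is why REM variability dominates inter-rater statistics.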

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank  Journal                                            Papers in training set  Percentile  Probability
 1    PLOS ONE                                           4510                    Top 15%     12.6%
 2    Scientific Reports                                 3102                    Top 9%       8.5%
 3    PLOS Computational Biology                         1633                    Top 6%       6.4%
 4    Frontiers in Neuroscience                           223                    Top 0.7%     4.9%
 5    Sleep Medicine                                       18                    Top 0.1%     4.3%
 6    npj Parkinson's Disease                              89                    Top 0.5%     4.0%
 7    NeuroImage                                          813                    Top 2%       4.0%
 8    Frontiers in Neuroinformatics                        38                    Top 0.1%     3.6%
 9    Communications Biology                              886                    Top 3%       2.8%
----- 50% of probability mass above this line -----
10    Computers in Biology and Medicine                   120                    Top 1%       2.8%
11    Frontiers in Physiology                              93                    Top 2%       2.6%
12    eneuro                                              389                    Top 4%       2.1%
13    SLEEP                                                28                    Top 0.2%     2.1%
14    Sensors                                              39                    Top 0.8%     1.9%
15    Frontiers in Psychiatry                              83                    Top 2%       1.8%
16    IEEE Journal of Biomedical and Health Informatics    34                    Top 1%       1.7%
17    Journal of Neural Engineering                       197                    Top 1%       1.7%
18    Human Brain Mapping                                 295                    Top 3%       1.7%
19    Neuroinformatics                                     40                    Top 0.6%     1.5%
20    JMIR mHealth and uHealth                             10                    Top 0.2%     1.3%
21    Sleep                                                26                    Top 0.4%     1.2%
22    eLife                                              5422                    Top 49%      1.2%
23    iScience                                           1063                    Top 21%      1.2%
24    Brain Communications                                147                    Top 2%       1.0%
25    Translational Psychiatry                            219                    Top 3%       1.0%
26    Wellcome Open Research                               57                    Top 2%       0.9%
27    Journal of Sleep Research                            31                    Top 0.4%     0.8%
28    PLOS Biology                                        408                    Top 19%      0.8%
29    Journal of Medical Internet Research                 85                    Top 4%       0.8%
30    European Journal of Neuroscience                    168                    Top 2%       0.6%