SleePyPhases: A Workflow Framework for Sleep Data Harmonization, Analysis and Machine Learning

Ehrlich, F.; Bäcker, S.; Schmidt, M.; Malberg, H.; Sedlmayr, M.; Goldammer, M.

2026-01-16 health informatics
10.64898/2026.01.14.26344163
Data-driven sleep research relies on polysomnography data from various public repositories and vendor systems. Yet the lack of standardized access methods creates substantial barriers to multi-dataset research, method reuse, and reproducibility. We present SleePyPhases, an open-source Python framework providing unified access to multiple sleep data repositories. It offers integrated data harmonization, configuration-based preprocessing, and support for developing machine learning pipelines. The framework unifies channel naming, annotation semantics, and data formats across several public repositories (including SHHS, MESA, MrOS, PhysioNet, and SleepEDF) and commercial vendor formats (Philips Alice and Somnomedics Domino). We validated the framework by reproducing five published sleep analysis studies covering diverse datasets, sleep scoring tasks (sleep staging, arousal, leg movement, respiratory event detection), preprocessing methods (signal preprocessing and spectrograms), machine learning methods (supervised and unsupervised learning), and model architectures (convolutional, recurrent, and transformer networks). Four of the five reproductions achieved near-identical results, confirming data fidelity and methodological flexibility. SleePyPhases provides a foundation for reproducible sleep research, enabling researchers to focus on scientific questions rather than data infrastructure.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 13.4% (based on 701 papers, Top 7%)
2. Journal of Sleep Research: 10.3% (based on 14 papers, Top 0.2%)
3. Journal of Medical Internet Research: 5.4% (based on 81 papers, Top 3%)
4. JAMIA Open: 5.4% (based on 35 papers, Top 2%)
5. PLOS ONE: 4.7% (based on 1737 papers, Top 70%)
6. npj Digital Medicine: 4.5% (based on 85 papers, Top 5%)
7. Frontiers in Psychiatry: 3.0% (based on 56 papers, Top 3%)
8. Journal of the American Medical Informatics Association: 2.8% (based on 53 papers, Top 3%)
9. PLOS Digital Health: 2.8% (based on 88 papers, Top 5%)

50% of probability mass above

10. Patterns: 2.5% (based on 15 papers, Top 0.5%)
11. Computers in Biology and Medicine: 2.5% (based on 39 papers, Top 3%)
12. Nature Medicine: 2.3% (based on 88 papers, Top 5%)
13. Scientific Data: 2.3% (based on 30 papers, Top 1%)
14. SLEEP: 2.3% (based on 11 papers, Top 0.4%)
15. Nature Communications: 1.9% (based on 483 papers, Top 28%)
16. IEEE Journal of Biomedical and Health Informatics: 1.8% (based on 14 papers, Top 1%)
17. Frontiers in Digital Health: 1.6% (based on 18 papers, Top 2%)
18. Frontiers in Neuroscience: 1.6% (based on 29 papers, Top 2%)
19. International Journal of Medical Informatics: 1.3% (based on 25 papers, Top 4%)
20. Journal of Biomedical Informatics: 1.3% (based on 37 papers, Top 4%)
21. European Heart Journal - Digital Health: 1.3% (based on 15 papers, Top 2%)
22. JMIR Medical Informatics: 0.8% (based on 16 papers, Top 5%)
23. Translational Psychiatry: 0.8% (based on 94 papers, Top 8%)
24. BMC Medical Informatics and Decision Making: 0.8% (based on 36 papers, Top 7%)
25. JMIR Formative Research: 0.7% (based on 31 papers, Top 7%)
26. Journal of Neural Engineering: 0.7% (based on 19 papers, Top 2%)
27. Frontiers in Physiology: 0.7% (based on 18 papers, Top 4%)
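
The "top 9 journals account for 50%" statement can be checked from the listed percentages with a running sum. The sketch below is illustrative only: the helper name `journals_to_reach` is hypothetical, and because the displayed percentages are rounded, the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1-10, copied from the list above.
probs = [13.4, 10.3, 5.4, 5.4, 4.7, 4.5, 3.0, 2.8, 2.8, 2.5]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running
    sum to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= threshold - 1e-9:
            return count
    return None


print(journals_to_reach(probs, 50.0))
```

The running sum after eight journals is about 49.5% and after nine about 52.3%, so the 50% cutoff falls between ranks 8 and 9, consistent with the marker in the list.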