SleePyPhases: A Workflow Framework for Sleep Data Harmonization, Analysis and Machine Learning
Ehrlich, F.; Bäcker, S.; Schmidt, M.; Malberg, H.; Sedlmayr, M.; Goldammer, M.
Data-driven sleep research relies on polysomnography data from various public repositories and vendor systems, yet the lack of standardized access methods creates substantial barriers to multi-dataset research, method reuse, and reproducibility. We present SleePyPhases, an open-source Python framework that provides unified access to multiple sleep data repositories and integrates data harmonization, configuration-based preprocessing, and the development of machine learning pipelines. The framework unifies channel naming, annotation semantics, and data formats across several public repositories (including SHHS, MESA, MrOS, PhysioNet, and SleepEDF) and commercial vendor formats (Philips Alice and Somnomedics Domino). We validated the framework by reproducing five published sleep analysis studies covering diverse datasets, sleep scoring tasks (sleep staging and the detection of arousals, leg movements, and respiratory events), preprocessing methods (signal preprocessing and spectrograms), machine learning methods (supervised and unsupervised learning), and model architectures (convolutional, recurrent, and transformer networks). Four of the five reproductions achieved near-identical results, confirming data fidelity and methodological flexibility. SleePyPhases provides a foundation for reproducible sleep research, enabling researchers to focus on scientific questions rather than data infrastructure.
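To illustrate what unifying channel naming across repositories can involve, the following is a minimal sketch in plain Python. It does not use or reproduce the SleePyPhases API. The alias table, the canonical names, and the helpers `build_lookup` and `harmonize_channels` are all hypothetical and chosen only for illustration.

```python
# Hypothetical alias table: canonical channel name -> raw labels that might
# appear in different recordings. These labels are illustrative assumptions,
# not the actual labels used by any specific repository or vendor.
CHANNEL_ALIASES = {
    "EEG_C4_A1": ["EEG", "C4-A1", "C4-M1", "EEG C4-A1"],
    "EOG_L": ["EOG(L)", "E1-M2", "EOG LOC"],
    "EMG_CHIN": ["EMG", "Chin", "EMG Chin"],
}


def _normalize(label):
    """Lower-case a label and drop everything except letters and digits."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def build_lookup(aliases):
    """Invert the alias table into a normalized-label -> canonical-name map."""
    lookup = {}
    for canonical, raw_labels in aliases.items():
        for raw in raw_labels:
            key = _normalize(raw)
            # Refuse ambiguous tables where one label maps to two targets.
            if key in lookup and lookup[key] != canonical:
                raise ValueError(f"ambiguous alias: {raw!r}")
            lookup[key] = canonical
    return lookup


def harmonize_channels(raw_labels, lookup):
    """Map raw labels to canonical names; collect labels with no known alias."""
    mapped = {}
    unmapped = []
    for raw in raw_labels:
        canonical = lookup.get(_normalize(raw))
        if canonical is None:
            unmapped.append(raw)
        else:
            mapped[raw] = canonical
    return mapped, unmapped


lookup = build_lookup(CHANNEL_ALIASES)
mapped, unmapped = harmonize_channels(["eeg c4-a1", "EOG(L)", "SaO2"], lookup)
```

In this sketch, labels without a known alias are returned separately rather than silently dropped, so a missing mapping stays visible when a new dataset is added.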
Matching journals
The top 9 journals account for 50% of the predicted probability mass.