Exploratory electronic health record analysis with ehrapy

Heumos, L.; Ehmele, P.; Treis, T.; Upmeier zu Belzen, J.; Namsaraeva, A.; Horlava, N.; Shitov, V. A.; Zhang, X.; Zappia, L.; Knoll, R.; Lang, N. J.; Hetzel, L.; Virshup, I.; Sikkema, L.; Roellin, E.; Curion, F.; Eils, R.; Schiller, H. B.; Hilgendorff, A.; Theis, F.

2023-12-11 health informatics
10.1101/2023.12.11.23299816 medRxiv
With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here, we introduce ehrapy, a modular open-source Python framework designed for exploratory end-to-end analysis of heterogeneous epidemiology and electronic health record data. Ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference, and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrated ehrapy's features in five distinct examples. We first applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we revealed biomarkers for significant differences in survival among these groups. Additionally, we quantified medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. Finally, we reconstructed disease state trajectories in SARS-CoV-2 patients based on imaging data. Ehrapy thus provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank  Journal                                                  Papers in training set  Percentile  Probability
1     npj Digital Medicine                                     97                      Top 0.2%    18.7%
2     Patterns                                                 70                      Top 0.1%    14.4%
3     Nature Communications                                    4913                    Top 17%     10.2%
4     Journal of Biomedical Informatics                        45                      Top 0.2%    6.4%
5     Journal of the American Medical Informatics Association  61                      Top 0.4%    6.4%
(50% of probability mass above)
6     Nature Biomedical Engineering                            42                      Top 0.2%    4.3%
7     Nature Computational Science                             50                      Top 0.1%    4.0%
8     Advanced Science                                         249                     Top 5%      3.6%
9     Scientific Reports                                       3102                    Top 44%     2.7%
10    Science Translational Medicine                           111                     Top 2%      1.9%
11    Bioinformatics                                           1061                    Top 7%      1.7%
12    Science Advances                                         1098                    Top 18%     1.7%
13    JAMIA Open                                               37                      Top 0.9%    1.5%
14    Med                                                      38                      Top 0.4%    1.2%
15    The Lancet Digital Health                                25                      Top 0.6%    1.2%
16    IEEE Journal of Biomedical and Health Informatics        34                      Top 2%      1.0%
17    eLife                                                    5422                    Top 53%     0.9%
18    JMIR Medical Informatics                                 17                      Top 1%      0.8%
19    Journal of Medical Internet Research                     85                      Top 5%      0.7%
20    Nature Machine Intelligence                              61                      Top 4%      0.7%
21    PLOS ONE                                                 4510                    Top 69%     0.7%
22    Nature Biotechnology                                     147                     Top 8%      0.7%
23    NAR Genomics and Bioinformatics                          214                     Top 4%      0.6%
24    Genome Medicine                                          154                     Top 10%     0.5%
25    Communications Biology                                   886                     Top 32%     0.5%
26    Communications Medicine                                  85                      Top 2%      0.5%
27    eBioMedicine                                             130                     Top 6%      0.5%
28    Nature Methods                                           336                     Top 7%      0.5%
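
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is only an illustration: it assumes the marker is placed at the smallest rank whose cumulative probability reaches 50%, which the page does not state explicitly, and the function name journals_needed is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the ten highest-ranked journals,
# copied from the list above in rank order.
probabilities = [18.7, 14.4, 10.2, 6.4, 6.4, 4.3, 4.0, 3.6, 2.7, 1.9]


def journals_needed(probs, threshold=50.0):
    """Return the smallest number of top-ranked entries whose summed
    probability reaches the threshold, or None if it is never reached.

    Assumption: this is how the 50% marker is positioned; the page
    itself does not document the rule.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


top_k = journals_needed(probabilities)
cumulative = sum(probabilities[:top_k])
print(top_k, round(cumulative, 1))
```

Under this assumption the first four journals sum to just under 50%, so the fifth is needed to cross the threshold, and the top 5 together cover somewhat more than half of the probability mass.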