OpenChart-SE: A corpus of artificial Swedish electronic health records for imagined emergency care patients written by physicians in a crowd-sourcing project

Berg, J.; Aasa, C. O.; Appelgren Thorell, B.; Aits, S.

2023-01-05 health informatics
10.1101/2023.01.03.23284160 medRxiv
Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs, and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue, we have created a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as the OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation, and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se). Additional batches of quality-controlled EHRs will be released periodically.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 7% | 9.9%
2 | Journal of Biomedical Informatics | 45 papers in training set | Top 0.1% | 9.9%
3 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.5% | 6.3%
4 | Bioinformatics | 1061 papers in training set | Top 4% | 6.2%
5 | JAMIA Open | 37 papers in training set | Top 0.3% | 4.1%
6 | Data in Brief | 13 papers in training set | Top 0.1% | 3.9%
7 | Journal of Medical Internet Research | 85 papers in training set | Top 1% | 3.6%
8 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.9% | 3.5%
9 | Scientific Data | 174 papers in training set | Top 0.5% | 3.5%
50% of probability mass above
10 | npj Digital Medicine | 97 papers in training set | Top 1% | 3.5%
11 | International Journal of Medical Informatics | 25 papers in training set | Top 0.5% | 3.0%
12 | Frontiers in Digital Health | 20 papers in training set | Top 0.3% | 3.0%
13 | PLOS ONE | 4510 papers in training set | Top 49% | 2.0%
14 | Artificial Intelligence in Medicine | 15 papers in training set | Top 0.3% | 1.9%
15 | Nature Communications | 4913 papers in training set | Top 49% | 1.9%
16 | iScience | 1063 papers in training set | Top 13% | 1.8%
17 | JMIR Medical Informatics | 17 papers in training set | Top 0.8% | 1.7%
18 | Bioinformatics Advances | 184 papers in training set | Top 3% | 1.7%
19 | PLOS Digital Health | 91 papers in training set | Top 2% | 1.5%
20 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 1% | 1.5%
21 | Patterns | 70 papers in training set | Top 1% | 1.3%
22 | Database | 51 papers in training set | Top 0.6% | 1.2%
23 | Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 0.8% | 0.9%
24 | European Journal of Epidemiology | 40 papers in training set | Top 0.6% | 0.9%
25 | The Lancet Digital Health | 25 papers in training set | Top 0.9% | 0.9%
26 | BMC Bioinformatics | 383 papers in training set | Top 7% | 0.8%
27 | JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.8% | 0.8%
28 | Journal of Personalized Medicine | 28 papers in training set | Top 1% | 0.7%
29 | PLOS Computational Biology | 1633 papers in training set | Top 25% | 0.7%
30 | Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 6% | 0.7%
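
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the ranking. The sketch below assumes that the last percentage shown for each journal is its predicted probability (the page does not label the columns) and uses the rounded values as displayed for the first 12 journals. The function name `journals_to_reach` is a hypothetical helper, not part of any tool used by the page.

```python
from itertools import accumulate

# Displayed (rounded) percentages for ranks 1-12, assumed to be
# the predicted probability of each journal.
probabilities = [9.9, 9.9, 6.3, 6.2, 4.1, 3.9, 3.6, 3.5, 3.5, 3.5, 3.0, 3.0]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the cumulative
    sum to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities, 50.0))
```

With the displayed values, the running sum is about 47.4% after eight journals and about 50.9% after nine, which is consistent with the 50% marker sitting between ranks 9 and 10.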