Data Resource Profile: EST-Health-30

Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.

2026-04-24 epidemiology
medRxiv preprint, DOI: 10.64898/2026.04.21.26351087

Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia.

Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks.

Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course.

Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.
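The abstract does not spell out the hashing scheme behind "consistent cohort inclusion over time". As a minimal sketch, assuming a keyed hash (HMAC-SHA256) over a personal identifier with a secret key held by the data custodian, the idea can be illustrated as follows: the digest is mapped to a number in [0, 1) and a person is included when that number falls below 0.30. Because the decision depends only on the identifier and the key, the same people stay in the cohort at every annual update, and people who enter the population later are included at the same rate. The key, the identifier format, and the names `keyed_digest`, `pseudonym`, and `in_cohort` are hypothetical and not taken from the paper.

```python
import hashlib
import hmac

# Hypothetical custodian-held secret key (assumption: the paper does not describe one).
SECRET_KEY = b"example-secret-key"
# Target sampling fraction: 30% of the population.
SAMPLE_FRACTION = 0.30


def keyed_digest(person_id: str, key: bytes = SECRET_KEY) -> bytes:
    """HMAC-SHA256 of a personal identifier; cannot be recomputed without the key."""
    return hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).digest()


def pseudonym(person_id: str, key: bytes = SECRET_KEY) -> str:
    """Stable pseudonymous ID: the same person maps to the same hex string every year."""
    return keyed_digest(person_id, key).hex()


def in_cohort(person_id: str, fraction: float = SAMPLE_FRACTION,
              key: bytes = SECRET_KEY) -> bool:
    """Deterministic inclusion rule: scale the digest to [0, 1) and compare with the fraction."""
    digest = keyed_digest(person_id, key)
    u = int.from_bytes(digest[:8], "big") / 2**64  # first 64 bits mapped to [0, 1)
    return u < fraction


if __name__ == "__main__":
    # Synthetic identifiers only; these are not real Estonian personal codes.
    ids = [f"person-{i:06d}" for i in range(100_000)]
    included = [pid for pid in ids if in_cohort(pid)]
    print(f"included share: {len(included) / len(ids):.3f}")
    # Re-applying the rule (e.g. at the next annual update) keeps everyone already included.
    assert all(in_cohort(pid) for pid in included)
    print("example pseudonym prefix:", pseudonym(included[0])[:16])
```

A keyed hash is used in this sketch rather than a plain SHA-256 because national identifiers have a small, structured value space, so an unkeyed hash could be reversed by enumerating all possible codes. Whether EST-Health-30 uses this particular construction is not stated in the abstract.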

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Each entry: rank. journal (papers in training set; percentile; predicted probability)

1. International Journal of Epidemiology (74; Top 0.1%; 23.0%)
2. PLOS ONE (4510; Top 30%; 5.0%)
3. Scientific Data (174; Top 0.3%; 5.0%)
4. npj Digital Medicine (97; Top 1%; 4.0%)
5. Nature Communications (4913; Top 36%; 4.0%)
6. BMJ Open (554; Top 5%; 3.7%)
7. Pharmacoepidemiology and Drug Safety (13; Top 0.1%; 3.7%)
8. The Lancet Digital Health (25; Top 0.1%; 3.7%)
-- 50% of predicted probability mass above --
9. BMC Medical Research Methodology (43; Top 0.4%; 2.7%)
10. Scientific Reports (3102; Top 46%; 2.5%)
11. European Journal of Epidemiology (40; Top 0.2%; 2.1%)
12. American Journal of Epidemiology (57; Top 0.5%; 2.1%)
13. Database (51; Top 0.3%; 1.9%)
14. BMC Medical Informatics and Decision Making (39; Top 1%; 1.8%)
15. Eurosurveillance (80; Top 0.6%; 1.7%)
16. JAMIA Open (37; Top 0.9%; 1.5%)
17. JMIR Medical Informatics (17; Top 0.8%; 1.5%)
18. Patterns (70; Top 1%; 1.5%)
19. International Journal of Medical Informatics (25; Top 1%; 1.4%)
20. Swiss Medical Weekly (12; Top 0.1%; 1.4%)
21. Wellcome Open Research (57; Top 1%; 1.2%)
22. Nature Medicine (117; Top 3%; 1.2%)
23. Nature Human Behaviour (85; Top 3%; 1.2%)
24. BMC Medicine (163; Top 5%; 1.2%)
25. Frontiers in Public Health (140; Top 8%; 0.8%)
26. Influenza and Other Respiratory Viruses (44; Top 0.4%; 0.8%)
27. Science Advances (1098; Top 30%; 0.7%)
28. PLOS Digital Health (91; Top 3%; 0.7%)
29. Journal of Medical Internet Research (85; Top 5%; 0.7%)
30. Thorax (32; Top 0.9%; 0.7%)
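The statement that the top 8 journals account for 50% of the predicted probability mass is a cumulative-sum threshold over the ranked list above. A minimal sketch, assuming the final percentage on each entry is that journal's predicted probability (rounded to one decimal) and using the hypothetical helper name `journals_for_mass`:

```python
# Predicted probabilities (%) for the 30 listed journals, in rank order.
PROBS = [
    23.0, 5.0, 5.0, 4.0, 4.0, 3.7, 3.7, 3.7, 2.7, 2.5,
    2.1, 2.1, 1.9, 1.8, 1.7, 1.5, 1.5, 1.5, 1.4, 1.4,
    1.2, 1.2, 1.2, 1.2, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7,
]


def journals_for_mass(probs: list[float], target: float = 50.0) -> tuple[int, float]:
    """Return the smallest k whose top-k cumulative probability reaches `target` percent."""
    cumulative = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= target:
            return k, cumulative
    raise ValueError("target not reached by the listed journals")


k, mass = journals_for_mass(PROBS)
print(f"top {k} journals cover {mass:.1f}% of the predicted probability mass")
# prints: top 8 journals cover 52.1% of the predicted probability mass
```

With the rounded values, the first seven journals sum to 48.4% and the eighth brings the total past the threshold, which matches the "top 8" statement; the overshoot above 50% arises because each journal's probability is added whole.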