Data Resource Profile: EST-Health-30
Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.
Show abstract
Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.
Matching journals
The top 8 journals account for 50% of the predicted probability mass.