Decomposing growth in a national HL7 CDA clinical document repository

Talvik, H.-A.; Laur, S.; Vilo, J.; Reisberg, S.

2026-05-26 health informatics
10.64898/2026.05.24.26353991 medRxiv
Longitudinal evaluations of national electronic health record repositories often track document counts alone, obscuring changes in content size, structure and standards implementation. We decomposed growth in the Estonian Health Information System across document counts, per-document size, section-level structure and version uptake in a 10% random population sample of 4.97 million HL7 Clinical Document Architecture Release 2 documents from 147,819 patients, spanning 2012–2019 and four prespecified document types. Growth patterns differed by document type. Inpatient summaries increased 48.5% in total content volume despite a 2.4% decline in document counts. Section presence and within-section content were highly skewed; 44.6% of 892 data locations carried one fixed value. Code-system diversity increased from 45 to 79, and version uptake took years: inpatient summaries reached 80% organisational uptake after a median 44 months (95% CI 11–78). This decomposition can guide extraction pipelines, secondary use and standards governance in CDA- and FHIR-based repositories.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | npj Digital Medicine | 97 | Top 0.2% | 22.3%
2 | Journal of the American Medical Informatics Association | 61 | Top 0.1% | 18.5%
3 | Journal of Medical Internet Research | 85 | Top 0.8% | 6.3%
4 | Nature Communications | 4913 | Top 33% | 4.8%
(50% of probability mass above)
5 | The Lancet Digital Health | 25 | Top 0.1% | 4.8%
6 | Scientific Reports | 3102 | Top 29% | 4.1%
7 | Philosophical Transactions of the Royal Society B | 51 | Top 1% | 3.6%
8 | Frontiers in Digital Health | 20 | Top 0.4% | 2.3%
9 | PLOS ONE | 4510 | Top 48% | 2.1%
10 | Journal of Biomedical Informatics | 45 | Top 0.7% | 1.9%
11 | JAMIA Open | 37 | Top 0.8% | 1.7%
12 | European Respiratory Journal | 54 | Top 1.0% | 1.7%
13 | PLOS Digital Health | 91 | Top 2% | 1.6%
14 | JMIR Medical Informatics | 17 | Top 0.9% | 1.5%
15 | JCO Clinical Cancer Informatics | 18 | Top 0.7% | 0.9%
16 | BMJ Health & Care Informatics | 13 | Top 0.8% | 0.9%
17 | Nature Human Behaviour | 85 | Top 4% | 0.8%
18 | BMC Medical Informatics and Decision Making | 39 | Top 2% | 0.8%
19 | GigaScience | 172 | Top 3% | 0.8%
20 | BMJ | 49 | Top 1% | 0.8%
21 | JMIR Public Health and Surveillance | 45 | Top 4% | 0.7%
22 | eLife | 5422 | Top 58% | 0.7%
23 | International Journal of Medical Informatics | 25 | Top 2% | 0.7%
24 | iScience | 1063 | Top 35% | 0.7%
25 | BMC Medical Research Methodology | 43 | Top 1% | 0.7%
26 | Scientific Data | 174 | Top 3% | 0.6%
27 | DIGITAL HEALTH | 12 | Top 0.8% | 0.6%