
Assessing the Quality of Electronic Health Record Data and the Claims Linked Data for Target Trial Emulation Studies

Lee, Y. A.; Lu, Y.; Morris, E. J.; He, X.; Winterstein, A. G.; Henriksen, C.; Bian, J.; Guo, J.

2025-12-29 health informatics
10.64898/2025.12.22.25342844

Objectives: To evaluate whether an EHR cohort, alone and linked to Medicare claims, has sufficient data quality to support the design elements required for target trial emulation, using type 2 diabetes (T2D) as a case example.
Materials and Methods: We constructed annual University of Florida Health EHR-Medicare linked cohorts of patients ≥65 years with T2D from 2013 to 2020. Using Medicare claims as the reference, we assessed EHR data quality for target trial emulation-relevant elements across completeness, accuracy, plausibility, and concordance, spanning target trial components (eligibility, exposure/new-user ascertainment, baseline covariates, outcomes, and follow-up). Data quality was compared across EHR-only, claims-only, and EHR-claims linked data.
Results: The mean annual EHR-Medicare linked cohort included 12,895 patients (mean age 74.9 years; 58.0% female). Demographics were complete and highly accurate. In the EHR-only cohort, completeness ranged from 34.1% to 78.4% for conditions and from 53.7% to 63.4% for glucose-lowering drugs (GLDs). Accuracy was high for prevalent conditions and GLD use but low for incident measures. Plausible values were common (>98.5%), and HbA1c-T2D concordance was strong (98.6%). Linking EHR and claims substantially improved completeness and accuracy, especially for encounters, mortality, incident diagnoses, and medications.
Discussion: The linked dataset addressed major limitations of EHR-only data and provided enhanced granularity compared to claims alone, offering a comprehensive resource for real-world target trial emulation research.
Conclusion: EHRs offer valuable clinical details but face data quality challenges. Robust quality assurance strategies and linkage with external data are essential to strengthen real-world evidence and support target trial emulation.
Lay Summary: We evaluated whether a University of Florida Health electronic health record (EHR) cohort, alone and when linked to Medicare claims, has sufficient data quality to support "target trial emulation," a common approach for using real-world data to study medication effects when randomized trials are not feasible. We studied adults aged 65 years and older with type 2 diabetes from 2013-2020 and assessed four practical dimensions of data quality: completeness (how often key information is captured), accuracy (agreement with Medicare for billing-derived elements), plausibility (whether recorded values are clinically reasonable), and concordance (internal consistency between related EHR elements). Demographic fields were highly complete and accurate, and most lab and vital sign values were biologically plausible, supporting the reliability of core EHR clinical measurements. However, the EHR alone missed a substantial share of encounters, deaths, incident diagnoses, and medication initiation events that appeared in Medicare, reflecting care received outside a single health system. Linking EHR with Medicare substantially improved capture of these cross-setting events while preserving EHR-only clinical details (e.g., HbA1c and BMI), yielding a more robust dataset for real-world target trial emulation research.
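The four dimensions named above can be made concrete with a minimal sketch. The abstract does not give the authors' exact formulas, so the definitions below are assumptions chosen for illustration: completeness as the share of claims-identified patients also captured in the EHR, accuracy as the agreement rate among patients present in both sources, plausibility as the share of values inside a range, and concordance as the share of diagnosed patients with supporting lab evidence. All function names, patient IDs, values, and the HbA1c bounds are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def completeness(ehr_ids: Set[str], reference_ids: Set[str]) -> float:
    """Share of reference (claims) patients whose element also appears in the EHR."""
    if not reference_ids:
        return math.nan
    return len(ehr_ids & reference_ids) / len(reference_ids)


def accuracy(ehr_values: Dict[str, str], reference_values: Dict[str, str]) -> float:
    """Share of patients present in both sources whose recorded value matches."""
    shared = ehr_values.keys() & reference_values.keys()
    if not shared:
        return math.nan
    return sum(ehr_values[p] == reference_values[p] for p in shared) / len(shared)


def plausibility(values: Iterable[float], low: float, high: float) -> float:
    """Share of recorded values that fall inside the range [low, high]."""
    values = list(values)
    if not values:
        return math.nan
    return sum(low <= v <= high for v in values) / len(values)


def concordance(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Among diagnosed patients, the share whose lab evidence supports the diagnosis.

    Each pair is (has_diagnosis, lab_supports_diagnosis).
    """
    diagnosed = [supported for has_dx, supported in pairs if has_dx]
    if not diagnosed:
        return math.nan
    return sum(diagnosed) / len(diagnosed)


# Toy data (entirely made up) to exercise each function.
claims_t2d = {"p1", "p2", "p3", "p4"}
ehr_t2d = {"p1", "p2", "p5"}
completeness_t2d = completeness(ehr_t2d, claims_t2d)

ehr_sex = {"p1": "F", "p2": "M", "p3": "F"}
claims_sex = {"p1": "F", "p2": "M", "p3": "M", "p4": "F"}
accuracy_sex = accuracy(ehr_sex, claims_sex)

hba1c = [6.8, 7.4, 55.0, 9.1]  # percent; 3.0-20.0 is an illustrative range only
plausibility_hba1c = plausibility(hba1c, low=3.0, high=20.0)

dx_lab_pairs = [(True, True), (True, False), (False, True), (True, True)]
concordance_t2d = concordance(dx_lab_pairs)
```

Treating claims as the reference mirrors the study design described in the abstract, but a real analysis would also need to fix the time window, code lists, and enrollment requirements, none of which are specified here.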

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association: based on 53 papers, Top 0.1%, 22.8%
2. JAMIA Open: based on 35 papers, Top 0.1%, 14.2%
3. BMJ Open: based on 553 papers, Top 20%, 5.3%
4. PLOS ONE: based on 1737 papers, Top 68%, 5.1%
5. BMC Medical Research Methodology: based on 41 papers, Top 0.8%, 3.4%

50% of probability mass above

6. BMC Medical Informatics and Decision Making: based on 36 papers, Top 3%, 3.4%
7. npj Digital Medicine: based on 85 papers, Top 5%, 3.4%
8. BMJ: based on 49 papers, Top 1%, 3.2%
9. JAMA Network Open: based on 125 papers, Top 6%, 2.9%
10. Clinical and Translational Science: based on 14 papers, Top 0.4%, 2.8%
11. JMIR Public Health and Surveillance: based on 45 papers, Top 4%, 2.0%
12. Journal of Biomedical Informatics: based on 37 papers, Top 3%, 1.8%
13. PLOS Digital Health: based on 88 papers, Top 7%, 1.8%
14. The Lancet Digital Health: based on 25 papers, Top 2%, 1.5%
15. BMJ Health & Care Informatics: based on 13 papers, Top 2%, 1.5%
16. Frontiers in Digital Health: based on 18 papers, Top 2%, 1.5%
17. Scientific Reports: based on 701 papers, Top 73%, 1.5%
18. Nature Communications: based on 483 papers, Top 34%, 1.4%
19. Journal of General Internal Medicine: based on 19 papers, Top 3%, 0.9%
20. Journal of Medical Internet Research: based on 81 papers, Top 12%, 0.9%
21. Journal of Clinical Epidemiology: based on 29 papers, Top 2%, 0.9%
22. Journal of the American College of Cardiology: based on 11 papers, Top 2%, 0.9%
23. DIGITAL HEALTH: based on 11 papers, Top 1%, 0.9%
24. International Journal of Medical Informatics: based on 25 papers, Top 5%, 0.9%
25. JMIR Medical Informatics: based on 16 papers, Top 5%, 0.5%
26. Pharmacoepidemiology and Drug Safety: based on 12 papers, Top 2%, 0.5%
27. Patterns: based on 15 papers, Top 4%, 0.5%
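
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below copies the probabilities of the first six entries from the list; the function name `journals_needed` is hypothetical and not part of the page.

```python
from itertools import accumulate

# Predicted probabilities (%) for the six highest-ranked journals, in rank order.
probabilities = [22.8, 14.2, 5.3, 5.1, 3.4, 3.4]


def journals_needed(probs, threshold=50.0):
    """Return the smallest number of top-ranked entries whose cumulative
    probability reaches the threshold, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


top_k = journals_needed(probabilities)
cumulative_mass = sum(probabilities[:top_k])
```

With these values the first four entries sum to 47.4% and the fifth brings the total to 50.8%, which is consistent with the cutoff placed after rank 5.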