Assessing the Quality of Electronic Health Record Data and the Claims Linked Data for Target Trial Emulation Studies

Lee, Y. A.; Lu, Y.; Morris, E. J.; He, X.; Winterstein, A. G.; Henriksen, C.; Bian, J.; Guo, J.

2025-12-29 health informatics
10.64898/2025.12.22.25342844 medRxiv
Objectives: To evaluate whether an EHR cohort, alone and linked to Medicare claims, has sufficient data quality to support the design elements required for target trial emulation, using type 2 diabetes (T2D) as a case example.
Materials and Methods: We constructed annual University of Florida Health EHR-Medicare linked cohorts of patients ≥65 years with T2D from 2013 to 2020. Using Medicare claims as the reference, we assessed EHR data quality for target trial emulation-relevant elements across completeness, accuracy, plausibility, and concordance, spanning target trial components (eligibility, exposure/new-user ascertainment, baseline covariates, outcomes, and follow-up). Data quality was compared across EHR-only, claims-only, and EHR-claims linked data.
Results: The mean annual EHR-Medicare linked cohort included 12,895 patients (mean age 74.9 years; 58.0% female). Demographics were complete and highly accurate. In the EHR-only cohort, completeness ranged from 34.1% to 78.4% for conditions and from 53.7% to 63.4% for glucose-lowering drugs (GLDs). Accuracy was high for prevalent conditions and GLD use but low for incident measures. Plausible values were common (>98.5%), and HbA1c-T2D concordance was strong (98.6%). Linking EHR and claims substantially improved completeness and accuracy, especially for encounters, mortality, incident diagnoses, and medications.
Discussion: The linked dataset addressed major limitations of EHR-only data and provided enhanced granularity compared to claims alone, offering a comprehensive resource for real-world target trial emulation research.
Conclusion: EHRs offer valuable clinical details but face data quality challenges. Robust quality assurance strategies and linkage with external data are essential to strengthen real-world evidence and support target trial emulation.
Lay Summary: We evaluated whether a University of Florida Health electronic health record (EHR) cohort, alone and when linked to Medicare claims, has sufficient data quality to support "target trial emulation," a common approach for using real-world data to study medication effects when randomized trials are not feasible. We studied adults aged 65 years and older with type 2 diabetes from 2013-2020 and assessed four practical dimensions of data quality: completeness (how often key information is captured), accuracy (agreement with Medicare for billing-derived elements), plausibility (whether recorded values are clinically reasonable), and concordance (internal consistency between related EHR elements). Demographic fields were highly complete and accurate, and most lab and vital sign values were biologically plausible, supporting the reliability of core EHR clinical measurements. However, the EHR alone missed a substantial share of encounters, deaths, incident diagnoses, and medication initiation events that appeared in Medicare, reflecting care received outside a single health system. Linking EHR with Medicare substantially improved capture of these cross-setting events while preserving EHR-only clinical details (e.g., HbA1c and BMI), yielding a more robust dataset for real-world target trial emulation research.
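As a rough illustration of two of the dimensions named above, the sketch below computes completeness against a claims reference and the share of plausible lab values. The function names, the patient identifiers, the toy values, and the HbA1c range are all hypothetical assumptions made for this example; the abstract does not give the study's operational definitions, so this should not be read as the authors' method.

```python
def completeness(ehr_ids, claims_ids):
    """Share of patients flagged in the claims reference who are also
    flagged in the EHR (an assumed definition, for illustration only)."""
    if not claims_ids:
        raise ValueError("claims reference set is empty")
    return len(set(ehr_ids) & set(claims_ids)) / len(set(claims_ids))


def plausible_share(values, low, high):
    """Share of recorded values that fall inside an assumed plausible range."""
    if not values:
        raise ValueError("no values recorded")
    return sum(low <= v <= high for v in values) / len(values)


# Toy data: hypothetical patient IDs with a condition recorded in each source.
claims_flagged = {"p1", "p2", "p3", "p4"}
ehr_flagged = {"p1", "p3", "p9"}

# Toy HbA1c values (%); the 3.0-20.0 range is an assumed plausibility window.
hba1c_values = [6.8, 7.4, 55.0, 9.1]

condition_completeness = completeness(ehr_flagged, claims_flagged)
hba1c_plausibility = plausible_share(hba1c_values, 3.0, 20.0)

print(f"completeness: {condition_completeness:.1%}")
print(f"plausible HbA1c values: {hba1c_plausibility:.1%}")
```

Under this assumed definition, a patient seen only outside the health system lowers EHR completeness without affecting plausibility, which matches the pattern the summary describes.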

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. JAMIA Open | 37 papers in training set | Top 0.1% | 22.9%
2. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.1% | 22.9%
3. BMJ Open | 554 papers in training set | Top 2% | 9.3%
(50% of probability mass above)
4. JMIR Public Health and Surveillance | 45 papers in training set | Top 0.1% | 6.9%
5. PLOS ONE | 4510 papers in training set | Top 30% | 4.9%
6. npj Digital Medicine | 97 papers in training set | Top 1% | 3.1%
7. BMC Medical Research Methodology | 43 papers in training set | Top 0.5% | 1.9%
8. BMJ Health & Care Informatics | 13 papers in training set | Top 0.4% | 1.7%
9. BMJ | 49 papers in training set | Top 0.5% | 1.7%
10. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 1% | 1.7%
11. Journal of Medical Internet Research | 85 papers in training set | Top 3% | 1.5%
12. Journal of General Internal Medicine | 20 papers in training set | Top 0.5% | 1.5%
13. PLOS Digital Health | 91 papers in training set | Top 2% | 1.4%
14. JMIR Medical Informatics | 17 papers in training set | Top 1% | 1.2%
15. Scientific Reports | 3102 papers in training set | Top 68% | 1.0%
16. The Lancet Digital Health | 25 papers in training set | Top 1.0% | 0.8%
17. DIGITAL HEALTH | 12 papers in training set | Top 0.7% | 0.8%
18. Clinical and Translational Science | 21 papers in training set | Top 1% | 0.8%
19. BMC Health Services Research | 42 papers in training set | Top 2% | 0.8%
20. BMJ Open Quality | 15 papers in training set | Top 0.8% | 0.8%
21. Pharmacoepidemiology and Drug Safety | 13 papers in training set | Top 0.4% | 0.8%
22. Journal of Biomedical Informatics | 45 papers in training set | Top 2% | 0.7%
23. BMJ Open Diabetes Research & Care | 15 papers in training set | Top 1% | 0.7%
24. Preventive Medicine Reports | 14 papers in training set | Top 0.6% | 0.5%
25. Journal of Clinical and Translational Science | 11 papers in training set | Top 0.6% | 0.5%
26. British Journal of General Practice | 22 papers in training set | Top 0.7% | 0.5%
27. Trials | 25 papers in training set | Top 2% | 0.5%
28. Pilot and Feasibility Studies | 12 papers in training set | Top 0.8% | 0.5%