Assessing the Quality of Electronic Health Record Data and the Claims Linked Data for Target Trial Emulation Studies
Lee, Y. A.; Lu, Y.; Morris, E. J.; He, X.; Winterstein, A. G.; Henriksen, C.; Bian, J.; Guo, J.
Objectives: To evaluate whether an EHR cohort, alone and linked to Medicare claims, has sufficient data quality to support the design elements required for target trial emulation, using type 2 diabetes (T2D) as a case example.
Materials and Methods: We constructed annual University of Florida Health EHR-Medicare linked cohorts of patients aged ≥65 years with T2D from 2013 to 2020. Using Medicare claims as the reference, we assessed EHR data quality for elements relevant to target trial emulation across four dimensions (completeness, accuracy, plausibility, and concordance), spanning the target trial components (eligibility, exposure/new-user ascertainment, baseline covariates, outcomes, and follow-up). Data quality was compared across EHR-only, claims-only, and EHR-claims linked data.
Results: The mean annual EHR-Medicare linked cohort included 12,895 patients (mean age 74.9 years; 58.0% female). Demographics were complete and highly accurate. In the EHR-only cohort, completeness ranged from 34.1% to 78.4% for conditions and from 53.7% to 63.4% for glucose-lowering drugs (GLDs). Accuracy was high for prevalent conditions and GLD use but low for incident measures. Plausible values were common (>98.5%), and HbA1c-T2D concordance was strong (98.6%). Linking EHR and claims substantially improved completeness and accuracy, especially for encounters, mortality, incident diagnoses, and medications.
Discussion: The linked dataset addressed major limitations of EHR-only data and provided greater granularity than claims alone, offering a comprehensive resource for real-world target trial emulation research.
Conclusion: EHRs offer valuable clinical details but face data quality challenges. Robust quality assurance strategies and linkage with external data are essential to strengthen real-world evidence and support target trial emulation.
Lay Summary: We evaluated whether a University of Florida Health electronic health record (EHR) cohort, alone and when linked to Medicare claims, has sufficient data quality to support "target trial emulation," a common approach for using real-world data to study medication effects when randomized trials are not feasible. We studied adults aged 65 years and older with type 2 diabetes from 2013 to 2020 and assessed four practical dimensions of data quality: completeness (how often key information is captured), accuracy (agreement with Medicare for billing-derived elements), plausibility (whether recorded values are clinically reasonable), and concordance (internal consistency between related EHR elements). Demographic fields were highly complete and accurate, and most lab and vital sign values were biologically plausible, supporting the reliability of core EHR clinical measurements. However, the EHR alone missed a substantial share of encounters, deaths, incident diagnoses, and medication initiation events that appeared in Medicare, reflecting care received outside a single health system. Linking EHR with Medicare substantially improved capture of these cross-setting events while preserving EHR-only clinical details (e.g., HbA1c and BMI), yielding a more robust dataset for real-world target trial emulation research.
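To make two of these dimensions concrete, the following is a minimal Python sketch, not the authors' actual method. It assumes completeness can be read as the share of patients flagged in the Medicare reference who are also captured in the EHR, and plausibility as the share of recorded values inside a clinically reasonable range. The function names, patient IDs, HbA1c values, and the 3-20% plausibility range are all hypothetical and chosen only for illustration.

```python
def completeness(ehr_ids, claims_ids):
    """Share of reference (claims) patients that the EHR also captures.

    Assumed definition for illustration; the study's operational
    definition may differ.
    """
    claims = set(claims_ids)
    if not claims:
        return float("nan")
    return len(claims & set(ehr_ids)) / len(claims)


def plausibility(values, low, high):
    """Share of recorded (non-missing) values that fall within [low, high]."""
    recorded = [v for v in values if v is not None]
    if not recorded:
        return float("nan")
    return sum(low <= v <= high for v in recorded) / len(recorded)


# Hypothetical toy data: patients with a given condition in each source.
claims_condition = {"p1", "p2", "p3", "p4"}
ehr_condition = {"p1", "p3", "p9"}

# Hypothetical HbA1c values (%); None marks a missing measurement.
hba1c = [6.8, 7.4, None, 55.0, 9.1]

print(completeness(ehr_condition, claims_condition))  # 0.5
print(plausibility(hba1c, low=3.0, high=20.0))        # 0.75
```

In this toy example the EHR captures two of the four claims-identified patients, and three of the four recorded HbA1c values fall inside the assumed range; missing values are excluded from the plausibility denominator because missingness belongs to the completeness dimension.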
Matching journals
The top 5 journals account for 50% of the predicted probability mass.