Analytic Choices Shape Genomic Risk Estimates from Electronic Health Records: Coronary Heart Disease in eMERGE IV

Chen, J. H.; Knerr, S. A.; Veenstra, D. L.; Abul-Husn, N. S.; Hanks, S. C.; Kenny, E. E.; Limdi, N. A.; Cortopassi, J. B.; Crosslin, D.; Jarvik, G. P.; Kullo, I. J.

2026-04-30 epidemiology
10.64898/2026.04.28.26352002 medRxiv
Background: Electronic health records (EHR) are an important data source for genomic studies, but ascertaining cases and the start of observation remains challenging. We used data from the Electronic Medical Records and Genomics (eMERGE) IV study to examine how analytic assumptions about case ascertainment and EHR entry time influence estimates of monogenic and polygenic risk for coronary heart disease (CHD).

Methods: We assessed agreement between CHD cases ascertained from EHR phenotyping and from survey self-report. Associations of monogenic variants and of a high (top 5%) polygenic risk score (PRS) for CHD with CHD were evaluated using multivariate relative risk (RR) regression under three alternative case definitions: EHR-algorithm-defined, self-reported, and combined. Time-to-event analyses (Kaplan-Meier method and Cox proportional hazards models) were conducted under three entry time specifications: (1) entry at the first EHR record, (2) entry at the start of the latest consecutive observation period, and (3) no left truncation.

Results: Agreement between CHD cases ascertained by the EHR-based algorithm and by self-report was 37.2% among individuals identified as cases by at least one source, with the EHR algorithm demonstrating higher accuracy. The adjusted RR [95% confidence interval (CI)] associated with high PRS was 2.05 [1.50, 2.81] for EHR-defined, 1.49 [1.04, 2.13] for self-reported, and 1.66 [1.27, 2.18] for combined CHD. Estimated cumulative incidence by age 75 was 0.188 with left truncation at the first EHR code and 0.225 with left truncation at the start of the most recent observation period. Hazard ratio (HR) estimates were similar across the three left truncation scenarios.

Conclusion: The choice of case definition meaningfully influenced RR estimates, whereas alternative specifications of EHR entry time affected absolute cumulative incidence estimates but had minimal impact on HRs. These findings highlight the impact of analytic choices in studies based on EHR and survey data, and they apply beyond the context of CHD.
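To illustrate why the entry time specification can shift absolute cumulative incidence, here is a minimal Python sketch of a Kaplan-Meier estimate with delayed entry (left truncation). It is not the authors' code. It assumes age is the time scale, takes cumulative incidence as one minus the Kaplan-Meier survival estimate, and uses four hypothetical participants; the function names `km_delayed_entry` and `cumulative_incidence` are invented for this illustration.

```python
def km_delayed_entry(entry_age, exit_age, had_event):
    """Kaplan-Meier survival curve with left truncation (delayed entry).

    A participant is in the risk set at age t only if entry_age < t <= exit_age.
    Returns a list of (age, survival) pairs at each distinct event age.
    """
    subjects = list(zip(entry_age, exit_age, had_event))
    event_ages = sorted({x for _, x, d in subjects if d})
    survival = 1.0
    curve = []
    for t in event_ages:
        # Risk set: already under observation and not yet exited at age t.
        at_risk = sum(1 for e, x, _ in subjects if e < t <= x)
        # Events at age t among participants who are in the risk set.
        events = sum(1 for e, x, d in subjects if d and x == t and e < t)
        if at_risk:
            survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def cumulative_incidence(curve, age):
    """One minus the Kaplan-Meier survival at the given age."""
    surv = 1.0
    for t, s in curve:
        if t <= age:
            surv = s
    return 1.0 - surv


# Hypothetical toy data (not from the study): ages in years.
exit_age = [55, 70, 75, 65]
had_event = [True, False, True, True]
first_record_age = [40, 50, 60, 45]   # assumed entry at first EHR record
no_truncation = [0, 0, 0, 0]          # everyone treated as observed from birth

curve_truncated = km_delayed_entry(first_record_age, exit_age, had_event)
curve_untruncated = km_delayed_entry(no_truncation, exit_age, had_event)

ci_truncated = cumulative_incidence(curve_truncated, 70)
ci_untruncated = cumulative_incidence(curve_untruncated, 70)
print(round(ci_truncated, 3), round(ci_untruncated, 3))
```

In this toy example the same events yield different cumulative incidence at age 70 because the entry specification changes who counts as at risk at each event age. The direction and size of the shift in real data depend on the cohort.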

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Circulation: Genomic and Precision Medicine | 42 papers in training set | Top 0.1% | 19.4%
2. International Journal of Epidemiology | 74 papers in training set | Top 0.1% | 18.5%
3. Circulation | 66 papers in training set | Top 0.6% | 6.8%
4. American Journal of Epidemiology | 57 papers in training set | Top 0.2% | 4.2%
5. Journal of the American Heart Association | 119 papers in training set | Top 2% | 3.1%
50% of probability mass above
6. Genetic Epidemiology | 46 papers in training set | Top 0.3% | 2.9%
7. European Journal of Epidemiology | 40 papers in training set | Top 0.2% | 2.7%
8. PLOS Medicine | 98 papers in training set | Top 2% | 2.3%
9. Journal of Biomedical Informatics | 45 papers in training set | Top 0.7% | 1.9%
10. JAMA Network Open | 127 papers in training set | Top 2% | 1.9%
11. BMJ | 49 papers in training set | Top 0.5% | 1.8%
12. PLOS ONE | 4510 papers in training set | Top 54% | 1.7%
13. npj Digital Medicine | 97 papers in training set | Top 2% | 1.7%
14. JAMA | 17 papers in training set | Top 0.1% | 1.5%
15. Nature Communications | 4913 papers in training set | Top 56% | 1.3%
16. BMC Medical Research Methodology | 43 papers in training set | Top 0.8% | 1.3%
17. BMC Medicine | 163 papers in training set | Top 4% | 1.3%
18. Database | 51 papers in training set | Top 0.5% | 1.3%
19. European Journal of Preventive Cardiology | 13 papers in training set | Top 0.7% | 1.2%
20. npj Genomic Medicine | 33 papers in training set | Top 0.6% | 1.1%
21. Scientific Reports | 3102 papers in training set | Top 69% | 0.9%
22. Arteriosclerosis, Thrombosis, and Vascular Biology | 65 papers in training set | Top 2% | 0.9%
23. Frontiers in Cardiovascular Medicine | 49 papers in training set | Top 2% | 0.9%
24. Heart | 10 papers in training set | Top 0.8% | 0.8%
25. Epidemiology and Infection | 84 papers in training set | Top 3% | 0.8%
26. European Heart Journal | 16 papers in training set | Top 0.8% | 0.7%
27. Epidemiology | 26 papers in training set | Top 0.5% | 0.7%
28. Pharmacoepidemiology and Drug Safety | 13 papers in training set | Top 0.5% | 0.7%
29. Science | 429 papers in training set | Top 22% | 0.6%
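
The "50% of probability mass" marker in the list can be reproduced with a running sum over the ranked probabilities. The page does not say how the cutoff is computed, so the sketch below assumes it is the first rank at which the cumulative predicted probability reaches 50%. The function name `journals_to_reach` is hypothetical, and only the first seven listed probabilities are used.

```python
def journals_to_reach(probabilities_pct, threshold_pct=50.0):
    """Count how many top-ranked entries are needed to reach the threshold.

    Returns (count, cumulative_mass); count is None if the threshold
    is never reached.
    """
    running = 0.0
    ranked = sorted(probabilities_pct, reverse=True)
    for count, p in enumerate(ranked, start=1):
        running += p
        if running >= threshold_pct:
            return count, running
    return None, running


# Predicted probabilities (percent) for the first seven listed journals.
top_probs = [19.4, 18.5, 6.8, 4.2, 3.1, 2.9, 2.7]
count, mass = journals_to_reach(top_probs)
print(count, round(mass, 1))
```

Under that assumption the cutoff falls after the fifth journal, where the cumulative mass is about 52%.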