High-Throughput Observational Evidence Generation Using Linked Electronic Health Record and Claims Data

Gombar, S.; Shah, N.; Sanghavi, N.; Coyle, J.; Mukerji, A.; Chappelka, M.

2026-04-07 health informatics
10.64898/2026.04.07.26350300 medRxiv

Background: The observational literature on comparative effectiveness is expanding rapidly but remains difficult to synthesize. Discordant findings often stem from structural differences in cohort definitions, inclusion criteria, and follow-up windows, leaving stakeholders without a cohesive evidence base. Furthermore, studies typically focus on a narrow subset of outcomes, neglecting the broader needs of diverse healthcare stakeholders (1,2,3,4).

Methods: We developed a high-throughput evidence generation workflow using linked EHR and administrative claims data. The cornerstone is a prespecified measurement architecture applied uniformly across clinical scenarios. It comprises:
- six post-index windows, from acute to two-year follow-up;
- 28 Elixhauser comorbidities;
- 14 healthcare resource utilization (HCRU) categories;
- 29 laboratory measures with 52 binary thresholds;
- 42 adverse event categories.

We generated unadjusted treatment comparisons across ~1,038 outcomes per scenario, including effect-measure modification (EMM) assessments across 130 baseline features.

Results: Across 40 clinical domains, the workflow produced approximately 32,982,552 outcome evaluations. Each evaluation comprised a treatment comparison, an outcome, a population, and an effect estimate with uncertainty bounds and supporting diagnostics. Approximately 5,000 narrative summaries underwent structured clinical and statistical quality control before dissemination.

Conclusions: Standardized, high-throughput workflows can shift evidence generation away from fragmented studies toward comprehensive evidence packages. This shared evidence base supports precision medicine by making treatment effect heterogeneity visible across clinically meaningful subpopulations, reducing the need for redundant, stakeholder-specific studies.
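The abstract does not say which estimators underlie its unadjusted treatment comparisons or EMM assessments. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes a binary outcome in one post-index window, an unadjusted risk difference with a Wald 95% interval, and EMM by a binary baseline feature. EMM is measured on the additive scale as the difference between stratum-specific risk differences, with the strata treated as independent. The function names (`risk_difference`, `emm_contrast`) and all counts are hypothetical.

```python
import math
from dataclasses import dataclass

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


@dataclass(frozen=True)
class Estimate:
    """A point estimate with its standard error."""
    value: float
    se: float

    def ci(self, z: float = Z_95) -> tuple[float, float]:
        """Wald-type interval: value +/- z * se."""
        return (self.value - z * self.se, self.value + z * self.se)


def risk_difference(events_a: int, n_a: int, events_b: int, n_b: int) -> Estimate:
    """Unadjusted risk difference (arm A minus arm B) for one binary outcome
    in one post-index window. No confounding adjustment is applied."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each arm needs at least one patient")
    p_a = events_a / n_a
    p_b = events_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return Estimate(p_a - p_b, se)


def emm_contrast(with_feature: Estimate, without_feature: Estimate) -> Estimate:
    """Additive-scale effect-measure modification by a binary baseline feature:
    the difference between stratum-specific risk differences. Assumes the two
    strata are independent samples, so their variances add."""
    diff = with_feature.value - without_feature.value
    se = math.sqrt(with_feature.se ** 2 + without_feature.se ** 2)
    return Estimate(diff, se)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    overall = risk_difference(40, 200, 30, 200)
    stratum_yes = risk_difference(30, 100, 20, 100)
    stratum_no = risk_difference(10, 100, 10, 100)
    emm = emm_contrast(stratum_yes, stratum_no)
    print("overall RD", round(overall.value, 3), [round(x, 3) for x in overall.ci()])
    print("EMM contrast", round(emm.value, 3), [round(x, 3) for x in emm.ci()])
```

The Wald interval is used here only because it is the simplest choice. Rare events or small strata may call for a score-based or exact interval. Screening many features for EMM would also typically call for some form of multiplicity control.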

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Communications: 14.3% (4913 papers in training set; Top 11%)
2. JAMA: 12.3% (17 papers in training set; Top 0.1%)
3. Journal of the American Medical Informatics Association: 10.1% (61 papers in training set; Top 0.3%)
4. The Lancet Digital Health: 6.3% (25 papers in training set; Top 0.1%)
5. BMJ: 4.8% (49 papers in training set; Top 0.2%)
6. npj Digital Medicine: 4.8% (97 papers in training set; Top 0.9%)

50% of probability mass above

7. Annals of Internal Medicine: 4.0% (27 papers in training set; Top 0.1%)
8. Med: 2.4% (38 papers in training set; Top 0.1%)
9. JAMA Network Open: 2.4% (127 papers in training set; Top 1%)
10. Clinical and Translational Science: 2.1% (21 papers in training set; Top 0.3%)
11. Journal of Clinical Epidemiology: 1.9% (28 papers in training set; Top 0.2%)
12. European Respiratory Journal: 1.7% (54 papers in training set; Top 0.9%)
13. Scientific Reports: 1.7% (3102 papers in training set; Top 60%)
14. Journal of the American College of Cardiology: 1.3% (12 papers in training set; Top 0.4%)
15. PLOS ONE: 1.2% (4510 papers in training set; Top 60%)
16. Cell Reports Medicine: 0.9% (140 papers in training set; Top 6%)
17. Scientific Data: 0.9% (174 papers in training set; Top 2%)
18. BMC Medicine: 0.9% (163 papers in training set; Top 6%)
19. Nature Biomedical Engineering: 0.9% (42 papers in training set; Top 2%)
20. JAMIA Open: 0.9% (37 papers in training set; Top 1%)
21. Science Translational Medicine: 0.9% (111 papers in training set; Top 5%)
22. Trials: 0.9% (25 papers in training set; Top 1%)
23. eClinicalMedicine: 0.8% (55 papers in training set; Top 2%)
24. The American Journal of Human Genetics: 0.8% (206 papers in training set; Top 3%)
25. Nature Medicine: 0.8% (117 papers in training set; Top 4%)
26. BMJ Health & Care Informatics: 0.8% (13 papers in training set; Top 0.8%)
27. Journal of Biomedical Informatics: 0.7% (45 papers in training set; Top 1%)
28. Nature Cancer: 0.7% (35 papers in training set; Top 1%)
29. npj Precision Oncology: 0.7% (48 papers in training set; Top 1%)
30. JCO Clinical Cancer Informatics: 0.7% (18 papers in training set; Top 0.9%)