
Benchmarking commercial healthcare claims data

Dahlen, A.; Deng, Y.; Charu, V.

2024-08-20 epidemiology
DOI: 10.1101/2024.08.19.24312249 (medRxiv)

Importance: Commercial healthcare claims datasets represent a sample of the US population that is biased along socioeconomic/demographic lines; depending on the target population of interest, results derived from these datasets may not generalize. Rigorous comparisons of claims-derived results to ground-truth data that quantify this bias are lacking.

Objectives: (1) To quantify the extent and variation of the bias associated with commercial healthcare claims data with respect to different target populations; (2) To evaluate how socioeconomic/demographic factors may explain the magnitude of the bias.

Design: This is a retrospective observational study. Healthcare claims data come from the Merative MarketScan® Commercial Database; reference data for comparison come from the State Inpatient Databases (SID) and the US Census. We considered three target populations, aged 18-64 years: (1) all Americans; (2) Americans with health insurance; (3) Americans with commercial health insurance.

Participants: We analyzed inpatient discharge records of patients aged 18-64 years, occurring between 01/01/2019 and 12/31/2019 in five states: California, Iowa, Maryland, Massachusetts, and New Jersey.

Outcomes: We estimated rates of the 250 most common inpatient procedures, using claims data and using reference data for each target population, and we compared the two estimates.

Results: The average rate of inpatient discharges per 100 person-years was 5.39 in the claims data (95% CI: [5.37, 5.40]) and 7.003 (95% CI: [7.002, 7.004]) in the reference data for all Americans, corresponding to a 23.1% underestimate from claims. We found large variation in the extent of relative bias across inpatient procedures, including 22.8% of procedures that were underestimated by more than a factor of 2. There was a significant relationship between socioeconomic/demographic factors and the magnitude of bias: procedures that disproportionately occur in disadvantaged neighborhoods were more underestimated in claims data (R² = 51.6%, p < 0.001). When the target population was restricted to commercially insured Americans, the bias decreased substantially (3.2% of procedures were biased by more than a factor of 2), but some variation across procedures remained.

Conclusions and relevance: Naive use of healthcare claims data to derive estimates for the underlying US population can be severely biased. The extent of bias is at least partially explained by neighborhood-level socioeconomic factors.
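
The relative bias quoted in the results can be reproduced approximately from the two published rates. The sketch below is illustrative only: the function names are hypothetical, and the formula (shortfall of the claims rate as a fraction of the reference rate) is an assumption about how the underestimate was defined, not the authors' code.

```python
def relative_underestimate(claims_rate: float, reference_rate: float) -> float:
    """Fraction by which the claims-based rate falls short of the reference rate.

    Assumed definition: (reference - claims) / reference.
    """
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return (reference_rate - claims_rate) / reference_rate


def underestimated_by_factor(claims_rate: float, reference_rate: float,
                             factor: float = 2.0) -> bool:
    """True when the reference rate is more than `factor` times the claims rate."""
    return reference_rate > factor * claims_rate


# Rounded point estimates from the abstract (discharges per 100 person-years).
claims = 5.39
reference = 7.003

print(f"Relative underestimate: {relative_underestimate(claims, reference):.1%}")
print("More than a factor of 2:", underestimated_by_factor(claims, reference))
```

With these rounded inputs the result is roughly 23%, in line with the reported 23.1%; any small difference presumably comes from rounding in the published rates.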

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Pharmacoepidemiology and Drug Safety: 13 papers in training set, Top 0.1%, 17.4%
2. Journal of General Internal Medicine: 20 papers in training set, Top 0.1%, 9.1%
3. PLOS ONE: 4510 papers in training set, Top 22%, 8.3%
4. JAMA Network Open: 127 papers in training set, Top 0.4%, 6.3%
5. BMC Medical Research Methodology: 43 papers in training set, Top 0.2%, 4.8%
6. BMJ Open: 554 papers in training set, Top 5%, 3.8%
7. Epidemiology: 26 papers in training set, Top 0.2%, 2.6%
50% of probability mass above
8. American Journal of Epidemiology: 57 papers in training set, Top 0.5%, 2.1%
9. Scientific Reports: 3102 papers in training set, Top 54%, 1.9%
10. JMIR Public Health and Surveillance: 45 papers in training set, Top 2%, 1.8%
11. BMC Health Services Research: 42 papers in training set, Top 1%, 1.7%
12. Annals of Epidemiology: 19 papers in training set, Top 0.2%, 1.7%
13. Preventive Medicine: 11 papers in training set, Top 0.1%, 1.5%
14. BMJ: 49 papers in training set, Top 0.7%, 1.3%
15. PLOS Medicine: 98 papers in training set, Top 3%, 1.2%
16. Journal of Clinical Epidemiology: 28 papers in training set, Top 0.4%, 1.2%
17. The Lancet Public Health: 20 papers in training set, Top 0.4%, 1.1%
18. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 2%, 0.9%
19. Journal of Biomedical Informatics: 45 papers in training set, Top 1%, 0.9%
20. npj Digital Medicine: 97 papers in training set, Top 3%, 0.9%
21. JMIRx Med: 31 papers in training set, Top 1%, 0.9%
22. Heart: 10 papers in training set, Top 0.8%, 0.9%
23. Preventive Medicine Reports: 14 papers in training set, Top 0.4%, 0.9%
24. SSM - Population Health: 17 papers in training set, Top 0.4%, 0.8%
25. Journal of Epidemiology and Community Health: 32 papers in training set, Top 0.6%, 0.8%
26. American Journal of Preventive Medicine: 11 papers in training set, Top 0.5%, 0.8%
27. International Journal of Medical Informatics: 25 papers in training set, Top 2%, 0.7%
28. Journal of Racial and Ethnic Health Disparities: 11 papers in training set, Top 0.5%, 0.7%
29. Journal of Public Health: 23 papers in training set, Top 1%, 0.7%
30. Infection Control & Hospital Epidemiology: 17 papers in training set, Top 0.6%, 0.7%
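
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. This is a minimal sketch: the function name `journals_to_reach` is hypothetical, the list holds only the ten highest predicted probabilities shown above, and it assumes they are already sorted in descending order.

```python
from itertools import accumulate

# Predicted probabilities (%) for the ten top-ranked journals, in rank order.
probabilities = [17.4, 9.1, 8.3, 6.3, 4.8, 3.8, 2.6, 2.1, 1.9, 1.8]


def journals_to_reach(probs, threshold=50.0):
    """Return the smallest rank whose cumulative probability meets the threshold.

    Returns None if the listed probabilities never reach it.
    """
    for rank, cumulative in enumerate(accumulate(probs), start=1):
        if cumulative >= threshold:
            return rank
    return None


print(journals_to_reach(probabilities))  # 7
```

The first six journals sum to about 49.7%, so the seventh is the one that crosses the 50% mark (about 52.3% cumulative).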