
Bias from small-count suppression in county-level cancer disparity estimates: a calibrated simulation study

Gahan, K.

2026-06-08 epidemiology
10.64898/2026.06.05.26355021 medRxiv

Abstract

Background. Area-level cancer disparities are routinely estimated from public county data in which rates based on small counts (fewer than 16 cases or deaths) are suppressed. Analysts typically drop suppressed counties (complete-case analysis). Because suppression depends on case counts, which are tied to population size and demographic composition, this missingness may be informative, but its effect on the disparity estimate has not, to our knowledge, been quantified.

Methods. In a cross-sectional ecological study of 3,143 U.S. counties (analytic sample 3,018 with computable exposure) using one frozen public release of NCI State Cancer Profiles incidence and mortality data and ACS 2018-2022 5-year data, we estimated the most- versus least-deprived ICE(race+income) quintile rate ratio (RR) and rate difference for female breast, stomach, and cervix cancers under four suppression-handling methods: complete-case, available-case, bounding, and model-based small-area estimation. We characterized which counties were erased and, following the ADEMP framework, ran a Monte Carlo simulation (1,000 replicates per cell; Monte Carlo standard error of bias approximately 0.0025) calibrated to the release to measure bias against a known truth. Analyses were pre-registered.

Results. The suppressed fraction rose with rarity: 7.4% of counties for breast, 61.3% for stomach, and 75.7% for cervix incidence. Suppression was concentrated in the most-deprived quintile (cervix: 81.8% suppressed vs 63.8% in the least-deprived quintile) and overwhelmingly removed rural rather than minority residents (cervix: 81% of the rural but 9% of the minority population erased). For breast (little suppression) the RR was 0.87 (95% CI 0.85-0.89) and identical across methods; for cervix incidence the complete-case RR (1.56) exceeded the model-based estimate (1.50), and for cervix mortality (91% suppressed) complete-case (1.86) exceeded model-based (1.56) by 16% with a wide bounding interval (1.88-2.62). In calibrated simulation, population-weighted complete-case bias was small (less than 2%) at the observed deprivation-county-size correlation and grew with rarity, threshold, and unweighted aggregation; its direction was conditional, becoming positive (over-estimation) as deprived counties became smaller.

Conclusions. Complete-case handling of suppressed counties over-estimates rare-cancer area disparities relative to methods that retain them, while silently erasing most of the rural and most-deprived communities the estimate is meant to represent. The effect is negligible for common cancers and grows with rarity. Public-data disparity analyses should report the suppressed fraction and use bounded or model-based estimates by default.

Keywords: cancer disparities; small-count suppression; Index of Concentration at the Extremes; informative missingness; small-area estimation; rural health.
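The mechanism the abstract describes (counts below 16 are suppressed, suppressed counties are dropped, and the resulting rate ratio is compared with a known truth over Monte Carlo replicates) can be illustrated with a toy simulation. This is a minimal sketch and not the authors' calibrated simulation: the function names, the log-normal population model, and every default parameter value (`base_rate`, `true_rr`, `size_slope`, and so on) are hypothetical choices made only for illustration. Only the threshold of 16 comes from the abstract.

```python
import numpy as np


def pooled_rate_ratio(counts, pop, quintile, keep):
    """Population-weighted (pooled) rate ratio, quintile 5 vs quintile 1,
    computed only over counties flagged in `keep`."""
    def rate(q):
        mask = keep & (quintile == q)
        return counts[mask].sum() / pop[mask].sum()
    return rate(5) / rate(1)


def simulate(n_counties=3000, n_reps=1000, base_rate=4e-4, true_rr=1.5,
             size_slope=-0.3, threshold=16, seed=0):
    """Toy Monte Carlo for complete-case bias under small-count suppression.

    All defaults are illustrative assumptions, not values from the study.
    size_slope < 0 makes more-deprived counties smaller on average.
    """
    rng = np.random.default_rng(seed)
    quintile = np.repeat(np.arange(1, 6), n_counties // 5)
    # Hypothetical log-normal county populations, shifted by deprivation quintile.
    pop = np.exp(rng.normal(10.3 + size_slope * (quintile - 3), 1.4))
    # Log-linear rate gradient so that the Q5/Q1 rate ratio equals true_rr.
    rate = base_rate * true_rr ** ((quintile - 1) / 4)

    log_rr = np.empty(n_reps)
    suppressed = np.empty(n_reps)
    for r in range(n_reps):
        counts = rng.poisson(rate * pop)
        keep = counts >= threshold  # counts below the threshold are suppressed
        suppressed[r] = 1.0 - keep.mean()
        log_rr[r] = np.log(pooled_rate_ratio(counts, pop, quintile, keep))

    return {
        "bias_log_rr": log_rr.mean() - np.log(true_rr),
        # Monte Carlo standard error of the bias: SD across replicates / sqrt(n_reps)
        "mcse": log_rr.std(ddof=1) / np.sqrt(n_reps),
        "suppressed_fraction": suppressed.mean(),
    }


if __name__ == "__main__":
    print(simulate())
```

Setting `threshold=0` disables suppression and gives a baseline in which the pooled estimator should recover `true_rr` up to Monte Carlo noise; varying `base_rate`, `threshold`, and `size_slope` is one way to probe how rarity, the suppression rule, and the deprivation-size relationship interact in a toy setting.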

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. American Journal of Epidemiology; 57 papers in training set; Top 0.1%; 38.0%
2. International Journal of Epidemiology; 74 papers in training set; Top 0.1%; 10.1%
3. International Journal of Cancer; 42 papers in training set; Top 0.2%; 4.9%
50% of probability mass above
4. eLife; 5422 papers in training set; Top 22%; 4.0%
5. PLOS Medicine; 98 papers in training set; Top 1%; 3.6%
6. Epidemiology; 26 papers in training set; Top 0.1%; 3.6%
7. BMC Medicine; 163 papers in training set; Top 2%; 3.3%
8. BMC Medical Research Methodology; 43 papers in training set; Top 0.4%; 2.4%
9. JAMA Network Open; 127 papers in training set; Top 2%; 1.9%
10. PLOS ONE; 4510 papers in training set; Top 51%; 1.8%
11. Nature Communications; 4913 papers in training set; Top 52%; 1.7%
12. The Lancet Public Health; 20 papers in training set; Top 0.3%; 1.5%
13. JMIR Public Health and Surveillance; 45 papers in training set; Top 2%; 1.3%
14. Cancer Epidemiology, Biomarkers & Prevention; 17 papers in training set; Top 0.4%; 1.2%
15. Clinical Infectious Diseases; 231 papers in training set; Top 3%; 1.2%
16. SSM - Population Health; 17 papers in training set; Top 0.3%; 1.0%
17. Scientific Reports; 3102 papers in training set; Top 68%; 1.0%
18. Annals of Epidemiology; 19 papers in training set; Top 0.3%; 1.0%
19. The Lancet Global Health; 24 papers in training set; Top 0.9%; 0.9%
20. BMC Public Health; 147 papers in training set; Top 5%; 0.8%
21. Nature Medicine; 117 papers in training set; Top 4%; 0.8%
22. BMJ Open; 554 papers in training set; Top 13%; 0.7%
23. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 46%; 0.7%
24. Nature Human Behaviour; 85 papers in training set; Top 5%; 0.7%
25. Journal of Epidemiology and Community Health; 32 papers in training set; Top 0.7%; 0.7%
26. The Lancet Digital Health; 25 papers in training set; Top 1%; 0.7%
27. Science Advances; 1098 papers in training set; Top 31%; 0.7%
28. Medical Decision Making; 10 papers in training set; Top 0.4%; 0.6%
29. PLOS Computational Biology; 1633 papers in training set; Top 27%; 0.6%
30. Eurosurveillance; 80 papers in training set; Top 2%; 0.5%