Cross-Tabulating Epidemiological Covariates with AUDIT-C Data in Large-Scale Biobanks

Blackburn, A.

2026-04-03 epidemiology
10.64898/2026.04.01.26349975 medRxiv
Introduction: The Alcohol Use Disorders Identification Test-Consumption (AUDIT-C) is a widely used screening tool in large-scale electronic health record (EHR) biobanks. However, its categorical, range-based survey responses pose a significant challenge for epidemiological research, especially where continuous quantitative variables are preferred. Standard workarounds, such as assigning categorical midpoints or using aggregate ordinal scores for regression mapping, often introduce false mathematical precision or obscure critical behavioral distinctions between drinking frequency and quantity. This report presents a novel framework for displaying and bounding categorical alcohol survey data.

Materials and Methods: I developed two complementary descriptive techniques: (1) a two-dimensional cross-tabulation matrix that preserves the interaction between drinking frequency and typical quantity, and (2) a systematic bounding algorithm that applies time-interval correction factors to calculate strict lower and upper estimates of average daily alcohol consumption. To demonstrate the real-world utility of this framework, I applied these methods to three descriptive scenarios within a European ancestry (EUR) cohort of the All of Us Research Program: Generalized Anxiety Disorder (GAD) prevalence (n=104,893), minor allele frequency (MAF) for the rs1229984 genetic variant (n=104,890), and self-reported active duty military service history (n=104,893).

Results: The cross-tabulation matrix revealed patterns in all three descriptive scenarios. For example, participants reporting the highest frequency ("4 or more times a week") combined with the highest quantity ("10 or More" drinks) had a GAD prevalence of 13.5%, compared with 5.8% among those reporting the same frequency but a low quantity ("1 or 2" drinks). In general, GAD prevalence increased with drinking quantity but decreased with drinking frequency. Bounding estimates for average daily consumption ranged from 0.299 to 0.730 drinks for individuals with GAD, and from 0.303 to 0.787 for those without. Participants who reported active duty service in the US Armed Forces showed a general trend toward more frequent drinking and higher average daily consumption estimates (0.339 to 0.875) than those who did not (0.297 to 0.770). The minor allele of the genetic variant rs1229984 showed a clear effect, reducing both drinking frequency and quantity and resulting in lower average daily consumption estimates.

Conclusions: This bounding and mapping framework gives researchers a complement to traditional midpoint and aggregate scoring methods. By explicitly defining the uncertainty inherent in categorical survey instruments and visualizing cohort distributions across intersecting behavioral axes, this methodology improves the resolution, reproducibility, and interpretability of lifestyle exposure data.
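The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not give the correction factors, so the per-day conversions below (7 days per week, 365.25/12 days per month), the zero lower bound for "Monthly or less", and the cap on the open-ended "10 or more" category are all assumptions. Only the labels "4 or more times a week", "1 or 2", and "10 or More" appear in the abstract; the remaining category labels follow the usual AUDIT-C wording and may differ from the survey used. The function names are hypothetical.

```python
from collections import defaultdict

DAYS_PER_WEEK = 7.0
DAYS_PER_MONTH = 365.25 / 12  # assumed average month length

# (lower, upper) drinking occasions per day for each frequency answer.
# Assumption: "Monthly or less" is bounded below by zero occasions.
FREQ_BOUNDS = {
    "Never": (0.0, 0.0),
    "Monthly or less": (0.0, 1 / DAYS_PER_MONTH),
    "2 to 4 times a month": (2 / DAYS_PER_MONTH, 4 / DAYS_PER_MONTH),
    "2 to 3 times a week": (2 / DAYS_PER_WEEK, 3 / DAYS_PER_WEEK),
    "4 or more times a week": (4 / DAYS_PER_WEEK, 7 / DAYS_PER_WEEK),
}

# Hypothetical cap for the open-ended top category; the source does not
# say how its upper bound is chosen.
ASSUMED_MAX_DRINKS = 10.0

# (lower, upper) drinks per drinking occasion for each quantity answer.
QTY_BOUNDS = {
    "1 or 2": (1.0, 2.0),
    "3 or 4": (3.0, 4.0),
    "5 or 6": (5.0, 6.0),
    "7 to 9": (7.0, 9.0),
    "10 or more": (10.0, ASSUMED_MAX_DRINKS),
}


def daily_bounds(freq, qty):
    """Lower and upper average drinks per day for one respondent."""
    f_lo, f_hi = FREQ_BOUNDS[freq]
    q_lo, q_hi = QTY_BOUNDS[qty]
    return f_lo * q_lo, f_hi * q_hi


def cohort_bounds(responses):
    """Mean lower and upper bounds over a non-empty list of (freq, qty)."""
    pairs = [daily_bounds(f, q) for f, q in responses]
    n = len(pairs)
    return sum(lo for lo, _ in pairs) / n, sum(hi for _, hi in pairs) / n


def cross_tab_prevalence(records):
    """Map each (freq, qty) cell to the share of records with the outcome.

    records: iterable of (freq, qty, has_outcome) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, total]
    for freq, qty, has_outcome in records:
        cell = counts[(freq, qty)]
        cell[0] += int(has_outcome)
        cell[1] += 1
    return {key: cases / total for key, (cases, total) in counts.items()}
```

Keeping frequency and quantity as separate keys in the cross-tabulation is what preserves their interaction; collapsing them into a single score first would lose the contrast the abstract reports between high-frequency, low-quantity and high-frequency, high-quantity drinkers.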

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Addiction | 25 papers in training set | Top 0.1% | 17.7%
2 | PLOS ONE | 4510 papers in training set | Top 19% | 10.2%
3 | American Journal of Epidemiology | 57 papers in training set | Top 0.1% | 8.5%
4 | Epidemiology | 26 papers in training set | Top 0.1% | 6.9%
5 | Alcohol | 15 papers in training set | Top 0.1% | 6.9%
50% of probability mass above
6 | Drug and Alcohol Dependence | 37 papers in training set | Top 0.2% | 6.9%
7 | Scientific Reports | 3102 papers in training set | Top 27% | 4.3%
8 | Journal of Affective Disorders | 81 papers in training set | Top 0.8% | 2.1%
9 | BMC Public Health | 147 papers in training set | Top 3% | 1.8%
10 | Alcoholism: Clinical and Experimental Research | 13 papers in training set | Top 0.2% | 1.8%
11 | Cancer Epidemiology, Biomarkers & Prevention | 17 papers in training set | Top 0.3% | 1.8%
12 | JAMA Network Open | 127 papers in training set | Top 2% | 1.7%
13 | BMC Medicine | 163 papers in training set | Top 5% | 1.2%
14 | International Journal of Environmental Research and Public Health | 124 papers in training set | Top 5% | 1.1%
15 | International Journal of Drug Policy | 11 papers in training set | Top 0.3% | 0.9%
16 | International Journal of Epidemiology | 74 papers in training set | Top 2% | 0.9%
17 | JMIR Public Health and Surveillance | 45 papers in training set | Top 3% | 0.8%
18 | BMC Medical Research Methodology | 43 papers in training set | Top 1% | 0.8%
19 | PeerJ | 261 papers in training set | Top 14% | 0.8%
20 | Scientific Data | 174 papers in training set | Top 2% | 0.8%
21 | JMIRx Med | 31 papers in training set | Top 2% | 0.8%
22 | JMIR mHealth and uHealth | 10 papers in training set | Top 0.4% | 0.8%
23 | Alcohol, Clinical and Experimental Research | 12 papers in training set | Top 0.3% | 0.8%
24 | npj Digital Medicine | 97 papers in training set | Top 4% | 0.7%
25 | Journal of Psychiatric Research | 28 papers in training set | Top 0.9% | 0.6%
26 | JAMA Psychiatry | 13 papers in training set | Top 0.8% | 0.5%