Multi-stage reweighting to correct for participation bias in a nationwide biobank with nested recruitment
Traeholt, J.; Didriksen, M.; Helenius, D.; Christoffersen, L. A. N.; Dinh, K. M.; Dowsett, J.; Mikkelsen, C.; Hindhede, L.; Quinn, L. J. E.; Bruun, M. T.; Aagaard, B.; Hansen, T. F.; Hjalgrim, H.; Rostgaard, K.; Sorensen, E.; Erikstrup, C.; Pedersen, O. B. V.; Hansen, T.; Schork, A. J.; Markussen, B.; Ostrowski, S. R.
Selective participation in biobanks often compromises inference to the general population, particularly when selection occurs across multiple stages, whether at recruitment or during subsequent participation. Inverse probability (IP) weighting can reduce systematic differences using suitable external benchmarks, but most applications assume a single selection process. Here, we present a multi-stage IP-weighting framework and apply it to the Danish Blood Donor Study (DBDS), a nationwide biobank embedded in Denmark's blood-donation infrastructure. Using national registers, we estimated year-specific probabilities of (i) donation activity and (ii) DBDS enrolment conditional on donation activity, yielding two-stage inclusion weights for 169,893 participants. These weights reduced inclusion-associated imbalance across the 52 auxiliary variables in the probability models by 97.6% (median) and, despite strong health selection under donation-based recruitment, reduced relative-prevalence discrepancies across held-out prescription phenotypes by 69.7% (median). The effective sample size after weighting was 30,627 (18.0% of 169,893). Combining the inclusion weights with questionnaire-specific response weights across five DBDS questionnaires (>500 questions) produced the largest changes from unweighted to weighted responses for health behaviours and symptom severity, including tobacco and alcohol consumption, menstrual-pain severity, restless-legs severity, nocturia, sleep disturbance, and fatigue. These findings support multi-stage IP-weighting to improve population alignment in biobanks with staged selection.
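As a minimal sketch of the two-stage weighting idea described in the abstract, the following assumes that each participant's inclusion weight is the inverse of the product of the two estimated probabilities (donation activity, and enrolment conditional on donation activity), and that the effective sample size is computed with Kish's formula. Both are illustrative assumptions, since the abstract does not state the exact formulas. The function names and toy probabilities are hypothetical, and the year-specific modelling step is omitted.

```python
def two_stage_weights(p_stage1, p_stage2_given_stage1):
    """Inverse-probability weights for two nested selection stages.

    Assumption: overall inclusion probability = P(stage 1) * P(stage 2 | stage 1),
    so the weight is the inverse of that product.
    """
    weights = []
    for p1, p2 in zip(p_stage1, p_stage2_given_stage1):
        if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
            raise ValueError("probabilities must lie in (0, 1]")
        weights.append(1.0 / (p1 * p2))
    return weights


def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighted_prevalence(indicator, weights):
    """Weighted share of participants with indicator == 1."""
    return sum(w * y for w, y in zip(weights, indicator)) / sum(weights)


# Toy values, not DBDS data: three participants with estimated probabilities.
p_active = [0.5, 0.25, 0.1]   # hypothetical P(donation activity)
p_enrol = [0.5, 0.5, 0.5]     # hypothetical P(enrolment | donation activity)
has_phenotype = [0, 0, 1]     # hypothetical binary phenotype indicator

weights = two_stage_weights(p_active, p_enrol)
ess = kish_effective_sample_size(weights)
prevalence = weighted_prevalence(has_phenotype, weights)
```

Participants who were unlikely to be included receive large weights, which pulls the weighted prevalence towards the under-represented group and lowers the effective sample size relative to the number of participants.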