Handling onset age inconsistencies in longitudinal healthcare survey data
Li, W.; Yuan, M.; Park, Y.; Dao Duc, K.
Abstract
Longitudinal healthcare surveys frequently contain inconsistencies in self-reported onset ages, where participants report different ages for the same condition at enrollment and at follow-up. We propose two methods to handle this challenge. First, we introduce a procedure that aggregates inconsistency patterns into participant-level reliability scores, enabling researchers to stratify participants and prioritize analysis of high-reliability cohorts. Second, we present a Bayesian adjustment method that models enrollment and follow-up reports as noisy observations of a latent true onset age, producing adjusted estimates for the inconsistent observations that account for age-dependent and inter-survey-time effects. We evaluate both methods using data from the Canadian Partnership for Tomorrow's Health (CanPath). In general, both methods substantially strengthen correlations between biologically related conditions and improve predictive performance across classification and regression tasks. In addition, high-reliability cohorts obtained from reliability-score-based stratification reveal more coherent and interpretable disease clustering networks, and Bayesian adjustment shows particularly notable gains when multiple inconsistent variables are adjusted simultaneously. Finally, we provide guidance for healthcare practitioners on choosing between these methods.

Institutional Review Board (IRB): The study is approved by the University of British Columbia IRB (IRB #H23-03800).

Data and Code Availability: CanPath data are available to researchers through a controlled access process via the CanPath Access Portal (https://portal.canpath.ca). The code is available at https://anonymous.4open.science/r/canpath-FCCF.
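To make the latent-onset-age idea in the abstract concrete, here is a minimal sketch of how two noisy reports might be combined under a simple normal-normal conjugate model. This is an illustrative assumption, not the authors' model: the function name `adjust_onset_age`, the Gaussian prior, and the fixed noise variances are all hypothetical. The abstract indicates that the actual method accounts for age-dependent and inter-survey-time effects, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Posterior:
    mean: float
    variance: float


def adjust_onset_age(
    enroll_report: float,
    followup_report: float,
    prior_mean: float,
    prior_var: float,
    enroll_var: float,
    followup_var: float,
) -> Posterior:
    """Posterior of a latent onset age given two noisy reports.

    Assumed (hypothetical) model:
        true_age        ~ Normal(prior_mean, prior_var)
        enroll_report   ~ Normal(true_age, enroll_var)
        followup_report ~ Normal(true_age, followup_var)
    """
    # In a normal-normal model, each source contributes in
    # proportion to its precision (inverse variance).
    precisions = [1.0 / prior_var, 1.0 / enroll_var, 1.0 / followup_var]
    values = [prior_mean, enroll_report, followup_report]
    total_precision = sum(precisions)
    mean = sum(p * v for p, v in zip(precisions, values)) / total_precision
    return Posterior(mean=mean, variance=1.0 / total_precision)


# Hypothetical participant: reported onset at 40 at enrollment, 50 at follow-up.
posterior = adjust_onset_age(
    enroll_report=40.0,
    followup_report=50.0,
    prior_mean=45.0,
    prior_var=100.0,
    enroll_var=25.0,
    followup_var=25.0,
)
print(posterior)
```

In this toy setup the adjusted estimate lies between the two reports, pulled toward whichever report is assigned the smaller noise variance. A fuller model could let the noise variances depend on the participant's age and the time between surveys, in the spirit of the effects the abstract mentions.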
Matching journals
The top 7 journals account for 50% of the predicted probability mass.