Handling onset age inconsistencies in longitudinal healthcare survey data

Li, W.; Yuan, M.; Park, Y.; Dao Duc, K.

2026-02-23 health informatics
10.64898/2026.02.20.26346741 medRxiv
Abstract

Longitudinal healthcare surveys frequently contain inconsistencies in self-reported onset ages, where participants report different ages for the same condition between enrollment and follow-up surveys. We propose two methods to handle this challenge. First, we introduce a procedure that aggregates inconsistency patterns to construct participant-level reliability scores, enabling researchers to stratify participants and prioritize analysis on high-reliability cohorts. Second, we present a Bayesian adjustment method that models enrollment and follow-up reports as noisy observations of a latent true onset age, producing adjusted estimates for the inconsistent observations that account for age-dependent and inter-survey-time effects. We evaluate both methods using data from the Canadian Partnership for Tomorrow's Health. In general, both methods substantially strengthen correlations between biologically related conditions and improve predictive performance across classification and regression tasks. In addition, high-reliability cohorts from reliability score-based stratification reveal more coherent and interpretable disease clustering networks, and Bayesian adjustment shows particularly notable gains when multiple inconsistent variables are adjusted simultaneously. Finally, we provide guidance on choosing between these methods for healthcare practitioners.

Institutional Review Board (IRB): The study is approved by the University of British Columbia IRB (IRB #H23-03800).

Data and Code Availability: CanPath data are available to researchers through a controlled access process via the CanPath Access Portal (https://portal.canpath.ca). The code is available at https://anonymous.4open.science/r/canpath-FCCF.
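To make the idea of "noisy observations of a latent true onset age" concrete, here is a minimal sketch using a textbook conjugate normal-normal update. This is not the authors' model: the Gaussian prior, the fixed noise standard deviations, and the function name `posterior_onset_age` are all assumptions made for illustration, and the age-dependent and inter-survey-time effects mentioned in the abstract are not modeled here.

```python
import math


def posterior_onset_age(enroll_age, followup_age,
                        sd_enroll, sd_followup,
                        prior_mean, prior_sd):
    """Combine two noisy reports of one latent onset age.

    Assumption (illustrative only): the latent age has a Gaussian prior and
    each report equals the latent age plus independent Gaussian noise.
    The posterior is then Gaussian, with a precision-weighted mean.
    """
    # Precision = 1 / variance for the prior and for each report.
    precisions = [1.0 / prior_sd ** 2,
                  1.0 / sd_enroll ** 2,
                  1.0 / sd_followup ** 2]
    values = [prior_mean, enroll_age, followup_age]

    post_var = 1.0 / sum(precisions)
    post_mean = post_var * sum(p * v for p, v in zip(precisions, values))
    return post_mean, math.sqrt(post_var)


# Hypothetical participant: reports onset at 40 at enrollment and 46 at follow-up.
# The follow-up report is assumed noisier, and the prior is nearly flat.
mean, sd = posterior_onset_age(40.0, 46.0, sd_enroll=2.0, sd_followup=4.0,
                               prior_mean=45.0, prior_sd=1000.0)
print(f"adjusted onset age: {mean:.1f} (posterior sd {sd:.2f})")
```

In this sketch the adjusted age is pulled toward whichever report is assumed less noisy; a fuller model would let the noise levels depend on age and on the time between surveys.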

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | Nature Communications | 4913 papers in training set | Top 10% | 14.7%
2 | npj Digital Medicine | 97 papers in training set | Top 0.6% | 9.1%
3 | Nature Genetics | 240 papers in training set | Top 1% | 6.4%
4 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.5% | 6.4%
5 | Patterns | 70 papers in training set | Top 0.1% | 6.3%
6 | Science Translational Medicine | 111 papers in training set | Top 0.8% | 3.7%
7 | Journal of Biomedical Informatics | 45 papers in training set | Top 0.4% | 3.6%
50% of probability mass above
8 | Nature Computational Science | 50 papers in training set | Top 0.1% | 3.6%
9 | Science Advances | 1098 papers in training set | Top 8% | 3.2%
10 | GENETICS | 189 papers in training set | Top 0.3% | 3.1%
11 | Bioinformatics | 1061 papers in training set | Top 6% | 3.1%
12 | Scientific Reports | 3102 papers in training set | Top 50% | 2.1%
13 | The Lancet Digital Health | 25 papers in training set | Top 0.3% | 1.9%
14 | PLOS Computational Biology | 1633 papers in training set | Top 15% | 1.9%
15 | Communications Medicine | 85 papers in training set | Top 0.2% | 1.8%
16 | The American Journal of Human Genetics | 206 papers in training set | Top 2% | 1.7%
17 | Nature Medicine | 117 papers in training set | Top 2% | 1.7%
18 | JAMIA Open | 37 papers in training set | Top 1% | 1.3%
19 | PLOS Genetics | 756 papers in training set | Top 10% | 1.3%
20 | Nature Biomedical Engineering | 42 papers in training set | Top 1% | 1.3%
21 | European Journal of Epidemiology | 40 papers in training set | Top 0.5% | 1.2%
22 | PNAS Nexus | 147 papers in training set | Top 0.8% | 1.1%
23 | PLOS Digital Health | 91 papers in training set | Top 2% | 0.9%
24 | Genome Medicine | 154 papers in training set | Top 7% | 0.8%
25 | Med | 38 papers in training set | Top 0.8% | 0.7%
26 | Cell Systems | 167 papers in training set | Top 12% | 0.7%
27 | eLife | 5422 papers in training set | Top 58% | 0.7%
28 | Genome Biology | 555 papers in training set | Top 9% | 0.6%
29 | Briefings in Bioinformatics | 326 papers in training set | Top 7% | 0.6%
30 | Journal of Medical Internet Research | 85 papers in training set | Top 5% | 0.6%
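
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the percentages listed for ranks 1 to 30. The sketch below is only an arithmetic check on those displayed values, which are rounded to one decimal place; the helper name `ranks_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1..30, copied from the list above.
PROBS = [14.7, 9.1, 6.4, 6.4, 6.3, 3.7, 3.6, 3.6, 3.2, 3.1,
         3.1, 2.1, 1.9, 1.9, 1.8, 1.7, 1.7, 1.3, 1.3, 1.3,
         1.2, 1.1, 0.9, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6]


def ranks_to_reach(probs, threshold):
    """Return the smallest rank whose cumulative probability meets the threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None  # threshold not reached by the listed journals


print(ranks_to_reach(PROBS, 50.0))  # prints 7
```

The cumulative sum after rank 6 is about 46.6% and after rank 7 about 50.2%, which matches the position of the "50% of probability mass above" marker.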