
Quantifying the Optimism of Naive Cross-Validation for Binary Outcome Prediction with Repeated-Measures Predictors: A Simulation Study and Clinical Illustration

Hagan, J.

2026-05-29 epidemiology
10.64898/2026.05.27.26354222 medRxiv

Background. Cross-validation (CV) is widely used to estimate predictive performance, but can overestimate performance when applied at the observation level to repeated-measures data. When continuous predictor variables are measured repeatedly within subjects and the binary outcome is defined at the subject level, naive observation-level CV introduces data leakage through within-subject dependence, producing optimistically biased estimates of the area under the receiver operating characteristic curve (AUROC). The magnitude of this bias and the performance of alternative partitioning strategies have not been formally characterized for this data structure.

Methods. Three CV strategies were compared for estimating subject-level AUROC in ridge logistic regression models: naive observation-level 10-fold CV, subject-level 10-fold CV, and leave-one-cluster-out (LOCO) CV. The framework was applied to a motivating clinical dataset of daily oxygenation measures and retinopathy of prematurity (ROP) outcomes among 101 extremely low birth weight infants. A factorial simulation study was conducted across 162 parameter combinations varying cluster count (20-150), intraclass correlation (ICC; 0.1-0.5), within-cluster autocorrelation (0.2-0.8), and outcome prevalence (10-35%), with 500 simulated datasets per condition (76,389 valid datasets total).

Results. In the motivating dataset, naive CV produced optimism of +0.078 AUROC units for severe ROP prediction (15 events, 101 subjects) and +0.031 for any ROP prediction (48 events). Subject-level 10-fold CV closely approximated LOCO (deviation ≤ 0.015). In the simulation, naive CV optimism ranged from +0.039 to +0.204 across all conditions, increasing monotonically with higher ICC, higher autocorrelation, fewer clusters, and lower event rates. Subject-level 10-fold CV was essentially unbiased relative to LOCO across all 162 conditions (mean absolute deviation = 0.002).

Conclusions. Naive observation-level CV meaningfully overestimates discriminative performance in the repeated-measures binary outcome setting and should not be used. Subject-level CV partitioning effectively eliminates this bias. Accordingly, subject-level partitioning should be considered essential, not optional, when validating prediction models using repeated-measures data with subject-level outcomes.
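The three partitioning strategies named in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the author's code. Several choices here are assumptions made only for illustration: the toy data generator (`simulate_repeated_measures`, which ignores autocorrelation), the null outcome assigned independently of the predictors, the default L2 penalty strength, and the aggregation of observation-level predictions to a subject-level score by averaging predicted probabilities within each subject. Because the simulated outcome is unrelated to the predictors, any AUROC well above 0.5 from the naive split would reflect leakage rather than signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GroupKFold, KFold, LeaveOneGroupOut,
                                     cross_val_predict)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def simulate_repeated_measures(n_subjects=60, n_obs=10, n_features=10,
                               icc=0.5, prevalence=0.3, seed=0):
    """Hypothetical generator: repeated predictors, subject-level null outcome."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(n_subjects), n_obs)
    # Between-subject variance = icc, within-subject variance = 1 - icc.
    subject_effect = rng.normal(0.0, np.sqrt(icc), (n_subjects, n_features))
    noise = rng.normal(0.0, np.sqrt(1.0 - icc), (n_subjects * n_obs, n_features))
    X = subject_effect[groups] + noise
    # Fixed number of events, assigned at random (independent of X).
    n_events = int(round(prevalence * n_subjects))
    y_subject = np.zeros(n_subjects, dtype=int)
    y_subject[rng.permutation(n_subjects)[:n_events]] = 1
    return X, y_subject[groups], groups


def subject_level_auroc(X, y, groups, cv):
    """Pool out-of-fold predictions, average per subject, then score AUROC."""
    # LogisticRegression uses an L2 (ridge) penalty by default.
    model = make_pipeline(StandardScaler(),
                          LogisticRegression(C=1.0, max_iter=1000))
    proba = cross_val_predict(model, X, y, groups=groups, cv=cv,
                              method="predict_proba")[:, 1]
    subjects = np.unique(groups)
    mean_proba = np.array([proba[groups == s].mean() for s in subjects])
    y_subject = np.array([y[groups == s][0] for s in subjects])
    return roc_auc_score(y_subject, mean_proba)


def compare_cv_strategies(X, y, groups):
    """Return subject-level AUROC under the three partitioning schemes."""
    strategies = {
        # Naive: rows from one subject can land in both train and test folds.
        "naive": KFold(n_splits=10, shuffle=True, random_state=0),
        # Subject-level: all rows from a subject stay in the same fold.
        "subject": GroupKFold(n_splits=10),
        # LOCO: each subject is held out in turn.
        "loco": LeaveOneGroupOut(),
    }
    return {name: subject_level_auroc(X, y, groups, cv)
            for name, cv in strategies.items()}


if __name__ == "__main__":
    X, y, groups = simulate_repeated_measures()
    results = compare_cv_strategies(X, y, groups)
    for name, auc in results.items():
        print(f"{name:8s} AUROC = {auc:.3f}")
    print(f"naive optimism vs LOCO = {results['naive'] - results['loco']:+.3f}")
```

The only difference between the three runs is the splitter passed to `cross_val_predict`; the model, the data, and the subject-level scoring are held fixed, so the gap between the naive and LOCO estimates isolates the effect of the partitioning.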

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. PLOS ONE: 4510 papers in training set; Top 17%; 10.3%
2. Scientific Reports: 3102 papers in training set; Top 9%; 8.6%
3. Epidemiology: 26 papers in training set; Top 0.1%; 7.3%
4. BMC Medical Research Methodology: 43 papers in training set; Top 0.1%; 7.0%
5. International Journal of Epidemiology: 74 papers in training set; Top 0.4%; 4.4%
6. American Journal of Epidemiology: 57 papers in training set; Top 0.3%; 3.7%
7. Pediatric Research: 18 papers in training set; Top 0.1%; 3.7%
8. PLOS Computational Biology: 1633 papers in training set; Top 11%; 3.1%
9. Trials: 25 papers in training set; Top 0.5%; 2.8%

50% of probability mass above

10. Nature Communications: 4913 papers in training set; Top 50%; 1.8%
11. PLOS Global Public Health: 293 papers in training set; Top 3%; 1.8%
12. PLOS Digital Health: 91 papers in training set; Top 1%; 1.7%
13. Frontiers in Medicine: 113 papers in training set; Top 3%; 1.7%
14. Journal of Clinical Medicine: 91 papers in training set; Top 4%; 1.5%
15. eBioMedicine: 130 papers in training set; Top 2%; 1.5%
16. Pharmacoepidemiology and Drug Safety: 13 papers in training set; Top 0.3%; 1.4%
17. JAMA Network Open: 127 papers in training set; Top 3%; 1.4%
18. eLife: 5422 papers in training set; Top 46%; 1.4%
19. Biology Methods and Protocols: 53 papers in training set; Top 1%; 1.3%
20. BMJ Open: 554 papers in training set; Top 11%; 1.1%
21. Diagnostics: 48 papers in training set; Top 2%; 0.9%
22. Statistics in Medicine: 34 papers in training set; Top 0.3%; 0.9%
23. BMC Pregnancy and Childbirth: 20 papers in training set; Top 0.6%; 0.9%
24. BMC Medicine: 163 papers in training set; Top 6%; 0.9%
25. BMJ Open Respiratory Research: 32 papers in training set; Top 0.6%; 0.8%
26. Frontiers in Neurology: 91 papers in training set; Top 5%; 0.8%
27. Developmental Cognitive Neuroscience: 81 papers in training set; Top 0.6%; 0.8%
28. npj Digital Medicine: 97 papers in training set; Top 3%; 0.8%
29. Frontiers in Pediatrics: 29 papers in training set; Top 0.8%; 0.8%
30. Pediatric Pulmonology: 14 papers in training set; Top 0.4%; 0.8%