Tackling Bias in Cortical Thickness Estimation in UK Biobank Using Harmonisation Approaches

Turnbull, J.; Bhalerao, G.; Dawson, R.; Lange, F.; Alfaro-Almagro, F.; Smith, S.; Griffanti, L.

2026-05-26 neuroscience

10.64898/2026.05.22.726536 bioRxiv


Big neuroimaging data enable researchers to study subtle structural and functional brain changes and the relationships between brain characteristics and genetics, lifestyle, and disease factors. However, substantial effort is needed to minimise technical, non-biological differences between data batches to avoid incorrect inferences. In this study, we address a previously identified bias in UK Biobank (UKB) FreeSurfer imaging-derived phenotypes (IDPs) derived from the T1 image alone compared with those derived from both T1 and T2-FLAIR, treating the bias as a batch effect and applying harmonisation approaches. We investigate and characterise this bias through direct within-participant comparison at the image and IDP level, comparing the results with those seen in the wider UKB sample. We then assess different methods of addressing the effect of missing T2-FLAIR, starting from simple linear regression before moving to ComBat, a widely used harmonisation method, testing different approaches for applying ComBat and showing its similarity to simple linear regression. Finally, we examine how ComBat estimates vary with batch and sample size. Our results show clear benefits of using both T1 and T2-FLAIR data in FreeSurfer rather than the more common T1-only input: the pial surface fitting is less likely to fail and shows greater biologically plausible inter-subject variability. This is particularly important for cortical thickness IDPs, where T2-FLAIR omission leads to reduced true variability and systematic underestimation, as shown through within-participant repeat testing. We demonstrate that ComBat can address this bias, with its standard use (i.e., applied separately on different IDP categories) showing the best improvement in cortical thickness measures, where the bias is strongest, and we find that it is important not to pool ComBat priors across different classes of IDPs.
Our proposed version of ComBat with a reference batch (i.e., estimating mean and variance only from data with T2-FLAIR available) performed best in recovering both mean and variance differences between batches across different IDP classes and offers a promising approach for cases where a reference batch is clearly identifiable. While ComBat reliably corrects mean (additive) batch effects with relatively small sample sizes (approximately 30 subjects per batch), we show that its variance (multiplicative) correction is substantially less stable, requiring much larger sample sizes and becoming unreliable when batches are small or imbalanced, or when there is a large variance difference between them.
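
To illustrate the reference-batch idea described above, the following is a minimal sketch of a location and scale adjustment in which the target mean and variance are estimated only from the reference batch. It is not the authors' implementation and it is not ComBat: it omits the empirical Bayes pooling across IDPs and any covariate modelling (for example age or sex). The function name `align_to_reference`, the batch labels, and the simulated thickness values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def align_to_reference(values, batch, reference):
    """Shift and rescale each non-reference batch so that its mean and
    standard deviation match those of the reference batch.

    Simplified stand-in for reference-batch harmonisation: no empirical
    Bayes shrinkage and no covariates (both are assumptions of this sketch).
    """
    values = np.asarray(values, dtype=float)
    batch = np.asarray(batch)

    # Target statistics come from the reference batch only.
    ref = values[batch == reference]
    ref_mean = ref.mean()
    ref_sd = ref.std(ddof=1)

    out = values.copy()
    for b in np.unique(batch):
        if b == reference:
            continue  # the reference batch is left untouched
        mask = batch == b
        mean_b = values[mask].mean()   # additive (location) effect
        sd_b = values[mask].std(ddof=1)  # multiplicative (scale) effect
        out[mask] = (values[mask] - mean_b) / sd_b * ref_sd + ref_mean
    return out


# Simulated example (hypothetical numbers): the T1-only batch has a lower
# mean and a smaller spread than the batch processed with T1 and T2-FLAIR.
rng = np.random.default_rng(0)
with_flair = rng.normal(2.6, 0.15, size=500)
t1_only = rng.normal(2.5, 0.10, size=500)

values = np.concatenate([with_flair, t1_only])
batch = np.array(["T1+T2"] * 500 + ["T1"] * 500)

harmonised = align_to_reference(values, batch, reference="T1+T2")
```

Because the scale factor depends on a per-batch standard deviation estimate, this sketch also hints at why the multiplicative correction is more sensitive to small or imbalanced batches than the additive one.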