Unified Multi-Cohort Harmonisation and Normative Modelling of Neuroimaging Data via Hierarchical GAMLSS

Ho, M. P.; Husein, N. K.; Fan, L.; Visontay, R.; Byrne, H.; Devine, E. K.; Squeglia, L. M.; Sachdev, P. S.; Jiang, J.; Wen, W.; Mewton, L.

2026-03-11 neuroscience
10.64898/2026.03.08.710422 bioRxiv

Large-scale neuroimaging studies increasingly pool data across multiple cohorts, scanners, and acquisition protocols, introducing technical between-cohort variation that must be addressed before meaningful biological inference can be drawn. Existing harmonisation methods, particularly ComBat-based approaches, have been widely adopted for this purpose. However, they remain limited by Gaussian assumptions and by their focus on location or location-scale correction. In this study, we propose a unified hierarchical Generalised Additive Models for Location, Scale and Shape (GAMLSS) framework for multi-cohort harmonisation and normative modelling of structural neuroimaging data. The framework models cohort effects directly within all fitted distributional parameters, accommodates any parametric family for which exact inverse mapping is available, and returns harmonised values on the original measurement scale through centile-based quantile mapping. Normative deviation scores are obtained as a direct by-product of the same fitted model, enabling harmonisation and normative inference to be conducted jointly. The method was evaluated in a pooled longitudinal dataset comprising 88,126 observations across 237 structural neuroimaging features from six cohorts spanning childhood to late life: ABCD, IMAGEN, NCANDA, LIFE, UK Biobank, and MAS. Harmonisation performance was compared with ComBat, ComBat-GAM, and ComBat-LS using complementary criteria assessing data retention, residual batch effects, preservation of age-related and sex-related biological signal, and coherence of post-harmonisation lifespan trajectories. GAMLSS achieved near-complete removal of residual cohort effects, retained almost all valid observations post-harmonisation, and showed the strongest overall preservation of biological signal across validation metrics. In particular, it better preserved biologically plausible age trajectories for distributionally complex features such as white matter hypointensity volume, while simultaneously providing harmonised native-scale values and normative deviation scores within a single framework. These findings suggest that hierarchical GAMLSS offers a flexible and practical alternative to existing ComBat-based methods for large-scale neuroimaging harmonisation, particularly for features with non-Gaussian residual distributions and settings where cohort effects extend beyond differences in mean and variance.
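
The centile-based quantile mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a log-normal family purely as an example of a non-Gaussian distribution with an exact inverse, and the function names (`harmonise`, `lognormal_cdf`, `lognormal_inv_cdf`) and parameter values are hypothetical. In the actual framework the distribution parameters would come from a fitted hierarchical GAMLSS with covariates such as age and sex; here they are passed in directly. The idea shown is only the general one: compute an observation's centile under its cohort-specific distribution, then read off the value at that same centile under a cohort-free reference distribution. The normative deviation score falls out of the same centile.

```python
import math
from statistics import NormalDist

# Assumption: log-normal family, chosen only for illustration.
def lognormal_cdf(y, mu, sigma):
    """Centile of y under a log-normal with log-scale mean mu and sd sigma."""
    return NormalDist(mu, sigma).cdf(math.log(y))

def lognormal_inv_cdf(p, mu, sigma):
    """Exact inverse mapping: value at centile p under the same family."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))

def harmonise(y, cohort_params, reference_params, eps=1e-12):
    """Hypothetical helper: map y from its cohort distribution to the reference.

    cohort_params and reference_params are (mu, sigma) tuples that a fitted
    model would supply; they are plain inputs in this sketch.
    Returns (harmonised value on the original scale, normative z-score).
    """
    p = lognormal_cdf(y, *cohort_params)
    # Clamp so the inverse CDF stays defined for extreme observations.
    p = min(max(p, eps), 1.0 - eps)
    z = NormalDist().inv_cdf(p)  # deviation score from the same centile
    y_harmonised = lognormal_inv_cdf(p, *reference_params)
    return y_harmonised, z

# Usage: an observation at the cohort median maps to the reference median.
y_h, z = harmonise(math.exp(1.0), cohort_params=(1.0, 0.5),
                   reference_params=(0.0, 1.0))
```

Because the mapping acts on centiles rather than on means and variances, a cohort effect in any distributional parameter is removed in one step, and the output stays on the original measurement scale.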

Matching journals

The top 2 journals account for 50% of the predicted probability mass.