
An Improved Pipeline for Constructing UK Biobank Brain Imaging Confounds

Radosavljevic, L.; Maullin-Sapey, T.; Alfaro-Almagro, F.; McCarthy, P.; Nichols, T. E.; Smith, S.

2025-11-22 epidemiology
10.1101/2025.11.21.25340740 medRxiv

UK Biobank (UKB) brain imaging data is a one-of-a-kind resource for studying the links between the brain and demographic, lifestyle and genetic data. When establishing such links, it is crucial to account for confounding effects caused by the acquisition of fMRI images, as well as demographic confounding factors. UKB brain imaging confounds are constructed from tens of thousands of candidate confounds through variable selection based on the proportion of variance they explain in the Imaging Derived Phenotypes (IDPs). The current implementation of this pipeline is very computationally intensive and has a large memory footprint, largely due to the varying patterns of missing data in IDPs. This makes it impractical for many users of UK Biobank brain imaging data. We propose a fast and memory-efficient multivariate pipeline for constructing imaging confounds using mean imputation combined with a bias-corrected estimator of R², the proportion of variance in an IDP explained by confounds. Building on this, we also improve the pipeline to better select confounds that explain unique variance in IDPs and in non-imaging variables of interest, so-called nIDPs. The new implementation leads to a more compact set of confounds that explains roughly the same amount of variance, and runs in around 1 hour on a single CPU.
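
As a rough illustration of the two ingredients named in the abstract, mean imputation and a corrected R² estimate, the following Python sketch may help. It is not the authors' pipeline. The abstract does not specify the bias-corrected estimator, so the standard adjusted R² for a single predictor is swapped in here as a stand-in; it corrects for sample size only and may differ from the estimator in the paper. The function names `mean_impute` and `adjusted_r2` and the toy data are hypothetical.

```python
import numpy as np


def mean_impute(x):
    """Replace NaNs in a 1-D array with the mean of the observed values."""
    x = np.asarray(x, dtype=float)
    filled = x.copy()
    filled[np.isnan(x)] = np.nanmean(x)
    return filled


def adjusted_r2(y, x):
    """Adjusted R^2 for a simple linear regression of y on x.

    Stand-in for the paper's bias-corrected estimator (an assumption):
    1 - (1 - R^2) * (n - 1) / (n - p - 1) with p = 1 predictor.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)


# Toy data: one confound, one IDP with about 20% of values missing.
rng = np.random.default_rng(0)
n = 500
confound = rng.normal(size=n)
idp = 0.5 * confound + rng.normal(size=n)
idp_missing = idp.copy()
idp_missing[rng.random(n) < 0.2] = np.nan

# Impute the IDP once, then estimate variance explained on the full vector.
idp_filled = mean_impute(idp_missing)
r2_adj = adjusted_r2(idp_filled, confound)
print(f"adjusted R^2 after mean imputation: {r2_adj:.3f}")
```

Imputing once gives every IDP the same complete set of subjects, so no per-IDP subsetting is needed; that is plausibly where the speed and memory savings described above come from. Mean imputation also shrinks the apparent R², which is one reason a correction is needed; the simple adjustment in this sketch does not account for that shrinkage.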

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank. Journal: papers in training set, percentile, predicted probability

1. Aperture Neuro: 18 papers in training set, Top 0.1%, 18.8%
2. NeuroImage: 813 papers in training set, Top 1.0%, 12.4%
3. Human Brain Mapping: 295 papers in training set, Top 0.6%, 10.2%
4. Nature Communications: 4913 papers in training set, Top 28%, 6.4%
5. Medical Image Analysis: 33 papers in training set, Top 0.2%, 4.9%

50% of probability mass above

6. Scientific Data: 174 papers in training set, Top 0.3%, 4.9%
7. Scientific Reports: 3102 papers in training set, Top 27%, 4.3%
8. PLOS ONE: 4510 papers in training set, Top 42%, 3.1%
9. Journal of Medical Imaging: 11 papers in training set, Top 0.1%, 2.8%
10. Communications Biology: 886 papers in training set, Top 4%, 2.6%
11. International Journal of Epidemiology: 74 papers in training set, Top 0.8%, 2.6%
12. PLOS Computational Biology: 1633 papers in training set, Top 15%, 1.8%
13. NeuroImage: Clinical: 132 papers in training set, Top 2%, 1.7%
14. eLife: 5422 papers in training set, Top 42%, 1.7%
15. Imaging Neuroscience: 242 papers in training set, Top 2%, 1.5%
16. Bioinformatics: 1061 papers in training set, Top 8%, 1.1%
17. Science Advances: 1098 papers in training set, Top 25%, 1.0%
18. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 41%, 0.9%
19. European Journal of Epidemiology: 40 papers in training set, Top 0.6%, 0.8%
20. Biology Methods and Protocols: 53 papers in training set, Top 2%, 0.8%
21. Nature: 575 papers in training set, Top 15%, 0.8%
22. Genetic Epidemiology: 46 papers in training set, Top 1.0%, 0.6%
23. Bioinformatics Advances: 184 papers in training set, Top 5%, 0.6%
24. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 2%, 0.6%
25. Nature Computational Science: 50 papers in training set, Top 2%, 0.5%