Controlling for confounds in UK Biobank brain imaging data with small subsets of subjects

Radosavljevic, L.; Smith, S.; Nichols, T. E.

2026-03-03 epidemiology
10.64898/2026.03.02.26347455 medRxiv

The UK Biobank (UKB) Brain Imaging cohort contains data from almost 100,000 subjects and has yielded invaluable understanding of the links between the brain and health outcomes and lifestyles. Much of this understanding has come from exploring the association between Imaging Derived Phenotypes (IDPs) and variables that are unrelated to brain imaging, so-called non-Imaging Derived Phenotypes (nIDPs). In analyses of this kind, it is very important to control for well-known confounding factors such as age, sex and socio-economic status, as well as confounds related to the imaging protocol itself. In previous work, we created a pipeline for constructing imaging confounds for use in statistical inference via a standard multivariate linear regression approach (Alfaro-Almagro et al. 2021). However, this approach is problematic when the number of confounds exceeds the number of subjects, and is severely underpowered when the number of subjects is not much larger than the number of confounds. In this work, we perform a simulation study to evaluate 13 modelling approaches to account for confounds when their number is similar to or exceeds the number of subjects. Based on the simulation results, we recommend a ridge regression based permutation test for low sample sizes (n ≤ 50), a version of de-sparsified LASSO for intermediate sample sizes (50 < n ≤ 500), and multivariate linear regression aided by Principal Component Analysis (PCA) for larger sample sizes (n > 500). We also demonstrate the use of our recommended methodology on a real data example of finding associations between Alzheimer's Disease (AD) and IDPs.
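
The sample-size thresholds in the abstract can be written as a small lookup, and the large-sample recommendation can be illustrated with a rough sketch. The Python below is not the authors' pipeline: the function names, the simulated data, and the choice of k = 20 principal components are all hypothetical, and the abstract does not say how the number of components is chosen. It only shows one plausible reading of "multivariate linear regression aided by PCA": replace the full confound matrix by its leading principal component scores, then test the nIDP coefficient with ordinary least squares.

```python
import numpy as np


def recommend_method(n: int) -> str:
    """Map a sample size to the approach recommended in the abstract."""
    if n <= 50:
        return "ridge regression permutation test"
    if n <= 500:
        return "de-sparsified LASSO"
    return "PCA-aided multivariate linear regression"


def pca_reduce(confounds: np.ndarray, k: int) -> np.ndarray:
    """Return the scores of the first k principal components (hypothetical helper)."""
    centered = confounds - confounds.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def ols_t_stat(y: np.ndarray, x: np.ndarray, confound_pcs: np.ndarray) -> float:
    """t-statistic for x in the regression y ~ 1 + x + confound_pcs."""
    n = y.shape[0]
    design = np.column_stack([np.ones(n), x, confound_pcs])
    beta, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - rank)          # residual variance estimate
    cov = sigma2 * np.linalg.pinv(design.T @ design)
    return float(beta[1] / np.sqrt(cov[1, 1]))   # column 1 is the nIDP


# Simulated data only (assumption): 600 subjects, 200 confounds, one nIDP.
rng = np.random.default_rng(0)
n_subjects, n_confounds = 600, 200
confounds = rng.normal(size=(n_subjects, n_confounds))
nidp = rng.normal(size=n_subjects)
idp = 0.5 * nidp + confounds[:, 0] + rng.normal(size=n_subjects)

method = recommend_method(n_subjects)
t_stat = ols_t_stat(idp, nidp, pca_reduce(confounds, k=20))
print(method, round(t_stat, 2))
```

Reducing 200 confounds to 20 components leaves many more residual degrees of freedom than fitting all confounds directly, which is the motivation for dimension reduction when n is not much larger than the number of confounds. The cost is that confound variance outside the retained components is no longer adjusted for.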

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank  Journal                                Papers in training set  Percentile  Predicted probability
1     International Journal of Epidemiology  74                      Top 0.2%    8.6%
2     Human Brain Mapping                    295                     Top 0.8%    7.3%
3     Aperture Neuro                         18                      Top 0.1%    7.3%
4     Genetic Epidemiology                   46                      Top 0.1%    7.0%
5     European Journal of Epidemiology       40                      Top 0.1%    6.5%
6     NeuroImage: Clinical                   132                     Top 0.8%    4.9%
7     Scientific Reports                     3102                    Top 27%     4.4%
8     PLOS Computational Biology             1633                    Top 9%      3.7%
9     Frontiers in Aging Neuroscience        67                      Top 1%      3.1%
50% of probability mass above
10    Brain Communications                   147                     Top 0.9%    2.9%
11    Journal of Alzheimer's Disease         43                      Top 0.6%    2.4%
12    Biology Methods and Protocols          53                      Top 0.6%    2.1%
13    PLOS ONE                               4510                    Top 50%     1.9%
14    GeroScience                            97                      Top 0.9%    1.8%
15    Alzheimer's Research & Therapy         52                      Top 1%      1.7%
16    Neurobiology of Aging                  95                      Top 1%      1.7%
17    Scientific Data                        174                     Top 1%      1.7%
18    eLife                                  5422                    Top 41%     1.7%
19    NeuroImage                             813                     Top 4%      1.7%
20    Frontiers in Genetics                  197                     Top 6%      1.4%
21    Bioinformatics                         1061                    Top 8%      1.2%
22    Journal of Medical Imaging             11                      Top 0.2%    1.2%
23    American Journal of Epidemiology       57                      Top 1.0%    1.2%
24    F1000Research                          79                      Top 3%      1.0%
25    BMC Medical Research Methodology       43                      Top 1.0%    1.0%
26    Bioinformatics Advances                184                     Top 4%      0.8%
27    Wellcome Open Research                 57                      Top 3%      0.7%
28    Imaging Neuroscience                   242                     Top 4%      0.7%
29    Alzheimer's & Dementia                 143                     Top 3%      0.7%
30    Frontiers in Psychiatry                83                      Top 4%      0.7%
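
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by summing the listed percentages. A minimal sketch, assuming the displayed values (which are rounded to one decimal place) are close enough to the underlying probabilities; the variable names are illustrative:

```python
from itertools import accumulate

# Predicted probabilities (in %) for ranks 1 to 10, copied from the list above.
probabilities = [8.6, 7.3, 7.3, 7.0, 6.5, 4.9, 4.4, 3.7, 3.1, 2.9]

# Running total of probability mass by rank.
cumulative = list(accumulate(probabilities))

# First rank at which the running total reaches 50%.
cutoff_rank = next(
    rank for rank, total in enumerate(cumulative, start=1) if total >= 50.0
)
print(cutoff_rank)  # 9
```

The running total is about 49.7% after rank 8 and about 52.8% after rank 9, so rank 9 is the first point at which the cumulative mass passes 50%.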