
Replicable multivariate BWAS with moderate sample sizes

Spisak, T.; Bingel, U.; Wager, T. D.

2022-06-26 neuroscience
10.1101/2022.06.22.497072 bioRxiv

Brain-Wide Association Studies (BWAS) have become a dominant method for linking mind and brain over the past 30 years. Univariate models test tens to hundreds of thousands of brain voxels individually, whereas multivariate models (multivariate BWAS) integrate signals across brain regions into a predictive model. Numerous problems have been raised with univariate BWAS, including lack of power and reliability and an inability to account for pattern-level information embedded in distributed neural circuits1-3. Multivariate predictive models address many of these concerns, and offer substantial promise for delivering brain-based measures of behavioral and clinical states and traits2,3. In their recent paper4, Marek et al. evaluated the effects of sample size on univariate and multivariate BWAS in three large-scale neuroimaging datasets and came to the general conclusion that "BWAS reproducibility requires samples with thousands of individuals". We applaud their comprehensive analysis, and we agree that (a) large samples are needed when conducting univariate BWAS of individual differences in trait measures, and (b) multivariate BWAS reveal substantially larger effects and are therefore more highly powered. However, we disagree with Marek et al.'s claims that multivariate BWAS provide "inflated in-sample associations" that often fail to replicate (i.e., are underpowered), and that multivariate BWAS consequently require thousands of participants when predicting trait-level individual differences. Here we substantiate that (i) with appropriate methodology, the reported in-sample effect size inflation in multivariate BWAS can be entirely eliminated, and (ii) in most cases, multivariate BWAS effects are replicable with substantially smaller sample sizes (Figure 1).
Figure 1. Multivariate BWAS provide unbiased effect sizes and high replicability with low to moderate sample sizes. (a) In-sample effects in multivariate BWAS are only inflated if estimates are obtained without cross-validation. (b) Cross-validation fully eliminates in-sample effect size inflation and, as a consequence, provides higher replicability. Each point in (a) and (b) corresponds to one bootstrap subsample, as in Fig. 4b of Marek et al. Dotted lines denote the threshold for p=0.05 with n=495. (c) The inflation of in-sample effect size obtained without cross-validation (red) is reduced, but does not disappear, at higher sample sizes. Conversely, cross-validated estimates (blue) are slightly pessimistic with low sample sizes and become quickly unbiased as sample size is increased. (d) Without cross-validation, in-sample effect size estimates are non-zero (r ≈ 0.5, red) even when predicting permuted outcome data. Cross-validation eliminates systematic bias across all sample sizes (blue). Dashed lines in (c) and (d) denote 95% parametric confidence intervals, and shaded areas denote bootstrap and permutation-based confidence intervals. (e-f) Cross-validated analysis reveals that sufficient in-sample power (e) and out-of-sample replication probability (P(rep)) (f) can be achieved for a variety of phenotypes at low or moderate sample sizes. 80% power and P(rep) are achievable in <500 participants for half the phenotypes tested (colored bars) using the prediction algorithm in Marek et al. (top panels in (e) and (f), sample size required for 80% power or P(rep) shown). Other phenotypes require sample sizes >500 (bars with arrows).
Power and P(rep) can be substantially improved with a ridge regression-based model recommended in some comparison studies10,11 (bottom panels in (e) and (f)), with 80% power and P(rep) achieved with sample sizes as low as n=100 and n=75, respectively, when predicting cognitive ability, and sample sizes between 75 and 375 for the other investigated variables, except inhibition assessed with the flanker task. (g) We estimated interactions between sample size and publication bias by computing effect size inflation (rdiscovery - rreplication) only for those bootstrap cases where prediction performance was significant (p<0.05) in the replication sample. Our analysis shows that the effect size inflation due to publication bias is modest (<10%) with <500 participants for half the phenotypes using the Marek et al. model and all phenotypes but the flanker using the ridge model.
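The central methodological point, that in-sample effect sizes without cross-validation are inflated even for null data (panel d), while cross-validated estimates are unbiased, can be illustrated with a minimal sketch. This is not the authors' code; it is a toy simulation under assumed settings (random features standing in for brain data, a permuted/null outcome, scikit-learn's Ridge as the predictive model):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 100, 1000                    # few subjects, many brain features (p >> n)
X = rng.standard_normal((n, p))     # stand-in for brain data
y = rng.standard_normal(n)          # permuted/null outcome: true effect is zero

model = Ridge(alpha=1.0)

# In-sample estimate: fit and evaluate on the same participants.
# With p >> n the model can fit noise, so r is large even for null data.
r_in = np.corrcoef(model.fit(X, y).predict(X), y)[0, 1]

# Cross-validated estimate: each prediction comes from a model
# that never saw that participant, so r fluctuates around zero.
r_cv = np.corrcoef(cross_val_predict(model, X, y, cv=5), y)[0, 1]

print(f"in-sample r = {r_in:.2f}, cross-validated r = {r_cv:.2f}")
```

The in-sample correlation is far from zero despite the outcome being pure noise, while the cross-validated correlation hovers near zero, mirroring the red versus blue curves in panel (d).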
