Preventing Data Leakage in Neural Decoding
Wong, R.; Zhu, S. I.; McCullough, M. H.; Goodhill, G. J.
Neural decoding is a widely used machine learning technique for investigating how behavior, perception, and cognition are represented in neural activity. However, without careful application, data leakage can occur, in which information from the test set contaminates the training set, leading to biased estimates of decoding performance and potentially invalidating biological conclusions. Here we use simulated and biological datasets to demonstrate how both supervised and unsupervised data preprocessing, including dimensionality reduction, can introduce leakage in neural decoding studies. We reveal that in some cases leakage can paradoxically decrease decoding performance relative to unbiased estimates, and we provide theoretical analyses explaining how this occurs. We demonstrate that, for autocorrelated neural time series, standard k-fold cross-validation can dramatically overstate performance. Finally, we provide detailed recommendations for avoiding data leakage in neural decoding.
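To make the kind of leakage described above concrete, here is a minimal sketch (not taken from the paper; all names, parameters, and the nearest-centroid decoder are illustrative) of one classic form of supervised-preprocessing leakage: selecting "informative" features on the full dataset before cross-validation. On pure-noise data, where true decoding accuracy is chance, fold-wise feature selection stays near 50%, while full-data selection inflates the estimate.

```python
import random
import statistics

# Illustrative sketch: feature-selection leakage on pure-noise data.
# All quantities (60 trials, 500 features, top-10 selection, 5 folds)
# are arbitrary choices for the demonstration.
random.seed(0)
n_samples, n_features, k_top, n_folds = 60, 500, 10, 5

# Pure-noise "neural" features and random binary labels:
# any decoder's true accuracy is chance (50%).
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [random.randint(0, 1) for _ in range(n_samples)]

def top_features(rows, labels, k):
    """Rank features by |class-mean difference| and keep the k largest."""
    scores = []
    for j in range(n_features):
        m0 = statistics.mean(r[j] for r, lab in zip(rows, labels) if lab == 0)
        m1 = statistics.mean(r[j] for r, lab in zip(rows, labels) if lab == 1)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_accuracy(train_idx, test_idx, feats):
    """Nearest-class-centroid decoder restricted to the chosen features."""
    cents = {}
    for lab in (0, 1):
        members = [X[i] for i in train_idx if y[i] == lab]
        cents[lab] = [statistics.mean(r[j] for r in members) for j in feats]
    correct = 0
    for i in test_idx:
        d = {lab: sum((X[i][j] - c) ** 2 for j, c in zip(feats, cents[lab]))
             for lab in (0, 1)}
        correct += (min(d, key=d.get) == y[i])
    return correct / len(test_idx)

# Simple interleaved 5-fold partition of the trial indices.
folds = [list(range(f, n_samples, n_folds)) for f in range(n_folds)]

def cross_validate(select_on_full):
    accs = []
    for test_idx in folds:
        train_idx = [i for i in range(n_samples) if i not in test_idx]
        if select_on_full:
            # LEAKY: test-set labels influence which features are kept.
            feats = top_features(X, y, k_top)
        else:
            # CORRECT: selection is refit inside each training fold.
            feats = top_features([X[i] for i in train_idx],
                                 [y[i] for i in train_idx], k_top)
        accs.append(centroid_accuracy(train_idx, test_idx, feats))
    return statistics.mean(accs)

leaky_acc = cross_validate(select_on_full=True)
correct_acc = cross_validate(select_on_full=False)
print(f"full-data selection (leaky):   {leaky_acc:.2f}")
print(f"fold-wise selection (correct): {correct_acc:.2f}")
```

The same principle applies to unsupervised steps such as PCA or normalization: any statistic estimated from the test trials, however indirectly, must instead be fit on the training fold alone and then applied to the held-out data.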