Preventing Data Leakage in Neural Decoding
Wong, R.; Zhu, S. I.; McCullough, M. H.; Goodhill, G. J.
Neural decoding is a widely used machine learning technique for investigating how behavior, perception, and cognition are represented in neural activity. However, without careful application, data leakage can occur, in which information from the test set contaminates the training set, leading to biased estimates of decoding performance and potentially invalidating biological conclusions. Here we use simulated and biological datasets to demonstrate how both supervised and unsupervised data preprocessing, including dimensionality reduction, can introduce leakage in neural decoding studies. We reveal that in some cases leakage can paradoxically decrease decoding performance relative to unbiased estimates, and we provide theoretical analyses explaining how this occurs. We demonstrate that, for autocorrelated neural time series, standard k-fold cross-validation can dramatically overstate performance. Finally, we provide detailed recommendations for avoiding data leakage in neural decoding.
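To make the kind of leakage described above concrete, here is a minimal sketch (not taken from the paper; all names, parameters, and the nearest-centroid decoder are illustrative) of one classic form of supervised-preprocessing leakage: selecting "informative" features on the full dataset before cross-validation. On pure-noise data, where true decoding accuracy is chance, fold-wise feature selection stays near 50%, while full-data selection inflates the estimate.

```python
import random
import statistics

# Illustrative sketch: feature-selection leakage on pure-noise data.
# All quantities (60 trials, 500 features, top-10 selection, 5 folds)
# are arbitrary choices for the demonstration.
random.seed(0)
n_samples, n_features, k_top, n_folds = 60, 500, 10, 5

# Pure-noise "neural" features and random binary labels:
# any decoder's true accuracy is chance (50%).
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [random.randint(0, 1) for _ in range(n_samples)]

def top_features(rows, labels, k):
    """Rank features by |class-mean difference| and keep the k largest."""
    scores = []
    for j in range(n_features):
        m0 = statistics.mean(r[j] for r, lab in zip(rows, labels) if lab == 0)
        m1 = statistics.mean(r[j] for r, lab in zip(rows, labels) if lab == 1)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_accuracy(train_idx, test_idx, feats):
    """Nearest-class-centroid decoder restricted to the chosen features."""
    cents = {}
    for lab in (0, 1):
        members = [X[i] for i in train_idx if y[i] == lab]
        cents[lab] = [statistics.mean(r[j] for r in members) for j in feats]
    correct = 0
    for i in test_idx:
        d = {lab: sum((X[i][j] - c) ** 2 for j, c in zip(feats, cents[lab]))
             for lab in (0, 1)}
        correct += (min(d, key=d.get) == y[i])
    return correct / len(test_idx)

# Simple interleaved 5-fold partition of the trial indices.
folds = [list(range(f, n_samples, n_folds)) for f in range(n_folds)]

def cross_validate(select_on_full):
    accs = []
    for test_idx in folds:
        train_idx = [i for i in range(n_samples) if i not in test_idx]
        if select_on_full:
            # LEAKY: test-set labels influence which features are kept.
            feats = top_features(X, y, k_top)
        else:
            # CORRECT: selection is refit inside each training fold.
            feats = top_features([X[i] for i in train_idx],
                                 [y[i] for i in train_idx], k_top)
        accs.append(centroid_accuracy(train_idx, test_idx, feats))
    return statistics.mean(accs)

leaky_acc = cross_validate(select_on_full=True)
correct_acc = cross_validate(select_on_full=False)
print(f"full-data selection (leaky):   {leaky_acc:.2f}")
print(f"fold-wise selection (correct): {correct_acc:.2f}")
```

The same principle applies to unsupervised steps such as PCA or normalization: any statistic estimated from the test trials, however indirectly, must instead be fit on the training fold alone and then applied to the held-out data.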