
Preventing Data Leakage in Neural Decoding

Wong, R.; Zhu, S. I.; McCullough, M. H.; Goodhill, G. J.

2026-01-27 neuroscience
10.64898/2026.01.26.701583 bioRxiv

Neural decoding is a widely used machine learning technique for investigating how behavior, perception, and cognition are represented in neural activity. However, without careful application, data leakage can occur: information from the test set contaminates the training set, leading to biased estimates of decoding performance and potentially invalidating biological conclusions. Here we use simulated and biological datasets to demonstrate how both supervised and unsupervised data preprocessing, including dimensionality reduction, can introduce leakage in neural decoding studies. We reveal that in some cases leakage can paradoxically decrease decoding performance relative to unbiased estimates, and we provide theoretical analyses explaining how this occurs. We demonstrate that, for autocorrelated neural time series, standard k-fold cross-validation can dramatically overstate performance. Finally, we provide detailed recommendations for avoiding data leakage in neural decoding.
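The two leakage routes named in the abstract (preprocessing fit on all data, and k-fold splits of autocorrelated time series) can be illustrated with a minimal sketch. This is not the authors' code or necessarily their recommended procedure. The helper names `blocked_folds`, `fit_zscore`, and `apply_zscore`, the `gap` buffer, and the toy signal are all hypothetical, and z-scoring stands in for any preprocessing step such as dimensionality reduction.

```python
from statistics import mean, pstdev


def blocked_folds(n_samples, n_folds, gap=0):
    """Yield (train_idx, test_idx) pairs with contiguous test blocks.

    `gap` samples on each side of the test block are dropped from the
    training set. This buffer is an assumption of the sketch, intended
    to reduce leakage through temporal autocorrelation.
    """
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        # The last fold absorbs any remainder samples.
        stop = n_samples if k == n_folds - 1 else start + fold_size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples)
                     if i < start - gap or i >= stop + gap]
        yield train_idx, test_idx


def fit_zscore(values):
    """Estimate mean and standard deviation from training samples only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero variance
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Toy one-channel "neural" time series (hypothetical data).
signal = [float(t) for t in range(20)]

for train_idx, test_idx in blocked_folds(len(signal), n_folds=4, gap=2):
    train = [signal[i] for i in train_idx]
    test = [signal[i] for i in test_idx]
    # Preprocessing statistics come from the training fold alone;
    # the test fold is only transformed, never used for fitting.
    mu, sigma = fit_zscore(train)
    train_z = apply_zscore(train, mu, sigma)
    test_z = apply_zscore(test, mu, sigma)
    # A decoder would be fit on train_z and scored on test_z here.
```

The leaky variant of this loop would call `fit_zscore(signal)` once before splitting, or would shuffle samples into folds so that temporally adjacent, correlated samples land on both sides of the split.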

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set, Top 0.1%, 37.4%
2. Neural Computation: 36 papers in training set, Top 0.1%, 14.8%
(50% of probability mass above)
3. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 11%, 6.4%
4. PLOS ONE: 4510 papers in training set, Top 36%, 4.0%
5. Journal of Neural Engineering: 197 papers in training set, Top 0.7%, 3.7%
6. Scientific Reports: 3102 papers in training set, Top 36%, 3.6%
7. Frontiers in Neuroscience: 223 papers in training set, Top 4%, 1.5%
8. NeuroImage: 813 papers in training set, Top 4%, 1.3%
9. Frontiers in Computational Neuroscience: 53 papers in training set, Top 1%, 1.3%
10. Neural Networks: 32 papers in training set, Top 0.5%, 1.3%
11. The Journal of Neuroscience: 928 papers in training set, Top 7%, 1.2%
12. Network Neuroscience: 116 papers in training set, Top 0.8%, 1.2%
13. eNeuro: 389 papers in training set, Top 7%, 1.2%
14. Journal of Neurophysiology: 263 papers in training set, Top 0.6%, 1.2%
15. Nature Communications: 4913 papers in training set, Top 58%, 1.0%
16. BMC Bioinformatics: 383 papers in training set, Top 6%, 0.9%
17. Journal of Neuroscience Methods: 106 papers in training set, Top 1%, 0.9%
18. eLife: 5422 papers in training set, Top 53%, 0.9%
19. Neuroinformatics: 40 papers in training set, Top 0.9%, 0.8%
20. Chaos, Solitons & Fractals: 32 papers in training set, Top 2%, 0.7%
21. Journal of Computational Neuroscience: 23 papers in training set, Top 0.4%, 0.7%
22. Bulletin of Mathematical Biology: 84 papers in training set, Top 2%, 0.6%
23. Imaging Neuroscience: 242 papers in training set, Top 4%, 0.6%