
The forward variable selection based optimized decoding toolbox for neuroimaging data

Dang, T.; Fermin, A. S. R.; Machizawa, M. G.

2022-12-25 neuroscience
10.1101/2022.12.25.521906 bioRxiv

Neuroimaging data are complex and high-dimensional, which poses challenges for machine learning (ML) applications. Among the many factors that affect decoding accuracy, feature selection is a crucial step for identifying the target features in data analysis, especially in neuroimaging studies, where the number of features is often much larger than the number of observations. Optimizing feature selection from such high-dimensional neuroimaging data has therefore been challenging with conventional ML algorithms. Here, we introduce an efficient ML package incorporating a forward variable selection (FVS) algorithm that optimizes the identification of features for both classification and regression models. In our framework, the best ML model and feature pairs that explain the inputs can be determined automatically. Moreover, the toolbox can be executed in a parallel environment for efficient computation. The parallelized FVS algorithm iteratively selects the best feature pair, compared against the previous steps, to maximize predictive performance. The FVS algorithm evaluates goodness-of-fit across different models using k-fold cross-validation and identifies the best subset of features based on a predefined criterion for each model. Furthermore, the hyperparameters of each ML model are optimized at each forward iteration. The final outcome highlights an optimized number of selected features (brain regions of interest) together with their decoding accuracies. Using our pipeline, we examined the effectiveness of our toolbox on an existing neuroimaging (structural MRI) dataset. Comparing ML models with and without the FVS approach, we demonstrate that FVS significantly improved the accuracy of the ML algorithm over the counterpart model without FVS. Furthermore, we confirmed that the use of parallel computation considerably reduced the computational burden for the high-dimensional MRI data.
This oFVSD toolbox efficiently and effectively improves the performance of both classification and regression models on neuroimaging data and should be applicable to many other neuroimaging datasets and beyond. This Python package is open-source and freely available, making it a useful toolbox for neuroimaging communities seeking to improve decoding accuracy on their datasets.
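
To make the idea of forward selection scored by k-fold cross-validation concrete, here is a minimal sketch in Python. It is not the oFVSD package or its API: the function names (`kfold_indices`, `cv_r2`, `forward_select`), the ordinary least-squares model, the R^2 criterion, and the synthetic data are all assumptions chosen for illustration. It adds one feature at a time rather than feature pairs, and it omits the parallel execution, model comparison, and per-iteration hyperparameter optimization described in the abstract.

```python
import numpy as np


def kfold_indices(n_samples, k, rng):
    """Split shuffled sample indices into k roughly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)


def cv_r2(X, y, folds):
    """Mean out-of-fold R^2 of an ordinary least-squares fit."""
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Prepend a column of ones so the model has an intercept.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        ss_res = np.sum((y[test] - A_test @ coef) ** 2)
        ss_tot = np.sum((y[test] - y[train].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))


def forward_select(X, y, k=5, seed=0):
    """Greedily add the feature that most improves the cross-validated score.

    Stops as soon as no remaining feature improves on the best score so far.
    """
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(y), k, rng)
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining:
        # Score every candidate feature added to the current subset.
        scored = [(cv_r2(X[:, selected + [j]], y, folds), j) for j in remaining]
        score, j = max(scored)
        if score <= best_score:
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


# Synthetic example (hypothetical data): only features 2 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=60)

selected, best_score = forward_select(X, y)
print("selected features:", selected)
print("cross-validated R^2:", round(best_score, 3))
```

Because the candidate evaluations inside each iteration are independent of one another, that inner loop is the natural place to parallelize, which is consistent with the parallel execution the abstract describes.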

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Neuroinformatics: 40 papers in training set; Top 0.1%; 26.2%
2. NeuroImage: 813 papers in training set; Top 0.6%; 18.8%
3. Human Brain Mapping: 295 papers in training set; Top 1.0%; 6.4%
50% of probability mass above
4. Frontiers in Neuroinformatics: 38 papers in training set; Top 0.1%; 4.4%
5. PLOS Computational Biology: 1633 papers in training set; Top 8%; 4.0%
6. PLOS ONE: 4510 papers in training set; Top 46%; 2.5%
7. Journal of Neuroscience Methods: 106 papers in training set; Top 0.6%; 2.5%
8. Frontiers in Psychiatry: 83 papers in training set; Top 2%; 2.1%
9. Aperture Neuro: 18 papers in training set; Top 0.2%; 1.8%
10. GigaScience: 172 papers in training set; Top 1%; 1.8%
11. Imaging Neuroscience: 242 papers in training set; Top 2%; 1.8%
12. Scientific Reports: 3102 papers in training set; Top 57%; 1.7%
13. Medical Image Analysis: 33 papers in training set; Top 0.6%; 1.7%
14. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.5%
15. Frontiers in Neuroimaging: 11 papers in training set; Top 0.2%; 1.5%
16. Artificial Intelligence in Medicine: 15 papers in training set; Top 0.4%; 1.3%
17. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 2%; 1.0%
18. Network Neuroscience: 116 papers in training set; Top 0.9%; 0.9%
19. Frontiers in Human Neuroscience: 67 papers in training set; Top 2%; 0.9%
20. Journal of Neural Engineering: 197 papers in training set; Top 2%; 0.8%
21. Neurocomputing: 13 papers in training set; Top 0.6%; 0.8%
22. Frontiers in Neuroscience: 223 papers in training set; Top 8%; 0.7%
23. Communications Biology: 886 papers in training set; Top 28%; 0.7%
24. SoftwareX: 15 papers in training set; Top 0.6%; 0.5%
25. eNeuro: 389 papers in training set; Top 11%; 0.5%
26. Developmental Cognitive Neuroscience: 81 papers in training set; Top 0.7%; 0.5%
27. Computer Methods and Programs in Biomedicine: 27 papers in training set; Top 1%; 0.5%
28. Nature Communications: 4913 papers in training set; Top 67%; 0.5%
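
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below uses the probabilities listed above for the four highest-ranked journals; the function name `journals_to_reach` and the assumption that the final percentage on each entry is the predicted probability are illustrative, not part of the source.

```python
# Predicted probabilities (%) of the four top-ranked journals, as listed above.
probabilities = [26.2, 18.8, 6.4, 4.4]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed to reach the threshold,
    together with the cumulative probability mass at that point."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, mass = journals_to_reach(probabilities)
print(count, round(mass, 1))
```

The first two journals sum to 45.0%, which is below the threshold, so the third journal is the one that carries the cumulative mass past 50%.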