
iSparse kmeans: a two-step clustering approach for big dynamic functional network connectivity data

Sendi, M. S. E.; Salat, D.; Miller, R.; Calhoun, V.

2022-03-16 bioinformatics
10.1101/2022.03.13.484193 bioRxiv

Background: Dynamic functional network connectivity (dFNC), estimated from resting-state functional magnetic resonance imaging (rs-fMRI), captures the temporal variation of functional integration between brain networks. A typical dFNC pipeline includes a clustering stage that summarizes the connectivity patterns that are transiently but reliably realized over the course of a scanning session. However, identifying the right number of clusters with a conventional clustering criterion requires running the algorithm repeatedly over a large range of cluster numbers. This is time-consuming and computationally demanding even for typical dFNC datasets, and it becomes prohibitive as datasets grow larger and scans longer. Here we developed a new dFNC pipeline, called iterative sparse kmeans or iSparse kmeans, to analyze large dFNC data without access to large computational resources.
Method: iSparse kmeans implements a two-step clustering. In the first step, we repeatedly draw a random subsample of the dFNC data and identify several sets of states at different model orders. In the second step, we aggregate the dFNC states estimated across all iterations of the first step and use them to identify the optimum number of clusters with the elbow criterion. We then estimate a final set of states by performing a second kmeans clustering on this reduced dataset of aggregated states. To validate the reproducibility of iSparse kmeans, we analyzed four dFNC datasets from the Human Connectome Project (HCP).
Results: Conventional kmeans and iSparse kmeans generate similar brain dFNC states, while iSparse kmeans is 27 times faster than the traditional method at finding the optimum number of clusters. The results replicate across four different HCP datasets.
Conclusion: We developed a new analytic pipeline that facilitates analysis of large dFNC datasets without access to large computational resources. We validated the reproducibility of the results across multiple datasets.
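
The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the subsample fraction, the number of rounds, the squared Euclidean distance, and the chord-distance elbow heuristic are all assumptions chosen for illustration. It uses only the Python standard library; a real analysis would likely rely on an optimized k-means implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss

def elbow(ks, wcss):
    """Assumed elbow heuristic: the k whose (k, wcss) point lies farthest
    from the straight line joining the first and last points of the curve."""
    k0, kn, w0, wn = ks[0], ks[-1], wcss[0], wcss[-1]
    gap = [abs((wn - w0) * (k - k0) - (kn - k0) * (w - w0)) for k, w in zip(ks, wcss)]
    return ks[gap.index(max(gap))]

def isparse_kmeans(windows, ks, n_rounds=5, frac=0.2, seed=0):
    """Two-step clustering sketch. `windows` is a list of dFNC feature vectors,
    `ks` a list of candidate model orders (hypothetical parameter names)."""
    rng = random.Random(seed)
    n_sub = min(len(windows), max(max(ks), int(frac * len(windows))))
    pooled = []
    # Step 1: cluster random subsamples at every model order and pool the states.
    for _ in range(n_rounds):
        sub = rng.sample(windows, n_sub)
        for k in ks:
            centroids, _ = kmeans(sub, k, seed=rng.randrange(2**31))
            pooled.extend(centroids)
    # Step 2: run the elbow search and the final k-means on the pooled states only.
    fits = [kmeans(pooled, k, seed=seed) for k in ks]
    k_opt = elbow(ks, [wcss for _, wcss in fits])
    return k_opt, fits[ks.index(k_opt)][0]
```

The point of the design is that the repeated search over model orders in step 2 runs on the small pool of candidate states rather than on every dFNC window, which is where the time saving would come from.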

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Bioinformatics | 1061 | Top 3% | 9.9% |
| 2 | BMC Bioinformatics | 383 | Top 1.0% | 9.9% |
| 3 | Brain Connectivity | 22 | Top 0.1% | 9.0% |
| 4 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 0.2% | 6.7% |
| 5 | Neuroinformatics | 40 | Top 0.1% | 6.7% |
| 6 | PLOS ONE | 4510 | Top 35% | 4.1% |
| 7 | SoftwareX | 15 | Top 0.1% | 3.9% |
| 8 | NeuroImage | 813 | Top 3% | 3.6% |
| 9 | Scientific Reports | 3102 | Top 38% | 3.5% |
| 10 | PeerJ | 261 | Top 3% | 3.5% |
| 11 | Human Brain Mapping | 295 | Top 2% | 3.2% |
| 12 | Network Neuroscience | 116 | Top 0.4% | 2.8% |
| 13 | F1000Research | 79 | Top 0.9% | 2.3% |
| 14 | GigaScience | 172 | Top 1% | 1.7% |
| 15 | Computational and Structural Biotechnology Journal | 216 | Top 5% | 1.7% |
| 16 | Frontiers in Neuroscience | 223 | Top 4% | 1.7% |
| 17 | IEEE Access | 31 | Top 0.4% | 1.7% |
| 18 | Frontiers in Genetics | 197 | Top 7% | 1.2% |
| 19 | PLOS Computational Biology | 1633 | Top 20% | 1.2% |
| 20 | Journal of Neural Engineering | 197 | Top 1% | 1.2% |
| 21 | Frontiers in Human Neuroscience | 67 | Top 2% | 0.9% |
| 22 | Bioengineering | 24 | Top 0.9% | 0.9% |
| 23 | Bioinformatics Advances | 184 | Top 4% | 0.9% |
| 24 | Artificial Intelligence in Medicine | 15 | Top 0.7% | 0.7% |
| 25 | Biology Methods and Protocols | 53 | Top 3% | 0.7% |
| 26 | Journal of Medical Internet Research | 85 | Top 5% | 0.7% |
| 27 | Journal of Neuroscience Methods | 106 | Top 2% | 0.7% |
| 28 | Frontiers in Neuroinformatics | 38 | Top 1.0% | 0.6% |
| 29 | Aperture Neuro | 18 | Top 0.4% | 0.6% |
| 30 | Imaging Neuroscience | 242 | Top 4% | 0.6% |
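
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The short sketch below does this for the first ten entries; the function name `journals_to_reach` is hypothetical and the values are copied from the listing.

```python
# Predicted probabilities (in percent) for ranks 1-10, copied from the listing.
probs = [9.9, 9.9, 9.0, 6.7, 6.7, 4.1, 3.9, 3.6, 3.5, 3.5]

def journals_to_reach(probs, threshold):
    """Return (count, cumulative mass) for the smallest top-ranked prefix
    whose summed probability reaches the threshold, or None if it never does."""
    total = 0.0
    for n, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return n, total
    return None

n, total = journals_to_reach(probs, 50.0)
print(n, round(total, 1))  # prints: 7 50.2
```

The first six journals sum to 46.3%, so the seventh is the one that crosses the 50% mark.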