
Bridging Big Data: Procedures for Combining Non-equivalent Cognitive Measures from the ENIGMA Consortium

Kennedy, E.; Vadlamani, S.; Lindsey, H. M.; Lei, P.-W.; Pugh, M. J.; Adamson, M.; Alda, M.; Alonso-Lana, S.; Ambrogi, S.; Anderson, T. J.; Arango, C.; Asarnow, R.; Avram, M.; Ayesa-Arriola, R.; Babikian, T.; Banaj, N.; Bird, L. J.; Borgwardt, S.; Brodtmann, A.; Brosch, K.; Caeyenberghs, K.; Calhoun, V. D.; Chiaravalloti, N. D.; Cifu, D. X.; Crespo-Facorro, B.; Dalrymple-Alford, J. C.; Dams-O'Connor, K.; Dannlowski, U.; Darby, D.; Davenport, N.; DeLuca, J.; Diaz-Caneja, C. M.; Disner, S. G.; Dobryakova, E.; Ehrlich, S.; Esopenko, C.; Ferrarelli, F.; Frank, L. E.; Franz, C.; Fuentes-Claramonte,

2023-01-19 neuroscience
10.1101/2023.01.16.524331 bioRxiv

Investigators in neuroscience have turned to Big Data to address replication and reliability issues by increasing sample sizes, statistical power, and representativeness of data. These efforts raise new questions about integrating data arising from distinct sources and instruments. We focus on the most frequently assessed cognitive domain, memory testing, and demonstrate a process for reliable data harmonization across three common measures. We aggregated global raw data from 53 studies totaling N = 10,505 individuals. A mega-analysis was conducted using empirical Bayes harmonization to remove site effects, followed by linear models adjusting for common covariates. A continuous item response theory (IRT) model estimated each individual's latent verbal learning ability while accounting for item difficulties. Harmonization significantly reduced inter-site variance while preserving covariate effects, and our conversion tool is freely available online. This demonstrates that large-scale data sharing and harmonization initiatives can address reproducibility and integration challenges across the behavioral sciences.

Teaser: We present a global effort to devise harmonization procedures necessary to meaningfully leverage big data.
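The abstract names empirical Bayes harmonization but gives no implementation detail. As a rough illustration of the general idea only, the sketch below shrinks each site's mean offset toward the grand mean and subtracts it from the scores. It is not the authors' pipeline: the function name `shrink_site_effects`, the equal-within-site-variance assumption, the method-of-moments variance estimate, and the simulated data are all assumptions made for this example, and covariate adjustment is left out.

```python
import numpy as np

def shrink_site_effects(scores, sites):
    """Subtract empirical-Bayes-shrunken site mean offsets (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    sites = np.asarray(sites)
    labels = np.unique(sites)
    grand_mean = scores.mean()

    site_means = np.array([scores[sites == s].mean() for s in labels])
    site_sizes = np.array([(sites == s).sum() for s in labels])

    # Pooled within-site variance (assumed equal across sites).
    within_ss = sum(((scores[sites == s] - m) ** 2).sum()
                    for s, m in zip(labels, site_means))
    within_var = within_ss / (scores.size - labels.size)

    # Method-of-moments estimate of between-site variance, floored at zero.
    sampling_var = within_var / site_sizes
    between_var = max(np.var(site_means, ddof=1) - sampling_var.mean(), 0.0)

    # Shrinkage weight per site: close to 1 when the site mean is well estimated.
    weights = between_var / (between_var + sampling_var)

    adjusted = scores.copy()
    for s, m, w in zip(labels, site_means, weights):
        adjusted[sites == s] -= w * (m - grand_mean)
    return adjusted

# Simulated example (not real data): three sites with different mean offsets.
rng = np.random.default_rng(0)
sites = np.repeat(["A", "B", "C"], 200)
offsets = np.repeat([0.0, 5.0, -5.0], 200)
scores = rng.normal(50.0, 1.0, size=600) + offsets
adjusted = shrink_site_effects(scores, sites)
```

Small or noisy sites receive a weight closer to 0, so their offsets are corrected less aggressively than those of large sites; this is the sense in which the adjustment borrows strength across sites.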

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Nature Human Behaviour: 85 papers in training set; Top 0.1%; 44.3%
2. eNeuro: 389 papers in training set; Top 0.8%; 7.3%
50% of probability mass above
3. Nature Neuroscience: 216 papers in training set; Top 1%; 6.8%
4. Nature Communications: 4913 papers in training set; Top 27%; 6.7%
5. eLife: 5422 papers in training set; Top 16%; 5.2%
6. Nature: 575 papers in training set; Top 8%; 2.6%
7. Neuron: 282 papers in training set; Top 5%; 2.5%
8. Nature Methods: 336 papers in training set; Top 4%; 2.2%
9. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 31%; 1.8%
10. Scientific Data: 174 papers in training set; Top 1.0%; 1.8%
11. Communications Psychology: 20 papers in training set; Top 0.1%; 1.6%
12. Neuroscience & Biobehavioral Reviews: 43 papers in training set; Top 0.5%; 1.3%
13. Science Advances: 1098 papers in training set; Top 24%; 1.2%
14. Alzheimer's & Dementia: 143 papers in training set; Top 2%; 0.9%
15. Scientific Reports: 3102 papers in training set; Top 69%; 0.9%
16. Imaging Neuroscience: 242 papers in training set; Top 3%; 0.8%
17. Cell Reports: 1338 papers in training set; Top 31%; 0.8%
18. Human Brain Mapping: 295 papers in training set; Top 4%; 0.8%
19. PLOS Biology: 408 papers in training set; Top 18%; 0.8%
20. BMC Medicine: 163 papers in training set; Top 7%; 0.8%
21. PLOS Genetics: 756 papers in training set; Top 14%; 0.8%
22. Cerebral Cortex: 357 papers in training set; Top 2%; 0.8%
23. FASEB BioAdvances: 15 papers in training set; Top 0.3%; 0.8%
24. GigaScience: 172 papers in training set; Top 3%; 0.8%
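
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with a running total over the ranked probabilities. The short sketch below does this for the first five entries of the list; the function name `journals_to_reach` is made up for this example.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the five top-ranked journals in the list.
probabilities = [44.3, 7.3, 6.8, 6.7, 5.2]

def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running total to reach threshold."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None  # threshold not reached by the listed entries

print(journals_to_reach(probabilities, 50.0))  # prints 2 (44.3 + 7.3 = 51.6)
```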