The dangers of data double dipping in assessing the classification accuracies of blood biomarkers in Alzheimer's disease and related disorder research

Liu, T.; Zeng, X.; Snitz, B. E.; Karikari, T. K.; Deek, R. A.

2026-06-01 neurology
10.64898/2026.05.22.26353848 medRxiv

Blood biomarker models are increasingly used in Alzheimer's disease and related dementia translational research, but predictive performance can be inflated when the same dataset is used for both model development and evaluation. We assess the effect of data double dipping using simulations and NULISA proteomic data from the MYHAT-NI community-based cohort to predict brain amyloid-beta neuroimaging status. In both settings, training AUC increased as more biomarkers were added, while testing AUC peaked earlier and then declined. These findings show that data double dipping can inflate model performance and highlight the need for external validation or internal validation with data partitioning.
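The train/test divergence described in the abstract can be illustrated with a small simulation. The sketch below is a hedged illustration only, not the authors' pipeline: the sample sizes, the number of informative biomarkers, the least-squares linear score (used in place of whatever classifier the paper fits), and the names `auc` and `simulate` are all assumptions made for this example.

```python
import numpy as np


def auc(y, scores):
    """Rank-based (Mann-Whitney) AUC; assumes continuous scores with no ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(seed=0, n_train=80, n_test=2000, n_features=40, n_informative=2):
    """Fit nested models on the first k simulated biomarkers, k = 1..n_features.

    Only the first `n_informative` columns carry signal; the rest are noise.
    Returns the AUC on the training data (double dipping) and on held-out data.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    X = rng.standard_normal((n, n_features))
    logit = X[:, :n_informative].sum(axis=1)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]

    train_auc, test_auc = [], []
    for k in range(1, n_features + 1):
        # Design matrices with an intercept and the first k biomarkers.
        A_tr = np.column_stack([np.ones(n_train), X_tr[:, :k]])
        A_te = np.column_stack([np.ones(n_test), X_te[:, :k]])
        # Least-squares linear score, fitted on the training split only.
        beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        train_auc.append(auc(y_tr, A_tr @ beta))  # evaluated on the fitting data
        test_auc.append(auc(y_te, A_te @ beta))   # evaluated on held-out data
    return np.array(train_auc), np.array(test_auc)


if __name__ == "__main__":
    train_auc, test_auc = simulate()
    for k in (1, 2, 5, 10, 20, 40):
        print(f"k={k:2d}  train AUC={train_auc[k - 1]:.3f}  test AUC={test_auc[k - 1]:.3f}")
```

Under these assumptions, the training AUC should keep rising as noise biomarkers are added, while the held-out AUC should stop improving once the informative biomarkers are included. Exact values depend on the seed and sample sizes.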

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Alzheimer's & Dementia | 143 papers in training set | Top 0.2% | 28.6%
2 | Journal of Proteome Research | 215 papers in training set | Top 0.4% | 8.7%
3 | PLOS Computational Biology | 1633 papers in training set | Top 8% | 4.1%
4 | Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.7%
5 | PLOS ONE | 4510 papers in training set | Top 37% | 3.7%
6 | Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 38 papers in training set | Top 0.5% | 2.7%
50% of probability mass above
7 | npj Digital Medicine | 97 papers in training set | Top 2% | 2.7%
8 | Scientific Reports | 3102 papers in training set | Top 48% | 2.2%
9 | npj Parkinson's Disease | 89 papers in training set | Top 0.7% | 2.0%
10 | Nature Communications | 4913 papers in training set | Top 50% | 1.8%
11 | eLife | 5422 papers in training set | Top 40% | 1.8%
12 | npj Systems Biology and Applications | 99 papers in training set | Top 1% | 1.7%
13 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 6% | 1.3%
14 | Imaging Neuroscience | 242 papers in training set | Top 3% | 1.3%
15 | Alzheimer's Research & Therapy | 52 papers in training set | Top 1% | 1.3%
16 | Frontiers in Aging Neuroscience | 67 papers in training set | Top 2% | 1.3%
17 | Brain Communications | 147 papers in training set | Top 2% | 1.1%
18 | Scientific Data | 174 papers in training set | Top 2% | 0.9%
19 | Frontiers in Neurology | 91 papers in training set | Top 4% | 0.9%
20 | Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.6% | 0.8%
21 | Communications Biology | 886 papers in training set | Top 22% | 0.8%
22 | Human Brain Mapping | 295 papers in training set | Top 4% | 0.8%
23 | Acta Biomaterialia | 85 papers in training set | Top 0.8% | 0.8%
24 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 4% | 0.8%
25 | MethodsX | 14 papers in training set | Top 0.5% | 0.7%
26 | eBioMedicine | 130 papers in training set | Top 5% | 0.7%
27 | Neurobiology of Disease | 134 papers in training set | Top 4% | 0.7%
28 | Journal of Neurochemistry | 50 papers in training set | Top 0.8% | 0.5%
29 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.5%
30 | Network Neuroscience | 116 papers in training set | Top 2% | 0.5%