The dangers of data double dipping in assessing the classification accuracies of blood biomarkers in Alzheimer's disease and related disorder research
Liu, T.; Zeng, X.; Snitz, B. E.; Karikari, T. K.; Deek, R. A.
Blood biomarker models are increasingly used in Alzheimer's disease and related dementia translational research, but predictive performance can be inflated when the same dataset is used for both model development and evaluation. We assess the effect of data double dipping using simulations and NULISA proteomic data from the MYHAT-NI community-based cohort to predict brain amyloid-beta neuroimaging status. In both settings, training AUC increased as more biomarkers were added, while testing AUC peaked earlier and then declined. These findings show that data double dipping can inflate model performance and highlight the need for external validation or internal validation with data partitioning.
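The pattern described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design or model: it assumes 40 hypothetical standard-normal biomarkers of which only the first 3 carry signal, a small training set of 100 participants, and a simple least-squares linear score as the classifier. All names, sample sizes, and effect sizes are illustrative assumptions. It reports AUC on the training data (double dipping) and on an independent test set as biomarkers are added one at a time.

```python
import numpy as np


def auc(y, score):
    """AUC via the Mann-Whitney rank statistic (assumes no tied scores)."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n_train=100, n_test=2000, n_biomarkers=40, n_informative=3, seed=0):
    """Return train and test AUC as biomarkers 1..n_biomarkers are added.

    Assumed setup (hypothetical, for illustration only): only the first
    n_informative biomarkers affect the outcome; the rest are pure noise.
    """
    rng = np.random.default_rng(seed)

    def draw(n):
        X = rng.standard_normal((n, n_biomarkers))
        logit = X[:, :n_informative].sum(axis=1)
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
        return X, y

    X_tr, y_tr = draw(n_train)
    X_te, y_te = draw(n_test)

    train_auc, test_auc = [], []
    for k in range(1, n_biomarkers + 1):
        # Design matrices with an intercept and the first k biomarkers.
        A_tr = np.column_stack([np.ones(n_train), X_tr[:, :k]])
        A_te = np.column_stack([np.ones(n_test), X_te[:, :k]])
        # Fit on the training set only.
        coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        # Double dipping: evaluate on the same data used for fitting.
        train_auc.append(auc(y_tr, A_tr @ coef))
        # Honest estimate: evaluate on data the model has never seen.
        test_auc.append(auc(y_te, A_te @ coef))
    return np.array(train_auc), np.array(test_auc)


train_auc, test_auc = simulate()
for k in (3, 10, 20, 40):
    print(f"k={k:2d}  train AUC={train_auc[k - 1]:.3f}  test AUC={test_auc[k - 1]:.3f}")
```

Under these assumptions, the gap between the two columns is the optimism introduced by evaluating on the training data. Holding out a test set, as here, is one form of the internal validation with data partitioning that the abstract recommends; cross-validation or an external cohort would serve the same purpose.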