Evidence of Unreliable Data and Poor Data Provenance in Clinical Prediction Model Research and Clinical Practice

2026-02-26 | health systems and quality improvement | Title + abstract only
View on medRxiv

Clinical prediction models are often created using large routinely collected datasets. It is essential that prediction models are developed with appropriate data and methods and transparently reported to ensure that decisions are based on reliable predictions. Kaggle is a popular competition website where users learn and apply analysis skills on a range of datasets. We identified two large, publicly available Kaggle datasets, on stroke and diabetes, that lack clear data provenance, but are widel...
