An evaluation of reproducibility and errors in published sample size calculations performed using G*Power

Thibault, R. T.; Zavalis, E. A.; Malicki, M.; Pedder, H.

2024-08-05 · Health systems and quality improvement
medRxiv · DOI: 10.1101/2024.07.15.24310458
Background: Published studies in the life and health sciences often employ sample sizes that are too small to detect realistic effect sizes. This shortcoming increases the rate of false positives and false negatives, giving rise to a potentially misleading scientific record. To address this shortcoming, many researchers now use point-and-click software to run sample size calculations.

Objective: We aimed to (1) estimate how many published articles report using the G*Power sample size calculation software; (2) assess whether these calculations are reproducible and (3) error-free; and (4) assess how often these calculations use G*Power's default option for mixed-design ANOVAs, which can be misleading and output sample sizes that are too small for a researcher's intended purpose.

Method: We randomly sampled open access articles from PubMed Central published between 2017 and 2022 and used a coding form to manually assess 95 sample size calculations for reproducibility and errors.

Results: We estimate that more than 48,000 articles published between 2017 and 2022 and indexed in PubMed Central or PubMed report using G*Power (i.e., 0.65% [95% CI: 0.62%-0.67%] of articles). We could reproduce 2% (2/95) of the sample size calculations without making any assumptions, and likely reproduce another 28% (27/95) after making assumptions. Many calculations were not reported transparently enough to assess whether an error was present (75%; 71/95) or whether the sample size calculation was for a statistical test that appeared in the results section of the publication (48%; 46/95). Few articles that performed a calculation for a mixed-design ANOVA unambiguously selected the non-default option (8%; 3/36).

Conclusion: Published sample size calculations that use G*Power are not transparently reported and may not be well-informed. Given the popularity of software packages like G*Power, they present an intervention point to increase the prevalence of informative sample size calculations.
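For context on the kind of calculation the paper audits: a two-group sample size calculation of the sort G*Power performs can be approximated in a few lines. This is a hedged sketch using the normal approximation with illustrative inputs (Cohen's d = 0.5, alpha = .05, power = .80), not the paper's method; G*Power itself uses the exact noncentral t distribution, which yields a slightly larger n (64 per group for these inputs).

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal-approximation sketch only: G*Power's exact noncentral-t
    computation gives a slightly larger answer.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Medium effect (d = 0.5), alpha = .05, power = .80:
print(n_per_group(0.5))  # 63 per group under the normal approximation
```

The gap between this approximation (63 per group) and G*Power's exact answer (64) illustrates why reproducing a published calculation requires knowing exactly which test, tails, and options were selected, which is the transparency problem the paper measures.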

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Journal of Clinical Epidemiology | 28 | Top 0.1% | 26.5%
2 | PLOS ONE | 4510 | Top 8% | 19.1%
3 | Research Synthesis Methods | 20 | Top 0.1% | 7.0%
(50% of probability mass above this line)
4 | Trials | 25 | Top 0.2% | 6.5%
5 | BMJ Open | 554 | Top 3% | 6.5%
6 | BMC Medical Research Methodology | 43 | Top 0.4% | 2.7%
7 | F1000Research | 79 | Top 0.9% | 2.4%
8 | PLOS Biology | 408 | Top 6% | 2.1%
9 | BMJ Global Health | 98 | Top 1% | 2.1%
10 | Medicine | 30 | Top 0.9% | 1.9%
11 | JAMA Network Open | 127 | Top 2% | 1.8%
12 | JMIR Research Protocols | 18 | Top 0.9% | 1.3%
13 | Journal of Public Health | 23 | Top 0.8% | 0.9%
14 | International Journal of Epidemiology | 74 | Top 2% | 0.8%
15 | BMC Medicine | 163 | Top 6% | 0.8%
16 | Journal of Biomedical Informatics | 45 | Top 1% | 0.8%
17 | Healthcare | 16 | Top 2% | 0.8%
18 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.8%
19 | Nature Communications | 4913 | Top 62% | 0.8%
20 | Developmental Cognitive Neuroscience | 81 | Top 0.5% | 0.8%
21 | Scientific Reports | 3102 | Top 75% | 0.7%
22 | International Journal of Environmental Research and Public Health | 124 | Top 7% | 0.7%
23 | PeerJ | 261 | Top 17% | 0.7%
24 | Journal of Medical Internet Research | 85 | Top 5% | 0.7%
25 | Medical Decision Making | 10 | Top 0.4% | 0.5%
26 | JMIRx Med | 31 | Top 3% | 0.5%
27 | Systematic Reviews | 11 | Top 0.7% | 0.5%
28 | BMJ Open Quality | 15 | Top 1.0% | 0.5%
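The "50% of probability mass" cutoff above is simply the smallest number of top-ranked journals whose predicted probabilities sum to at least 50%. A minimal sketch, using the first ten probabilities from the table:

```python
from itertools import accumulate

# Predicted probabilities (%) for the top-ranked journals, from the table above
probs = [26.5, 19.1, 7.0, 6.5, 6.5, 2.7, 2.4, 2.1, 2.1, 1.9]

# Smallest k such that the top k journals cover at least 50% of the mass
k = next(i + 1 for i, total in enumerate(accumulate(probs)) if total >= 50.0)
print(k)  # 3 -> the top 3 journals cover 26.5 + 19.1 + 7.0 = 52.6%
```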