An evaluation of reproducibility and errors in published sample size calculations performed using G*Power

Thibault, R. T.; Zavalis, E. A.; Malicki, M.; Pedder, H.

2024-08-05 · Health systems and quality improvement
medRxiv · DOI: 10.1101/2024.07.15.24310458
Background: Published studies in the life and health sciences often employ sample sizes that are too small to detect realistic effect sizes. This shortcoming increases the rate of false positives and false negatives, giving rise to a potentially misleading scientific record. To address this shortcoming, many researchers now use point-and-click software to run sample size calculations.

Objective: We aimed to (1) estimate how many published articles report using the G*Power sample size calculation software; (2) assess whether these calculations are reproducible and (3) error-free; and (4) assess how often these calculations use G*Power's default option for mixed-design ANOVAs, which can be misleading and output sample sizes that are too small for a researcher's intended purpose.

Method: We randomly sampled open access articles from PubMed Central published between 2017 and 2022 and used a coding form to manually assess 95 sample size calculations for reproducibility and errors.

Results: We estimate that more than 48,000 articles published between 2017 and 2022 and indexed in PubMed Central or PubMed report using G*Power (i.e., 0.65% [95% CI: 0.62%-0.67%] of articles). We could reproduce 2% (2/95) of the sample size calculations without making any assumptions, and likely reproduce another 28% (27/95) after making assumptions. Many calculations were not reported transparently enough to assess whether an error was present (75%; 71/95) or whether the sample size calculation was for a statistical test that appeared in the results section of the publication (48%; 46/95). Few articles that performed a calculation for a mixed-design ANOVA unambiguously selected the non-default option (8%; 3/36).

Conclusion: Published sample size calculations that use G*Power are not transparently reported and may not be well-informed. Given the popularity of software packages like G*Power, they present an intervention point to increase the prevalence of informative sample size calculations.
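For context on the kind of calculation the paper audits: a two-group sample size calculation of the sort G*Power performs can be approximated in a few lines. This is a hedged sketch using the normal approximation with illustrative inputs (Cohen's d = 0.5, alpha = .05, power = .80), not the paper's method; G*Power itself uses the exact noncentral t distribution, which yields a slightly larger n (64 per group for these inputs).

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal-approximation sketch only: G*Power's exact noncentral-t
    computation gives a slightly larger answer.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Medium effect (d = 0.5), alpha = .05, power = .80:
print(n_per_group(0.5))  # 63 per group under the normal approximation
```

The gap between this approximation (63 per group) and G*Power's exact answer (64) illustrates why reproducing a published calculation requires knowing exactly which test, tails, and options were selected, which is the transparency problem the paper measures.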

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Journal of Clinical Epidemiology | 28 | Top 0.1% | 26.5%
2 | PLOS ONE | 4510 | Top 8% | 19.1%
3 | Research Synthesis Methods | 20 | Top 0.1% | 7.0%
(50% of probability mass above this line)
4 | Trials | 25 | Top 0.2% | 6.5%
5 | BMJ Open | 554 | Top 3% | 6.5%
6 | BMC Medical Research Methodology | 43 | Top 0.4% | 2.7%
7 | F1000Research | 79 | Top 0.9% | 2.4%
8 | PLOS Biology | 408 | Top 6% | 2.1%
9 | BMJ Global Health | 98 | Top 1% | 2.1%
10 | Medicine | 30 | Top 0.9% | 1.9%
11 | JAMA Network Open | 127 | Top 2% | 1.8%
12 | JMIR Research Protocols | 18 | Top 0.9% | 1.3%
13 | Journal of Public Health | 23 | Top 0.8% | 0.9%
14 | International Journal of Epidemiology | 74 | Top 2% | 0.8%
15 | BMC Medicine | 163 | Top 6% | 0.8%
16 | Journal of Biomedical Informatics | 45 | Top 1% | 0.8%
17 | Healthcare | 16 | Top 2% | 0.8%
18 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.8%
19 | Nature Communications | 4913 | Top 62% | 0.8%
20 | Developmental Cognitive Neuroscience | 81 | Top 0.5% | 0.8%
21 | Scientific Reports | 3102 | Top 75% | 0.7%
22 | International Journal of Environmental Research and Public Health | 124 | Top 7% | 0.7%
23 | PeerJ | 261 | Top 17% | 0.7%
24 | Journal of Medical Internet Research | 85 | Top 5% | 0.7%
25 | Medical Decision Making | 10 | Top 0.4% | 0.5%
26 | JMIRx Med | 31 | Top 3% | 0.5%
27 | Systematic Reviews | 11 | Top 0.7% | 0.5%
28 | BMJ Open Quality | 15 | Top 1.0% | 0.5%
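The "50% of probability mass" cutoff above is simply the smallest number of top-ranked journals whose predicted probabilities sum to at least 50%. A minimal sketch, using the first ten probabilities from the table:

```python
from itertools import accumulate

# Predicted probabilities (%) for the top-ranked journals, from the table above
probs = [26.5, 19.1, 7.0, 6.5, 6.5, 2.7, 2.4, 2.1, 2.1, 1.9]

# Smallest k such that the top k journals cover at least 50% of the mass
k = next(i + 1 for i, total in enumerate(accumulate(probs)) if total >= 50.0)
print(k)  # 3 -> the top 3 journals cover 26.5 + 19.1 + 7.0 = 52.6%
```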