
Agreeability testing of AMSTAR-PF, a tool for quality appraisal of systematic reviews of prognostic factor studies

Henry, M.; O'Connell, N.; Riley, R.; Moons, K.; Shea, B.; Hooft, L.; Wallwork, S.; Damen, J.; Skoetz, N.; Appiah, R.; Berryman, C.; Crouch, S.; Ferencz, G.; Grant, A.; Henry, K.; Herman, A.; Karran, E.; Koralegedera, I.; Leake, H.; MacIntyre, E.; Mouatt, B.; Phuentsho, K.; Van Der Laan, D.; Welsby, E.; Wiles, L.; Wilkinson, E.; Wilson, M.; Wilson, M.; Moseley, L.

2025-04-14 epidemiology

10.1101/2025.04.10.25325555 medRxiv


Background: This paper details initial testing of the agreeability and usability of AMSTAR-PF, a novel quality appraisal tool for systematic reviews of prognostic factor studies.

Methods: Fourteen appraisers each assessed eight systematic reviews using AMSTAR-PF. Their ratings for each question and each article were compared, and interrater, inter-pair and intrapair agreeability were calculated using Gwet's agreement coefficient. Time of use and time to reach consensus were also recorded.

Results: Across the domains, interrater agreement averaged 0.59 (range 0.21-0.90), inter-pair agreement 0.61 (range 0.24-0.91) and intrapair agreement 0.75 (range 0.45-0.95). For the overall rating, agreement was 0.46 (95% CI 0.30-0.62) interrater, 0.46 (95% CI 0.17-0.74) inter-pair and 0.68 (range of averages 0.22-1.00) intrapair. The majority (60.7%) of intrapair ratings were identical, and for the overall appraisal 94.6% of final ratings were either identical or only one category apart. The time taken to appraise a study with AMSTAR-PF decreased with use, averaging around 34 minutes after the first two appraisals.

Conclusions: Despite some variance in agreeability across domains and between appraisers, the testing results suggest that AMSTAR-PF has clear utility for appraising the quality of systematic reviews of prognostic factor studies.
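For readers unfamiliar with the statistic named in the Methods, the following is a minimal sketch of Gwet's first-order agreement coefficient (AC1) for two raters, plus a helper for the "identical or within one category" share reported in the Results. It is an illustration under stated assumptions, not the authors' analysis: the abstract does not say whether the unweighted AC1 or the weighted AC2 variant was used, and the rating categories and the two rating lists below are hypothetical, invented for the example rather than taken from the study. AC1 compares observed agreement p_a with a chance-agreement term p_e = (1/(Q-1)) * sum_k pi_k(1 - pi_k), where pi_k is the average share of ratings both raters gave to category k and Q is the number of categories, giving AC1 = (p_a - p_e) / (1 - p_e).

```python
from collections import Counter

# Hypothetical ordered rating scale (assumption: the abstract does not list
# AMSTAR-PF's overall rating categories).
CATEGORIES = ["critically low", "low", "moderate", "high"]


def gwet_ac1(ratings_a, ratings_b, categories):
    """Unweighted Gwet's AC1 for two raters who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two categories")
    n = len(ratings_a)
    # Observed agreement: share of items both raters put in the same category.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # pi_k: average of the two raters' marginal proportions for category k.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    pi = [(count_a[k] + count_b[k]) / (2 * n) for k in categories]
    # Gwet's chance-agreement term; it never exceeds 1/q, so 1 - p_e > 0.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


def share_within(ratings_a, ratings_b, ordered_categories, max_steps):
    """Share of items whose two ratings differ by at most max_steps categories."""
    rank = {c: i for i, c in enumerate(ordered_categories)}
    close = sum(abs(rank[a] - rank[b]) <= max_steps
                for a, b in zip(ratings_a, ratings_b))
    return close / len(ratings_a)


# Invented overall ratings of eight reviews by one appraiser pair (not study data).
rater_a = ["high", "low", "low", "moderate", "critically low", "low", "moderate", "high"]
rater_b = ["high", "low", "moderate", "moderate", "critically low", "low", "low", "high"]

print(f"AC1 = {gwet_ac1(rater_a, rater_b, CATEGORIES):.3f}")
print(f"identical: {share_within(rater_a, rater_b, CATEGORIES, 0):.1%}")
print(f"within one category: {share_within(rater_a, rater_b, CATEGORIES, 1):.1%}")
```

With this made-up pair, six of eight ratings match (observed agreement 0.75) and the remaining two differ by one category, so the script prints AC1 = 0.671, 75.0% identical and 100.0% within one category. AC1 is shown rather than Cohen's kappa because its chance-agreement term stays small when ratings cluster in a few categories, which is the usual motivation for choosing Gwet's coefficient; an ordinal analysis would instead use the weighted AC2.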

Published in BMJ Open