Novel Autosegmentation Spatial Similarity Metrics Capture the Time Required to Correct Segmentations Better than Traditional Metrics in a Thoracic Cavity Segmentation Workflow
Kiser, K.; Barman, A.; Stieb, S.; Fuller, C. D.; Giancardo, L.
Introduction: Automated segmentation templates can save clinicians time compared to de novo segmentation but may still take substantial time to review and correct. Which similarity metrics between automated and corrected segmentations best predict clinician correction time has not been thoroughly investigated.

Materials and Methods: Bilateral thoracic cavity volumes in 329 CT scans were segmented by a UNet-inspired deep learning segmentation tool and subsequently corrected by a fourth-year medical student. Eight spatial similarity metrics were calculated between the automated and corrected segmentations and associated with correction times using Spearman's rank correlation coefficients. Nine clinical variables were also associated with metrics and correction times using Spearman's rank correlation coefficients or Mann-Whitney U tests.

Results: The added path length, false negative path length, and surface Dice similarity coefficient correlated better with correction time than traditional metrics, including the popular volumetric Dice similarity coefficient (respectively ρ = 0.69, ρ = 0.65, and ρ = -0.48 versus ρ = -0.25; correlation p values < 0.001). Clinical variables poorly represented in the autosegmentation tool's training data were often associated with decreased accuracy but not necessarily with prolonged correction time.

Discussion: Metrics used to develop and evaluate autosegmentation tools should correlate with clinical time saved. To our knowledge, this is only the second investigation of which metrics correlate with time saved. Validation of our findings is indicated in other anatomic sites and clinical workflows.

Conclusion: Novel spatial similarity metrics may be preferable to traditional metrics for developing and evaluating autosegmentation tools that are intended to save clinicians time.
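The analysis described in the methods, computing a similarity metric per scan and rank-correlating it with correction time, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the representation of a segmentation as a set of voxel coordinates, and the example values are all assumptions made for illustration. It shows only the volumetric Dice similarity coefficient and Spearman's rank correlation, not the path length or surface Dice metrics.

```python
from math import sqrt


def volumetric_dice(auto_voxels, corrected_voxels):
    """Volumetric Dice similarity coefficient between two voxel sets."""
    a, b = set(auto_voxels), set(corrected_voxels)
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = sqrt(sum((p - mx) ** 2 for p in rx))
    sy = sqrt(sum((q - my) ** 2 for q in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-scan values, invented for illustration only.
    dice_scores = [0.98, 0.95, 0.97, 0.90, 0.93]
    correction_minutes = [4.0, 9.5, 6.0, 15.0, 8.0]
    print(spearman_rho(dice_scores, correction_minutes))
```

A negative ρ for a Dice-type metric and a positive ρ for a path length metric both indicate the expected direction: lower overlap or a longer added path goes with longer correction time. The comparison of interest is therefore the magnitude of ρ across metrics.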