Interpretable AI for Accelerated Video-Based Surgical Skill Assessment: A Highlights-Reel Approach
Lafouti, M.; Feldman, L. S.; Hooshiar, A.
Background: Manual video-based evaluation of surgical skills can be time-consuming and delays trainee feedback. Artificial intelligence (AI) offers opportunities to automate aspects of assessment while maintaining clinician oversight. We developed an interpretable spatiotemporal model that classifies surgical expertise directly from endoscopic video in standardized training tasks and generates saliency-based "highlights reels" showing the most influential frames.

Methods: An RGB pipeline combining InceptionV3 for spatial feature extraction and a gated recurrent unit (GRU) for temporal modeling was trained on the JIGSAWS dataset. The model outputs novice, intermediate, or expert labels. A rolling-window, low-latency evaluation at 30 fps with a stride of 10 frames was used. A motion-augmented variant fused RGB with optical-flow features. Spatial and temporal saliency maps highlighted key decision-making regions.

Results: The RGB model achieved 95% accuracy (F1: 92% expert, 86% intermediate, 99% novice). Performance was strongest for novice and expert trials, while intermediate trials showed the lowest recall, consistent with greater ambiguity around the intermediate skill level. Saliency maps consistently emphasized tool-tissue interactions and peaked during technically demanding phases. The optical-flow variant underperformed (approximately 38% accuracy), which may reflect sensitivity to global camera motion and other non-informative motion patterns.

Conclusions: This interpretable AI pipeline accurately classifies surgical skill while producing intuitive visual highlights. Future work will refine highlight thresholds and validate on laparoscopic inguinal hernia repair for real-world deployment.
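The rolling-window evaluation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: a 30-frame window (1 s at 30 fps) slides over the video with a stride of 10 frames, so a new skill prediction is emitted roughly every 0.33 s. The `model_fn` callable is a stand-in for the InceptionV3+GRU classifier and is assumed to map a clip to softmax scores over the three skill classes.

```python
import numpy as np

def rolling_predictions(frames, model_fn, window=30, stride=10):
    """Slide a fixed-length window over the frame sequence and classify
    each clip. `model_fn` is assumed to return class scores over
    (expert, intermediate, novice); windows start every `stride` frames."""
    preds = []
    for start in range(0, len(frames) - window + 1, stride):
        clip = frames[start:start + window]
        scores = model_fn(clip)            # softmax over the 3 skill classes
        preds.append(int(np.argmax(scores)))
    return preds

# Toy usage with a dummy "model" that always favors class 2.
frames = np.zeros((90, 8, 8, 3), dtype=np.float32)   # 3 s of downscaled video
labels = rolling_predictions(frames, lambda clip: np.array([0.1, 0.2, 0.7]))
# 90 frames -> window starts at 0, 10, ..., 60 -> 7 predictions
```

In a deployed pipeline the per-window predictions (and their saliency maps) could then be aggregated over the trial, which is what makes the low-latency, frame-level "highlights reel" possible.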