Seeing Just Enough: The Contribution of Hands, Objects and Visual Features to Egocentric Action Recognition
Rybansky, F.; Rahmaniboldaji, S.; Gilbert, A.; Guerin, F.; Hurlbert, A. C.; Vuong, Q. C.
Humans recognize everyday actions without conscious effort despite challenges such as poor viewing conditions and visual similarity between actions. Yet the visual features contributing to action recognition remain unclear. To address this, we combined semantic modelling and feature-reduction methods to identify critical features for recognizing actions from challenging egocentric perspectives. We first identified egocentric action videos from home environments that a motion-focused action classification network could correctly classify (Easy videos) or not (Hard videos). In Experiment 1, participants (N=136) labelled the action and object in the videos. Using a language model framework, we derived human ground-truth labels for each video and quantified its recognition consistency based on semantic similarity. Participants recognized actions and objects in Easy videos more consistently than in Hard videos. In Experiment 2, we recursively reduced the Easy and Hard videos with high recognition consistency to extract minimal recognizable configurations (MIRCs), in which any further spatial or temporal reduction disrupted recognition; data for this experiment were collected in a large-scale online study (N=4360). We extracted information related to the hand, objects, scene background and visual features (e.g., orientation or motion signals) from the 474 MIRCs. Binary classification showed that recognition was disrupted when regions containing the manipulated object and strong orientation signals were removed, while temporal reduction by frame-scrambling disrupted recognition in 73% of MIRCs. The active hand made only a marginal contribution. Our results highlight the importance of both mid- and high-level information for egocentric action recognition and link hierarchical feature theories with naturalistic human perception.
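As a concrete illustration of the semantic-similarity measure described above, the sketch below scores how consistently participants' free-text labels match a ground-truth label using sentence embeddings and a cosine-similarity cutoff. The encoder choice (all-MiniLM-L6-v2), the threshold value, and the example labels are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (not the authors' code) of a recognition-consistency score:
# embed participant labels and a ground-truth label with a sentence encoder,
# then take the fraction of labels whose cosine similarity to the ground
# truth exceeds a cutoff. Encoder and threshold are assumptions.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def recognition_consistency(participant_labels, ground_truth, threshold=0.6):
    # Encode all participant labels plus the ground truth in one batch.
    embeddings = model.encode(participant_labels + [ground_truth])
    labels, truth = embeddings[:-1], embeddings[-1]
    # Cosine similarity of each participant label to the ground truth.
    sims = labels @ truth / (
        np.linalg.norm(labels, axis=1) * np.linalg.norm(truth)
    )
    # Fraction of labels counted as semantically consistent.
    return float(np.mean(sims >= threshold))

# Hypothetical labels collected for one video.
print(recognition_consistency(
    ["cutting vegetables", "chopping carrots", "washing dishes"],
    "chopping vegetables",
))
```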