Seeing Just Enough: The Contribution of Hands, Objects and Visual Features to Egocentric Action Recognition

Rybansky, F.; Rahmaniboldaji, S.; Gilbert, A.; Guerin, F.; Hurlbert, A. C.; Vuong, Q. C.

2026-02-17 neuroscience
10.64898/2026.02.15.705896 bioRxiv
Abstract

Humans recognize everyday actions without conscious effort despite challenges such as poor viewing conditions and visual similarity between actions. Yet the visual features contributing to action recognition remain unclear. To address this, we combined semantic modelling and feature reduction methods to identify critical features for recognizing actions from challenging egocentric perspectives. We first identified egocentric action videos from home environments that a motion-focused action classification network could correctly classify (Easy videos) or not (Hard videos). In Experiment 1, participants (N=136) labelled the action and object in the videos. Using a language model framework, we derived human ground-truth labels for each video and quantified its recognition consistency based on semantic similarity. Participants recognized actions and objects in Easy videos more consistently than in Hard videos. In Experiment 2, we recursively reduced the Easy and Hard videos with high recognition consistency to extract minimal recognizable configurations (MIRCs), in which any further spatial or temporal reduction disrupted recognition. Data were collected in a large-scale online study (N=4360). We extracted information related to the hands, objects, scene background and visual features (e.g., orientation or motion signals) from the 474 MIRCs. Binary classification showed that recognition was disrupted when regions containing the manipulated object and strong orientation signals were removed, while temporal reduction by frame-scrambling disrupted recognition in 73% of MIRCs. The active hand made only a marginal contribution. Our results highlight the importance of both mid- and high-level information for egocentric action recognition and link hierarchical feature theories with naturalistic human perception.
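The consistency measure described in the abstract, recognition consistency from semantic similarity of participants' labels, can be sketched as the mean pairwise cosine similarity between label embeddings. This is a minimal illustrative sketch, not the authors' pipeline: the toy vectors below stand in for language-model embeddings, and the function names and example labels are hypothetical.

```python
# Illustrative sketch: recognition consistency as mean pairwise cosine
# similarity between embeddings of participants' labels for one video.
# The 3-d vectors are hand-made stand-ins for language-model embeddings.
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recognition_consistency(embeddings):
    """Mean cosine similarity over all pairs of label embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy example: three participants labelled the same video.
labels = {
    "cutting bread": [0.9, 0.1, 0.0],   # two near-synonymous labels...
    "slicing loaf":  [0.8, 0.2, 0.1],
    "washing dish":  [0.1, 0.9, 0.3],   # ...and one dissimilar label
}
score = recognition_consistency(list(labels.values()))
```

Videos whose labels cluster tightly in embedding space yield a high score (consistent recognition); disagreement among participants pulls the score down.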

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                            Papers in training set  Top %     Predicted probability
1     PLOS Computational Biology                         1633                    Top 0.8%  22.0%
2     Nature Communications                              4913                    Top 10%   14.4%
3     Scientific Reports                                 3102                    Top 7%    9.9%
4     Proceedings of the National Academy of Sciences    2130                    Top 8%    8.2%
----- 50% of probability mass above -----
5     Nature Human Behaviour                             85                      Top 0.6%  4.7%
6     Communications Psychology                          20                      Top 0.1%  4.2%
7     Cognition                                          44                      Top 0.2%  3.0%
8     Nature                                             575                     Top 8%    2.7%
9     PLOS ONE                                           4510                    Top 49%   2.0%
10    Science Advances                                   1098                    Top 15%   1.8%
11    eNeuro                                             389                     Top 5%    1.7%
12    eLife                                              5422                    Top 40%   1.7%
13    Frontiers in Computational Neuroscience            53                      Top 1%    1.6%
14    Philosophical Transactions of the Royal Society B  51                      Top 4%    1.5%
15    Communications Biology                             886                     Top 11%   1.5%
16    Advanced Science                                   249                     Top 13%   1.3%
17    iScience                                           1063                    Top 20%   1.3%
18    Science                                            429                     Top 19%   0.8%
19    Journal of Vision                                  92                      Top 0.5%  0.7%
20    Neural Networks                                    32                      Top 0.8%  0.7%
21    Neuroscience of Consciousness                      12                      Top 0.4%  0.7%
22    The Journal of Neuroscience                        928                     Top 9%    0.7%
23    npj Digital Medicine                               97                      Top 4%    0.7%
24    Nature Neuroscience                                216                     Top 7%    0.6%