What do 15,000 object categories tell us about classifying and localizing actions?

CVPR 2015
Mihir Jain, Jan van Gemert and Cees Snoek
University of Amsterdam, Qualcomm Research Netherlands


Contributions

In our paper, we present an empirical study on the benefit of encoding objects for action recognition in videos. We conclude the following:
  • Objects matter for actions.

  • Actions have object preference.

  • Object-action relations can be transferred from one dataset to another.

  • Objects improve state-of-the-art for action classification and action localization.
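The object encoding studied here can be pictured as pooling per-frame object-classifier responses into a single video-level vector. Below is a minimal sketch of that idea, assuming per-frame softmax scores from an object CNN are already available; the function name, the pooling choice (average), and the toy numbers are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def video_object_representation(frame_scores: np.ndarray) -> np.ndarray:
    """Pool per-frame object-classifier scores into one video-level vector.

    frame_scores: (num_frames, num_object_categories) array of per-frame
    softmax responses from an object CNN (15,000 categories in the paper).
    Average pooling is an illustrative choice here.
    """
    return frame_scores.mean(axis=0)

# toy example: 4 frames, 5 hypothetical object categories
scores = np.array([
    [0.7, 0.10, 0.10, 0.05, 0.05],
    [0.6, 0.20, 0.10, 0.05, 0.05],
    [0.8, 0.05, 0.05, 0.05, 0.05],
    [0.5, 0.30, 0.10, 0.05, 0.05],
])
rep = video_object_representation(scores)  # 5-dim video-level vector
```

The resulting vector can then be fed to any off-the-shelf classifier (e.g. a linear SVM) to predict the action class.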

Download

We make our object representation and other details available for download:

If you have any questions, please send an e-mail to m.jain@uva.nl.

Action classification results

Method                    UCF101  THUMOS14 Val.  THUMOS14 Test  Hollywood2  HMDB51
Objects                   65.6%   49.7%          56.4%          38.4%       38.9%
Motion                    84.2%   56.9%          63.1%          64.6%       57.9%
Objects + Motion          88.1%   66.8%          70.8%          66.2%       61.1%
Objects (R=1) + Motion    88.0%   66.7%          68.4%          66.6%       61.0%
Objects (R=11) + Motion   88.5%   68.8%          71.6%          66.4%       61.4%
Objects + Stacked FVs     --      --             --             --          71.3%

Best reported in the literature up to CVPR'15:
  • UCF101: 87.7% (Peng et al., ECCV'14)
  • THUMOS14 Validation: 66.8% (Jain et al., THUMOS'14)
  • THUMOS14 Test: 71.0% (Jain et al., THUMOS'14)
  • Hollywood2: 73.7% (Fernando et al., CVPR'15)
  • HMDB51: 66.8% (Peng et al., ECCV'14)
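The "Objects + Motion" rows combine the scores of the object-based and motion-based classifiers. One common way to combine two such classifiers is weighted late score fusion; the sketch below illustrates that idea, with the fusion weight and the score-level combination being my assumptions rather than necessarily the paper's exact scheme:

```python
import numpy as np

def fuse_scores(object_scores, motion_scores, w=0.5):
    """Weighted late fusion of per-action classifier scores.

    object_scores, motion_scores: shape (num_classes,) arrays holding the
    per-action scores of the object-based and motion-based classifiers.
    w: fusion weight (hypothetical; the paper's exact scheme may differ).
    """
    object_scores = np.asarray(object_scores, dtype=float)
    motion_scores = np.asarray(motion_scores, dtype=float)
    return w * object_scores + (1.0 - w) * motion_scores

# toy example with 3 hypothetical action classes
fused = fuse_scores([0.2, 0.5, 0.3], [0.4, 0.4, 0.2])
predicted = int(np.argmax(fused))  # index of the highest fused score
```

In practice the weight w would be tuned on validation data (e.g. the THUMOS14 validation split) rather than fixed at 0.5.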

Action localization results

Datasets