Objects2action: Classifying and localizing actions without any video example
Jan van Gemert and Thomas Mensink
University of Amsterdam, Qualcomm Research Netherlands
In our paper, we present objects2action, a semantic embedding that classifies actions in videos without using any video data or action annotations as prior knowledge. Instead, it relies on commonly available object annotations, images, and textual descriptions.
Our semantic embedding has three main characteristics to accommodate the specifics of actions:
A mechanism to exploit the multiple-word descriptions of actions and ImageNet objects.
Automated selection of the most responsive objects per action.
Ready adaptability to zero-shot action classification and spatio-temporal localization of actions.
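The first two characteristics can be sketched in a few lines: embed a multi-word label by averaging its word vectors, then score a video by weighting its object-classifier responses with the semantic similarity of the most responsive objects. The tiny word vectors, object labels, and classifier scores below are invented for illustration only and are not taken from our released models.

```python
import numpy as np

# Toy word embeddings; in the paper these come from a word2vec model
# trained on YFCC100M metadata (values here are invented).
word_vectors = {
    "horse":  np.array([0.9, 0.1, 0.0]),
    "riding": np.array([0.7, 0.2, 0.1]),
    "saddle": np.array([0.8, 0.2, 0.1]),
    "piano":  np.array([0.0, 0.9, 0.1]),
    "music":  np.array([0.1, 0.8, 0.2]),
}

def embed(label):
    """Average the word vectors of a multiple-word label."""
    vecs = [word_vectors[w] for w in label.split() if w in word_vectors]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_action(action_label, object_labels, object_scores, top_k=2):
    """Zero-shot action score: keep only the top_k objects most similar
    to the action in embedding space, then weight the video's
    object-classifier responses by that similarity."""
    a = embed(action_label)
    sims = np.array([cosine(a, embed(o)) for o in object_labels])
    top = np.argsort(sims)[-top_k:]  # indices of the most responsive objects
    return float(np.dot(sims[top], np.asarray(object_scores)[top]))

# Toy usage: object-classifier responses for one video.
objects = ["horse", "saddle", "piano"]
responses = np.array([0.8, 0.6, 0.1])
print(score_action("horse riding", objects, responses))
```

With these toy numbers, a video dominated by horse and saddle responses scores higher for "horse riding" than for "piano music", which is the intended behavior of the object-to-action translation.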
We make our word2vec model, the proposed embeddings of the object and action class labels, and the object representations of the videos available for download:
- Word2vec model trained on YFCC100M:
- Embedded object class labels:
- Embedded action class labels:
- Object representations for:
- Models (AlexNet was used for this paper):
If you have any questions, please send me an e-mail at firstname.lastname@example.org