Tutorial Description

We live in the age of Big Data, featuring huge image and video datasets. Despite its size, however, Big Data cannot guarantee sufficient annotations for all possible concepts. For common object concepts, such as “ball” or “helicopter”, annotations are easy to obtain. Yet for more exotic concepts, like a “lagerphone” (a percussion musical instrument), obtaining annotations is not straightforward: not only do the available images not suffice, but often the annotations can be made only by experts. Going beyond single-noun concepts to composite concepts, e.g., “wooden saxophone” or “a sunny day on the mountain”, annotations are even harder to obtain, as the number of combinations is unbounded. Composite concepts, however, are especially relevant for real users who make spontaneous image searches, thus implicitly requesting their own unique classifier on the spot. In the absence of object-specific annotations, one solution is zero-shot learning, where the combination of a) existing classifiers and b) semantic, cross-concept mappings between these classifiers allows for building novel classifiers without requiring any visual examples. In this tutorial we focus on zero-shot learning for vision and multimedia.
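To make the idea of combining existing classifiers through semantic mappings concrete, here is a minimal sketch. It is one simple way to illustrate the principle, not the specific method covered in the tutorial. The function names, the two-dimensional concept embeddings, and the classifier scores are all made up for illustration; in practice the embeddings might come from word vectors or attribute annotations, and the scores from classifiers trained on annotated concepts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_score(known_scores, known_embeddings, novel_embedding):
    """Score a novel concept without any visual examples of it.

    The score is a weighted average of the scores of existing classifiers,
    where each weight is the (non-negative) semantic similarity between the
    known concept and the novel concept.
    """
    weights = {
        name: max(cosine(emb, novel_embedding), 0.0)  # ignore dissimilar concepts
        for name, emb in known_embeddings.items()
    }
    total = sum(weights.values())
    if total == 0.0:
        return 0.0  # no known concept is semantically related
    return sum(weights[name] * known_scores[name] for name in weights) / total


# Hypothetical semantic embeddings of the concepts we already have classifiers for.
known_embeddings = {
    "saxophone": [1.0, 0.0],
    "wood": [0.0, 1.0],
    "helicopter": [-1.0, 0.0],
}
# Hypothetical embedding of the unseen composite concept "wooden saxophone".
novel_embedding = [1.0, 1.0]

# Hypothetical outputs of the existing classifiers on two images.
image_a = {"saxophone": 0.9, "wood": 0.8, "helicopter": 0.1}
image_b = {"saxophone": 0.1, "wood": 0.2, "helicopter": 0.9}

score_a = zero_shot_score(image_a, known_embeddings, novel_embedding)
score_b = zero_shot_score(image_b, known_embeddings, novel_embedding)
print(f"image A: {score_a:.2f}, image B: {score_b:.2f}")
```

With these toy numbers, image A ranks above image B for “wooden saxophone”, even though no classifier was ever trained on that concept.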

Tutorial Outline

  1. Introduction
  2. Knowledge transfer
  3. Classification
  4. Localization
  5. Retrieval
  6. Interaction
  7. Conclusion and Discussion

Download Slides

Lecturer Biographies

Photo credits: Monique Kooijmans

Dr. Thomas Mensink received the M.Sc. degree, with honors, in artificial intelligence from the University of Amsterdam (UvA) in 2007. He worked towards a Ph.D. jointly with the LEAR team of INRIA Grenoble and the Computer Vision team of Xerox Research Centre Europe. His Ph.D. thesis received the AFRIF Thesis Award 2012 for the best Ph.D. thesis in pattern recognition in France. Currently, he is a post-doctoral scholar at the University of Amsterdam, and he received the prestigious young talent VENI award from the Netherlands Organisation for Scientific Research (NWO) in 2015. He focuses on machine learning for image and video classification. More specifically, his current research addresses learning semantic representations for visual data, which is essential when learning from few, or even zero, examples. He is the lecturer of an M.Sc. course on Visual Search Engines and has taught several classes on zero-shot learning for computer vision to different audiences. He has received numerous awards, including the ACM Multimedia Best Paper prize 2014.

Dr. Efstratios Gavves is an Assistant Professor with the University of Amsterdam in the Netherlands and Scientific Manager of the QUVA Deep Vision Lab. After completing his Ph.D. in 2014 at the University of Amsterdam, he worked as a Post-doctoral Researcher at KU Leuven with Prof. Tinne Tuytelaars. He has authored several papers in major Computer Vision and Multimedia conferences and journals, including CVPR, ICCV, ECCV, IJCV, and CVIU. His research interests include, but are not limited to, statistical and deep learning with applications to computer vision tasks such as object recognition, image captioning, action recognition, memory networks and recurrent networks, and tracking.

Dr. Cees G.M. Snoek received the M.Sc. degree in business information systems (2000) and the Ph.D. degree in computer science (2005), both from the University of Amsterdam, The Netherlands. He is currently a director of the QUVA Lab, the joint research lab of Qualcomm and the University of Amsterdam on deep learning and computer vision. He is also a principal engineer at Qualcomm and an associate professor at the University of Amsterdam. He was previously a Visiting Scientist at Informedia, Carnegie Mellon University, USA (2003), Fulbright Junior Scholar at UC Berkeley’s Computer Vision Group (2010-2011), and head of R&D at University spin-off Euvision Technologies before it was acquired by Qualcomm (2011-2014). His research interests focus on video and image recognition. He has published over 150 refereed book chapters, journal and conference papers, and serves on the program committees of the major conferences in multimedia and computer vision. He is general co-chair of ACM Multimedia 2016 in Amsterdam, program co-chair for ICMR 2015, and co-initiator and co-organizer of the VideOlympics 2007-2009. He is a lecturer of post-doctoral courses given at international conferences and European summer schools. He is a senior member of IEEE and ACM. Dr. Snoek is the recipient of an NWO Veni award (2008), a Fulbright Junior Scholarship (2010), an NWO Vidi award (2012), and the Netherlands Prize for ICT Research (2012). Several of his Ph.D. students and post-docs have won awards, including the IEEE Transactions on Multimedia Prize Paper Award (2012), the SIGMM Best Ph.D. Thesis Award (2013), the Best Paper Award of ACM Multimedia (2014), and an NWO Veni award (2015). Three of his former mentees serve as assistant professors.