We live in the age of Big Data, featuring huge image and video datasets. Despite their size, however, we cannot guarantee sufficient annotations for all possible concepts. Moreover, while annotations are easy to obtain for common object concepts, such as ball or helicopter, this is not straightforward for more exotic concepts like a “lagerphone” (a percussion musical instrument): not only do the available images not suffice, but often the annotations can be made only by experts. In the absence of annotations we promote zero-shot learning, where the combination of a) existing classifiers and b) semantic, cross-concept mappings between these classifiers allows for building novel classifiers without resorting to any visual examples. From a more philosophical point of view, zero-shot learning relates to the ability to “learn new things” and to “reason over what is learned”. While a DeepNet can reason (almost) perfectly over the 1,000 concepts it is trained on, it cannot reason over any new concept, nor explain novel concepts in terms of what is already known. In this tutorial we focus on zero-shot learning for Computer Vision.
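To make the idea of combining existing classifiers with a semantic mapping concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method presented in the tutorial: the function name `zero_shot_score`, the concept names, the classifier scores, and the similarity weights are all hypothetical, and the novel concept is scored simply as a similarity-weighted average of the known classifiers' outputs.

```python
from typing import Dict


def zero_shot_score(known_scores: Dict[str, float],
                    similarity: Dict[str, float]) -> float:
    """Score a novel concept as a similarity-weighted average of
    the scores produced by classifiers for known concepts.

    Hypothetical helper for illustration only.
    """
    total = sum(similarity.values())
    if total == 0:
        raise ValueError("novel concept has no semantic link to a known concept")
    weighted = sum(w * known_scores[c] for c, w in similarity.items())
    return weighted / total


# Assumed outputs of existing classifiers for one image (made-up numbers).
known_scores = {"drum": 0.8, "broom": 0.6, "helicopter": 0.1}

# Assumed semantic similarity of "lagerphone" to each known concept,
# e.g. derived from text or expert knowledge rather than from images.
lagerphone_similarity = {"drum": 0.5, "broom": 0.5, "helicopter": 0.0}

score = zero_shot_score(known_scores, lagerphone_similarity)
print(f"lagerphone score: {score:.2f}")
```

No image of a lagerphone is used anywhere in this sketch: the novel classifier is built entirely from the known classifiers and the cross-concept weights, which is the property that makes the setting zero-shot.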
We start the tutorial with an in-depth discussion of visual knowledge transfer, followed by a discussion of different application domains for zero-shot learning, such as classification, localisation, retrieval, and interaction. While these applications have been studied by different communities (machine learning, computer vision, and multimedia), future progress will come from combining their insights.
Dr. Thomas Mensink is an Assistant Professor with the University of Amsterdam (UvA) in the area of 3D Deep Learning. He received the M.Sc. degree in artificial intelligence, with honors, from the University of Amsterdam in 2007. He worked towards a Ph.D. jointly with the LEAR team of INRIA Grenoble and the Computer Vision team of Xerox Research Centre Europe. His Ph.D. thesis received the AFRIF Thesis Award 2012 for the best Ph.D. thesis in pattern recognition in France. He focuses on machine learning for 3D classification, with a keen interest in learning semantic representations for visual data, which is essential when learning from few, or even zero, examples. He has received numerous awards, including the ACM Multimedia Best Paper prize 2014, an NWO Veni grant 2015, and the ACM ICMR Best Paper prize 2016.
Dr. Efstratios Gavves is an Assistant Professor with the University of Amsterdam in the Netherlands and Scientific Manager of the QUVA Deep Vision Lab. After completing his PhD in 2014 at the University of Amsterdam, he worked as a Post-doctoral Researcher at KU Leuven with Prof. Tinne Tuytelaars. He has authored several papers in major Computer Vision and Multimedia conferences and journals, including CVPR, ICCV, ECCV, IJCV, and CVIU. His research interests include, but are not limited to, statistical and deep learning with applications to computer vision tasks such as object recognition, image captioning, action recognition, memory networks and recurrent networks, and tracking.
Dr. Zeynep Akata is an Assistant Professor with the University of Amsterdam in the Netherlands, Scientific Manager of the Delta Lab, and a Senior Researcher at the Max Planck Institute for Informatics in Germany. Zeynep holds an MSc degree received in 2011 from RWTH Aachen and a PhD degree received in 2014 from the University of Grenoble. After she completed her PhD at INRIA Rhone Alpes with Prof. Cordelia Schmid, she worked as a Post-doctoral Researcher at the Max Planck Institute for Informatics with Prof. Bernt Schiele. She has authored several papers in major Computer Vision and Machine Learning conferences and journals, including CVPR, ECCV, NIPS, ICML, and TPAMI. Her research interests include machine learning with applications to computer vision, such as zero-shot learning and multimodal deep learning with generative models that combine vision and language. She received the Lise-Meitner Award for Excellent Women in Computer Science from the Max Planck Society in 2014 and a DARPA Explainable Artificial Intelligence grant in 2017, while she was a visiting scholar in Prof. Trevor Darrell’s group at the University of California, Berkeley.
Dr. Cees G.M. Snoek received the M.Sc. degree in business information systems (2000) and the Ph.D. degree in computer science (2005), both from the University of Amsterdam, The Netherlands. He is currently a director of the QUVA Lab, the joint research lab of Qualcomm and the University of Amsterdam on deep learning and computer vision. He is also a principal engineer at Qualcomm and an associate professor at the University of Amsterdam. He was previously a Visiting Scientist at Informedia, Carnegie Mellon University, USA (2003), a Fulbright Junior Scholar at UC Berkeley’s Computer Vision Group (2010-2011), and head of R&D at University spin-off Euvision Technologies before it was acquired by Qualcomm (2011-2014). His research interests focus on video and image recognition. He has published over 150 refereed book chapters, journal and conference papers, and serves on the program committees of the major conferences in multimedia and computer vision. He is general co-chair of ACM Multimedia 2016 in Amsterdam, program co-chair for ICMR 2015, and co-initiator and co-organizer of the VideOlympics 2007-2009. He is a lecturer of post-doctoral courses given at international conferences and European summer schools. He is a senior member of IEEE and ACM. Dr. Snoek is the recipient of an NWO Veni award (2008), a Fulbright Junior Scholarship (2010), an NWO Vidi award (2012), and the Netherlands Prize for ICT Research (2012). Several of his Ph.D. students and Post-docs have won awards, including the IEEE Transactions on Multimedia Prize Paper Award (2012), the SIGMM Best Ph.D. Thesis Award (2013), the Best Paper Award of ACM Multimedia (2014), and an NWO Veni award (2015). Three of his former mentees serve as assistant professors.