Josie (Jiaojiao) Zhao

I am currently a final-year PhD candidate working on video understanding under the supervision of Prof. Cees Snoek in the VIS Lab, University of Amsterdam, The Netherlands.

Previously, I worked as a visiting researcher under the supervision of Prof. Ling Shao in the Artificial Intelligence Lab, School of Computing Sciences, University of East Anglia, UK. From 2015 to 2016, I worked at the Panasonic R&D Center Singapore and visited the Learning and Vision Group, ECE, National University of Singapore. I received my Master's and Bachelor's degrees under the supervision of Prof. Maoguo Gong in the Department of Electronic Engineering, Xidian University, China.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github


2022/06. I attend CVPR 2022 in New Orleans and present our paper on TubeR, a transformer-based method for video action detection, as an oral.

2022/04. I have my PhD defence at Promoties Agnietenkapel in Amsterdam.

2022/03. Our paper on TubeR, a transformer-based method for video action detection, has been accepted as an oral at CVPR 2022.

2021/01. Our paper on LiftPool for image classification and segmentation has been accepted at ICLR 2021.

2020/06. I have joined Amazon Rekognition as a research intern.

2019/10. I have joined Kepler Vision Technologies as a research assistant.

2019/06. Our paper on pixelated semantic colorization has been accepted in IJCV.

2019/03. Our paper on action detection with two-in-one stream has been accepted at CVPR 2019.

2018/08. Our paper on image colorization based on pixel-level semantics has been accepted as an oral at BMVC 2018.

2017/12. I have joined the VIS Lab at UvA as a PhD student.


2022/09. I give a talk on "Spatio-temporal modeling for video understanding" at Meta AI.

2022/07. I give a talk on "Spatio-temporal modeling for video understanding" at Google Research.

2022/02. I give a talk on "Spatio-temporal action detection in videos" at Qualcomm AI Research.

2021/03. I attend Google International Women's Day with students.

2020/10. I give a talk on "Tubelet action detection in videos" at Amazon Rekognition.

2019/09. I attend BMVA Symposium on Video Understanding 2019 in London.

2019/06. Our paper on action detection with two-in-one stream is presented at Facebook's AI video summit.

2018/09. I attend Computer Vision Summit 2018 at Google in Zurich.


I'm interested in computer vision and deep learning. During my PhD, I mainly focus on video understanding, specifically video action detection, action recognition, and video object segmentation. I am also excited about drawing inspiration from classical signal and image processing methods to address basic problems in deep learning. I have also worked on image colorization, face recognition/detection/alignment, and change detection.

TubeR: Tube-Transformer for Action Detection
Jiaojiao Zhao, Xinyu Li, Chunhui Liu, Shuai Bing,
Hao Chen, Cees Snoek, Joseph Tighe
CVPR, 2022   (Oral Presentation)

We develop the first Transformer-based end-to-end action tube detection framework.

LiftPool: Bidirectional ConvNet Pooling
Jiaojiao Zhao, Cees Snoek
ICLR, 2021

By adopting the philosophy of the classical Lifting Scheme from signal processing, we propose LiftPool for bidirectional pooling layers, including LiftDownPool and LiftUpPool.

Go with the Flow: Aligned 3D Convolutions for Video Action Recognition
Jiaojiao Zhao, Cees Snoek

We present aligned 3D convolution blocks, which collect the valuable information from the locations aligned by the learned offsets rather than the original dislocated positions.

Pixelated Semantic Colorization
Jiaojiao Zhao, Jungong Han, Ling Shao, Cees Snoek
IJCV, 2019

We propose to exploit pixelated object semantics to guide image colorization.

Dance with Flow: Two-in-One Stream Action Detection
Jiaojiao Zhao, Cees Snoek
CVPR, 2019
project page / arXiv

We propose to embed RGB and optical flow into a single two-in-one stream network with new layers for video action detection.

Pixel-level semantics guided image colorization
Jiaojiao Zhao, Li Liu, Cees Snoek, Jungong Han, Ling Shao
BMVC, 2018   (Oral Presentation)

To address context confusion, we propose to incorporate pixel-level object semantics to guide image colorization.

Unconstrained Face Recognition Using A Set-to-Set Distance Measure
Jiaojiao Zhao, Jungong Han, Ling Shao
TCSVT, 2017

We propose a novel set-to-set (S2S) distance measure to calculate the similarity between two sets, with the aim of improving recognition accuracy for faces with real-world challenges such as extreme poses or severe illumination conditions.

Change Detection in Synthetic Aperture Radar Images Based on Deep Neural Networks
Maoguo Gong, Jiaojiao Zhao, Jia Liu, Qiguang Miao, Licheng Jiao
TNNLS, 2015

The first work to apply deep learning to change detection in synthetic aperture radar images.

Difference representation learning using stacked restricted Boltzmann machines for change detection in SAR images
Jia Liu, Maoguo Gong, Jiaojiao Zhao, Hao Li, Licheng Jiao
Soft Computing, 2016

We establish a deep neural network using stacked Restricted Boltzmann Machines (RBMs) to analyze the difference images and detect changes between multitemporal synthetic aperture radar (SAR) images.

Deep learning to classify difference image for image change detection
Jiaojiao Zhao, Jia Liu, Maoguo Gong, Licheng Jiao
IJCNN, 2014

We propose a novel difference image analysis approach based on deep neural networks for image change detection problems.

Academic Service
Teaching Assistant
Computer Vision 1,    MSc Artificial Intelligence, 2018
Computer Vision 2,    MSc Artificial Intelligence, 2019
Deep Learning,    MSc Artificial Intelligence, 2019
Deep Learning,    MSc Artificial Intelligence, 2020
Big Data,    MSc Information Studies, 2019
