Physics-Based Vision

The optics of liquids is extremely challenging for computer vision because liquids are transparent and non-rigid. Transparency means a liquid's appearance is determined almost entirely by its environment; non-rigidity means its shape is also highly dependent on the environment. Specifically, my research focuses on rain and water drops.

 

Non-local Intrinsic Decomposition with Near-infrared Priors

Appeared in ICCV 2019

Ziang Cheng, Yinqiang Zheng, Shaodi You and Imari Sato

Intrinsic image decomposition is a highly underconstrained problem that has been extensively studied by computer vision researchers. Previous methods impose additional constraints by exploiting either empirical or data-driven priors. In this paper, we revisit intrinsic image decomposition with the aid of near-infrared (NIR) imagery. We show that the NIR band is considerably less sensitive to textures and can be exploited to reduce the ambiguity caused by reflectance variation, promoting a simple yet powerful prior for shading smoothness. With this observation, we formulate intrinsic decomposition as an energy minimisation problem. Unlike existing methods, our energy formulation decouples reflectance and shading estimation into a convex local shading component based on the NIR-RGB image pair, and a reflectance component that encourages reflectance homogeneity both locally and globally. We further show that the minimisation process can be approximated by a series of multi-dimensional convolutions, each with linear time complexity. To validate the proposed algorithm, an NIR-RGB dataset is captured over real-world objects, on which our NIR-assisted approach demonstrates superiority over RGB-only methods.
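In schematic form (the symbols and weighting below are my own shorthand, not the paper's exact notation), the decoupled energy described above looks like:

```latex
I = R \cdot S, \qquad
E(R, S) = \underbrace{\sum_{x} w_{\mathrm{NIR}}(x)\,\|\nabla S(x)\|^2}_{\text{convex local shading term}}
\;+\; \underbrace{\sum_{(x,y)\in\mathcal{N}} \phi\big(R(x), R(y)\big)}_{\text{local and global reflectance homogeneity}}
```

Here $I = R \cdot S$ is the usual intrinsic decomposition into reflectance and shading, $w_{\mathrm{NIR}}$ stands for smoothness weights derived from the texture-insensitive NIR band, and $\phi$ penalises reflectance differences between (possibly non-local) pixel pairs $\mathcal{N}$.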

Paper

 

Learning to Minify Photometric Stereo

Appeared in CVPR 2019

Junxuan Li, Antonio Robles-Kelly, Shaodi You and Yasuyuki Matsushita

Photometric stereo estimates the surface normal given a set of images acquired under different illumination conditions. To deal with diverse factors involved in the image formation process, recent photometric stereo methods demand a large number of images as input. We propose a method that can dramatically decrease the demands on the number of images by learning the most informative ones under different illumination conditions. To this end, we use a deep learning framework to automatically learn the critical illumination conditions required at input. Furthermore, we present an occlusion layer that can synthesize cast shadows, which effectively improves the estimation accuracy. We assess our method on challenging real-world conditions, where we outperform techniques elsewhere in the literature with a significantly reduced number of light conditions.
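For context, the classic Lambertian least-squares solver that such methods reduce to once a set of lights is chosen takes only a few lines; the sketch below (illustrative, not the paper's learned pipeline) shows why each extra image adds a row to a per-pixel linear system:

```python
import numpy as np

def lambertian_photometric_stereo(intensities, light_dirs):
    """Baseline least-squares photometric stereo.

    intensities: (k, h, w) images taken under k distant point lights
    light_dirs:  (k, 3) unit lighting directions
    Returns per-pixel unit surface normals of shape (h, w, 3).
    """
    k, h, w = intensities.shape
    I = intensities.reshape(k, -1)                      # (k, h*w)
    # Lambertian model: I = L @ b, where b = albedo * normal per pixel.
    b, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)  # (3, h*w)
    albedo = np.linalg.norm(b, axis=0) + 1e-8
    return (b / albedo).T.reshape(h, w, 3)
```

The paper's contribution is learning which few rows of this system (illumination conditions) carry the most information, plus an occlusion layer for cast shadows.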

Paper

 

Single Image Water Hazard Detection using FCN with Reflection Attention Units

Appeared in ECCV 2018

Xiaofeng Han, Chuong Nguyen, Shaodi You and Jianfeng Lu

Water bodies, such as puddles and flooded areas, on and off road pose significant risks to autonomous cars. Detecting water from a moving camera is a challenging task, as the water surface is highly refractive and its appearance varies with viewing angle, surrounding scene and weather conditions. In this paper, we present a water puddle detection method based on a Fully Convolutional Network (FCN) with our newly proposed Reflection Attention Units (RAUs). An RAU is a deep network unit designed to embody the physics of reflection on a water surface from the sky and nearby scene. To verify the performance of our proposed method, we collect 11455 color stereo images with polarizers; 985 of the left images are annotated and divided into two datasets: the On Road (ONR) dataset and the Off Road (OFR) dataset. We show that FCN-8s with RAUs significantly improves precision and recall as compared to FCN-8s, DeepLab V2 and a Gaussian Mixture Model (GMM). We also show that a focal loss function can improve the performance of the FCN-8s network, due to the extreme class imbalance of the water-versus-ground classification problem.
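A minimal sketch of the flip-and-compare intuition behind an RAU (the actual unit design is in the paper; this is only an illustration, with placeholder layer sizes):

```python
import torch
import torch.nn as nn

class ReflectionAttentionSketch(nn.Module):
    """Illustrative reflection-style attention unit.

    A water surface mirrors the sky and the scene above it, so features
    at a puddle pixel should correlate with vertically flipped features
    from higher up in the image.
    """
    def __init__(self, channels):
        super().__init__()
        self.compare = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feats):                      # feats: (n, c, h, w)
        flipped = torch.flip(feats, dims=[2])      # mirror along image height
        attn = torch.sigmoid(self.compare(torch.cat([feats, flipped], dim=1)))
        return feats * attn                        # emphasise reflection-consistent regions
```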

Paper
Webpage

 

Adherent Raindrop Detection and Removal in Video

Presented in CVPR 2013 and IEEE TPAMI 2016

Shaodi You, Robby T. Tan, Rei Kawakami, Yasuhiro Mukaigawa and Katsushi Ikeuchi

Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Modeling, detecting and removing raindrops will, therefore, benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, a method that automatically detects and removes adherent raindrops is introduced. The core idea is to exploit the local spatio-temporal derivatives of raindrops. To accomplish this, we first model adherent raindrops using the laws of physics, and detect raindrops based on these models in combination with the motion and intensity temporal derivatives of the input video. Having detected the raindrops, we remove them and restore the images based on the observation that some areas of a raindrop completely occlude the scene, while other areas occlude it only partially. Partially occluding areas are restored by retrieving as much information of the scene as possible, namely, by solving a blending function on the detected areas using the temporal intensity derivative. Completely occluding areas are recovered using a video completion technique. Experimental results on various real videos show the effectiveness of our method.
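One of the cues mentioned above, the temporal intensity derivative, can be sketched in a few lines (the threshold and statistics are illustrative, not the paper's):

```python
import numpy as np

def slow_change_candidates(frames, threshold=2.0):
    """Raindrop candidate mask from temporal derivatives.

    Adherent raindrops blur the scene behind them, so their pixels tend
    to vary more slowly over time than unoccluded background pixels.
    frames: (t, h, w) grayscale video.
    """
    dt = np.abs(np.diff(frames.astype(np.float32), axis=0))  # |dI/dt|
    return dt.mean(axis=0) < threshold
```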

Journal paper (12.7MB)
Conference paper (2.5MB)
Webpage

 

Waterdrop Stereo

Shaodi You, Robby T. Tan, Rei Kawakami, Yasuhiro Mukaigawa and Katsushi Ikeuchi

This paper introduces depth estimation from water drops. The key idea is that a single water drop adhered to window glass is totally transparent and convex, and thus optically acts like a fisheye lens. If we have more than one water drop in a single image, then through each of them we can see the environment from a different viewpoint, similar to stereo. To realize this idea, we need to rectify every water drop image to make radially distorted planar surfaces look flat. For this rectification, we consider two physical properties of water drops: (1) A static water drop has constant volume, and its geometric convex shape is determined by the balance between the tension force and gravity. This implies that the 3D geometric shape can be obtained by minimizing the overall potential energy, which is the sum of the tension energy and the gravitational potential energy. (2) The imagery inside a water drop is determined by the drop's 3D shape and the total reflection at its boundary. This total reflection generates a dark band commonly observed in any adherent water drop. Hence, once the 3D shapes of the water drops are recovered, we can rectify the water drop images through backward ray tracing. Subsequently, we can compute depth using stereo. In addition to depth estimation, we can also apply image refocusing. Experiments on real images and a quantitative evaluation show the effectiveness of our proposed method. To the best of our knowledge, never before have adherent water drops been used to estimate depth.
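Property (1) can be stated compactly; in standard notation (not necessarily the paper's), the drop surface $\Omega$ minimises

```latex
E(\Omega) \;=\; \underbrace{\sigma \int_{\partial \Omega} dA}_{\text{tension energy}}
\;+\; \underbrace{\rho g \int_{\Omega} z \, dV}_{\text{gravitational potential}}
\qquad \text{subject to} \qquad \int_{\Omega} dV = V_0 ,
```

where $\sigma$ is the surface tension, $\rho$ the water density and $V_0$ the constant drop volume.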

pdf (3.5MB)

 

Haze Visibility Enhancement: A Survey and Quantitative Benchmarking

Yu Li, Shaodi You, Michael S. Brown and Robby T. Tan

Appeared in CVIU 2017

This paper provides a comprehensive survey of methods dealing with visibility enhancement of images taken in hazy or foggy scenes. The survey begins by discussing the optical models of atmospheric scattering media and image formation. This is followed by a survey of existing methods, which are grouped into multiple-image methods, polarizing-filter-based methods, methods with known depth, and single-image methods. We also provide a benchmark of a number of well-known single-image methods, based on a recent dataset provided by Fattal and our newly generated scattering media dataset that contains ground truth images for quantitative evaluation. To our knowledge, this is the first benchmark that uses numerical metrics to evaluate dehazing techniques. It allows us to objectively compare the results of existing methods and to better identify the strengths and limitations of each.
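The optical model underlying most of the surveyed methods is the standard atmospheric scattering (Koschmieder) model:

```latex
I(x) \;=\; J(x)\,t(x) \;+\; A\,\big(1 - t(x)\big), \qquad t(x) = e^{-\beta d(x)},
```

where $I$ is the observed hazy image, $J$ the scene radiance, $A$ the atmospheric light, $\beta$ the scattering coefficient and $d(x)$ the scene depth.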

Paper

 

Photo-Realistic Simulation of Road Scene for Data-Driven Methods in Bad Weather

Kunming Li, Yu Li, Shaodi You and Nick Barnes

Oral presentation in the ICCV 2017 Workshop on Physics Based Vision meets Deep Learning

Modern data-driven computer vision algorithms require a large volume of varied data for validation and evaluation. We utilize computer graphics techniques to generate a large foggy image dataset of road scenes with different levels of fog. We compare it with other popular synthesized datasets, including data collected both from the virtual world and from the real world. In addition, we benchmark recent popular dehazing methods and evaluate their performance on the different datasets, which provides an objective comparison of their limitations and strengths. To our knowledge, this is the first large-volume foggy and hazy dataset, which can be helpful for computer vision research in autonomous driving.
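The core of such fog rendering is the standard scattering model applied to the engine's depth buffer; a minimal sketch follows (parameter values are illustrative, and the paper's rendering pipeline is more elaborate):

```python
import numpy as np

def add_synthetic_fog(clean, depth, beta=0.05, airlight=0.9):
    """Overlay fog on a clean road image: I = J*t + A*(1-t), t = exp(-beta*d).

    clean: (h, w, 3) image in [0, 1]
    depth: (h, w) per-pixel scene depth in metres
    beta:  scattering coefficient; larger values give denser fog
    """
    t = np.exp(-beta * depth)[..., None]
    return clean * t + airlight * (1.0 - t)
```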

pdf (6.7MB)
Webpage

 

A Frequency Domain Neural Network for Fast Image Super-resolution

Appeared in IJCNN 2018

Junxuan Li, Shaodi You and Antonio Robles-Kelly

We present a frequency domain neural network for image super-resolution. The network employs the convolution theorem so as to cast convolutions in the spatial domain as products in the frequency domain. Moreover, the non-linearity in deep nets, often achieved by a rectifier unit, is here cast as a convolution in the frequency domain. This not only yields a network which is very computationally efficient at testing, but also one whose parameters can all be learnt accordingly. The network can be trained using back propagation and is devoid of complex numbers due to the use of the Hartley transform as an alternative to the Fourier transform. Moreover, the network is potentially applicable to other problems elsewhere in computer vision and image processing which are often cast in the frequency domain. We show results on super-resolution and compare against alternatives elsewhere in the literature. In our experiments, our network is one to two orders of magnitude faster than the alternatives, with a marginal loss of performance.
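The two ingredients, the convolution theorem and the real-valued Hartley transform, are easy to state in code (a NumPy sketch, independent of the paper's actual architecture):

```python
import numpy as np

def fft_conv2d(image, kernel):
    """Convolution theorem: circular spatial convolution equals an
    element-wise product in the frequency domain."""
    h, w = image.shape
    K = np.fft.fft2(kernel, s=(h, w))          # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * K))

def hartley2d(x):
    """Discrete Hartley transform via the FFT: H = Re(F) - Im(F).
    Being real-valued, it keeps the network free of complex numbers."""
    F = np.fft.fft2(x)
    return F.real - F.imag
```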

Paper

 

Stereo Super-resolution via a Deep Convolutional Network

Appeared in the International Conference on Digital Image Computing: Techniques and Applications (DICTA) 2017

Junxuan Li, Shaodi You and Antonio Robles-Kelly

We present a method for stereo super-resolution which employs a deep network. The network is trained using the residual image so as to obtain a high resolution image from two low resolution views. Our network comprises two deep sub-nets which share a single convolutional layer at their output. This last layer delivers an estimate of the residual image, which is then used, in combination with the left input frame of the stereo pair, to compute the super-resolved image at output. Each of these sub-networks comprises ten weight layers and hence allows our network to efficiently combine structural information across image regions. Moreover, by learning the residual image, the network copes better with vanishing gradients and is devoid of gradient clipping operations. We illustrate the utility of our network for image-pair super-resolution and compare it to its non-residual-trained analogue and alternatives elsewhere in the literature.
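A schematic of the residual design described above (layer counts and widths are placeholders, not the paper's ten-layer sub-nets):

```python
import torch
import torch.nn as nn

class StereoSRSketch(nn.Module):
    """Two sub-nets share a final conv layer; its output (the residual)
    is added to the upsampled left view to give the super-resolved image."""
    def __init__(self, c=64):
        super().__init__()
        self.left = nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU())
        self.right = nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * c, 3, 3, padding=1)   # shared output layer

    def forward(self, left_lr, right_lr):   # both views upsampled to target size
        residual = self.fuse(
            torch.cat([self.left(left_lr), self.right(right_lr)], dim=1))
        return left_lr + residual
```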

Paper (7.4MB)

 

Raindrop Detection and Removal from Long Range Trajectories

Oral presentation in ACCV 2014

Shaodi You, Robby T. Tan, Rei Kawakami, Yasuhiro Mukaigawa and Katsushi Ikeuchi

In rainy scenes, visibility can be degraded by raindrops that have adhered to the windscreen or camera lens. To resolve this degradation, we propose a method that automatically detects and removes adherent raindrops. The idea is to use long range trajectories to discover the motion and appearance features of raindrops locally along the trajectories. These motion and appearance features are obtained through our analysis of trajectory behavior when encountering raindrops. The features are then transformed into a labeling problem whose cost function can be optimized efficiently. Having detected the raindrops, removal is achieved by utilizing the patches indicated by the labeling, enabling motion consistency to be preserved. Our trajectory-based video completion method not only removes the raindrops but also completes the motion field, which benefits motion estimation algorithms and may enable them to work in rainy scenes. Experimental results on real videos show the effectiveness of the proposed method.

pdf (3.0MB)
Webpage

 

Identifying Surface BRDF from a Single 4D Light Field Image via Deep Neural Network

Appeared in IEEE Journal on Selected Topics in Signal Processing

Feng Lu, Lei He, Shaodi You and Zhixiang Hao

The bidirectional reflectance distribution function (BRDF) defines how light is reflected at a surface patch to produce the surface appearance, and thus modeling and recognizing BRDFs is of great importance for various tasks in computer vision and graphics. However, such tasks are usually ill-posed or require heavy labor for image capture from different viewing angles. In this paper, we focus on the problem of remote BRDF type identification, by delivering novel techniques that capture and use a single light field image. The key is that a light field image captures both spatial and angular information in a single shot, and the angular information enables effective sampling of the 4D BRDF. To implement the idea, we propose convolutional neural network (CNN) based architectures for BRDF identification from a single 4D light field image. Specifically, a StackNet and an Ang-convNet are introduced. The StackNet stacks the angular information of the light field images in an independent dimension, whereas the Ang-convNet uses angular filters to encode the angular information. In addition, we propose a large light field BRDF dataset containing 47,650 high quality 4D light field image patches with different 3D shapes, BRDFs and illuminations. Experimental results show significant accuracy improvement in BRDF identification using the proposed methods.
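The StackNet input described above amounts to folding the two angular dimensions into the channel axis; a hypothetical helper (not the paper's code):

```python
import numpy as np

def stack_angular(light_field):
    """Reshape a 4D light field patch (u, v, h, w) into a (u*v, h, w)
    tensor so the angular samples become input channels of a 2D CNN."""
    u, v, h, w = light_field.shape
    return light_field.reshape(u * v, h, w)
```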

pdf (4.8MB)

 

Robust and Fast Motion Estimation for Video Completion

Oral presentation in IAPR MVA 2013

Shaodi You, Robby T. Tan, Rei Kawakami and Katsushi Ikeuchi

A motion estimation method for completing a video with large and consecutive damage is introduced. It is principally based on sparse matching and interpolation. First, SIFT, which is robust to arbitrary motion, is used to efficiently obtain sparse correspondences in neighboring frames. To ensure these correspondences are uniformly distributed across the image, a fast dense point sampling method is applied. Then, a dense motion field is generated by interpolating the correspondences. An efficient weighted explicit polynomial fitting method is proposed to achieve spatially and temporally coherent interpolation. In the experiments, quantitative measurements were conducted to show the robustness and effectiveness of the proposed method.
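The interpolation step can be sketched as a weighted least-squares polynomial fit over the sparse matches (the degree and weighting here are illustrative, not the paper's exact formulation):

```python
import numpy as np

def weighted_poly_fit(points, values, weights):
    """Fit value(x, y) ~ degree-2 bivariate polynomial by weighted least squares.

    points:  (n, 2) match locations
    values:  (n,)  one motion component (e.g. horizontal flow) at each match
    weights: (n,)  per-match confidence
    Returns the 6 polynomial coefficients.
    """
    x, y = points[:, 0], points[:, 1]
    A = np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)
    sw = np.sqrt(weights)
    coeffs, *_ = np.linalg.lstsq(A * sw[:, None], values * sw, rcond=None)
    return coeffs
```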

Paper (0.9MB)
Video demo
Slides
Bibtex

 

Semantic Single-Image Dehazing

arXiv preprint

Ziang Cheng, Shaodi You, Viorela Ila and Hongdong Li

Single-image haze removal is challenging due to the limited information contained in a single image. Previous solutions largely rely on handcrafted priors to compensate for this deficiency. Recent convolutional neural network (CNN) models have been used to learn haze-related priors, but they ultimately work as advanced image filters. In this paper we propose a novel semantic approach to single-image haze removal. Unlike existing methods, we infer color priors based on extracted semantic features. We argue that semantic context can be exploited to give informative cues for (a) learning a color prior on the clean image and (b) estimating the ambient illumination. This design allows our model to recover clean images in challenging cases with strong ambiguity, e.g. saturated illumination color and sky regions in the image. In experiments, we validate our approach on synthetic and real hazy images, where our method shows superior performance over state-of-the-art approaches, suggesting that semantic information facilitates the haze removal task.

Paper
Webpage

 

Water Splash Suppression in A Single Image

arXiv preprint

Shaodi You and Nick Barnes

We propose a solution for a novel task in computer vision: water splash suppression in a single image. Water splash is a common natural phenomenon that can significantly degrade the visibility of the scene behind it. This is problematic for many computer vision tasks, for example automatic driving on a wet lane, lifeguard surveillance of a pool, and motion analysis of a diver or swimmer. We address the problem by suppressing the appearance of water splash and making the objects behind it as clear as possible. We formulate the appearance of water splash as a layer overlapping the background. The splash layer is further divided into blended splash and standalone splash. As the first work on this topic, we start with the geometric and photometric properties of water splash. Based on these, we propose three priors: the first is a dark channel prior, inspired by the dark channel prior in dehazing; the second is a global luminance prior; and the third is a standalone-splash prior. Finally, we propose an iterative strategy to suppress water splash using the three priors. Experiments in various environments and scenes demonstrate the effectiveness of the proposed method.
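The first prior borrows from the dark channel prior in dehazing, which is a per-patch minimum over colour channels; a minimal sketch:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    """Dark channel: per-patch minimum over all colour channels.
    In splash-free (as in haze-free) regions it is close to zero.
    image: (h, w, 3) in [0, 1]; the patch size is illustrative."""
    return minimum_filter(image.min(axis=2), size=patch)
```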

Paper
Webpage

 

Artworks