3.2. Motion Tracking
Suppose that in one video frame we can outline the (tennis) ball and we want to track the position of the ball in subsequent frames. For a (white) ball this seems an easy task: from whatever angle you look at it, in the image it will look like a circular disk. In reality, though, that is not always the case. Lighting conditions may change (e.g. the ball moves from an area with bright sunlight to an area in the shadow), the ball may be obscured by some object in front of it, or the ball may be unevenly colored and spinning in the air, showing different colors in each frame.
Now consider the situation where you would like to track humans moving through a scene. Humans may stand in front of each other (and partly obscure each other), humans may turn around their vertical axis and thus change their shape as projected on the image plane, and humans may move their arms and legs. There is an almost infinite number of ways in which a human shape (even when walking) can appear in an image.
So, given all the variation in the appearance of an object, how can we reliably track such an object in a video sequence?
Motion tracking is an important problem in computer vision and many researchers have proposed motion tracking algorithms. As far as I know, there is still no motion tracker that is capable of solving all tracking problems.
Some of the basic motion tracking approaches are:
Blob tracking using background estimation. When the camera is fixed, both in position and in its camera parameters, there is no ‘ego-motion’ (i.e. motion in the images induced by the moving camera). In such cases it might be possible to use automatic background subtraction methods to find the moving objects. Tracking objects can then be as simple as finding which blobs (connected components) overlap in subsequent frames. For larger movements, where the same object does not overlap itself in subsequent frames, one might use a prediction filter (a Kalman filter, for example) to track the objects.
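The overlap idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a production tracker: frames are small gray-value images stored as lists of lists, the background estimate is simply given, and the names `foreground_mask`, `connected_components`, `match_blobs` and the threshold value are all hypothetical choices made for this example.

```python
from collections import deque

def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background estimate by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def connected_components(mask):
    """Return a list of blobs; each blob is a set of (row, col) pixels (4-connectivity)."""
    rows, cols = len(mask), len(mask[0])
    seen, blobs = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            blob, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs

def match_blobs(prev_blobs, curr_blobs):
    """Pair each previous blob with the current blob that shares the most pixels."""
    matches = {}
    for i, prev in enumerate(prev_blobs):
        overlaps = [len(prev & curr) for curr in curr_blobs]
        if overlaps and max(overlaps) > 0:
            matches[i] = overlaps.index(max(overlaps))
    return matches

# Assumed toy data: a 2x3 bright object that moves one pixel to the right.
background = [[0] * 8 for _ in range(4)]
frame_a = [row[:] for row in background]
frame_b = [row[:] for row in background]
for r in (1, 2):
    for c in (1, 2, 3):
        frame_a[r][c] = 200
    for c in (2, 3, 4):
        frame_b[r][c] = 200

blobs_a = connected_components(foreground_mask(frame_a, background))
blobs_b = connected_components(foreground_mask(frame_b, background))
print(match_blobs(blobs_a, blobs_b))  # {0: 0}
```

A blob that has moved so far that it no longer overlaps its previous position gets no entry in the result, which is exactly the situation where a prediction step is needed.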
Correlation Tracking. Assume that at time \(t_0\) the object to be tracked is outlined with a rectangle in the image. Then at time \(t>t_0\) we take the outlined object (a small image patch) and search for the position in the image at time \(t\) where the image looks most similar to that patch. A well-known matching measure is the normalized cross correlation.
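As a minimal sketch of this search, the code below slides the template over every position in the image and keeps the position with the highest score. It assumes the zero-mean variant of the normalized cross correlation (the mean is subtracted from both patches before correlating), gray-value images stored as lists of lists, and an exhaustive search. The function names `ncc` and `track` are hypothetical; practical implementations would use an array library and restrict the search to a window around the previous position.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross correlation of two equally sized gray patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        # A constant patch has no structure to correlate; treat it as "no match".
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def track(image, template):
    """Search all positions; return the top-left (row, col) with the highest NCC."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None  # NCC lies in [-1, 1]
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Assumed toy data: the object reappears at (2, 3), twice as bright plus an offset.
template = [[10, 50],
            [90, 30]]
image = [[0] * 6 for _ in range(5)]
for dr in range(2):
    for dc in range(2):
        image[2 + dr][3 + dc] = 2 * template[dr][dc] + 20

position, score = track(image, template)
print(position)  # (2, 3)
```

In this toy example the score at the found position is (up to rounding) 1, even though the gray values changed: subtracting the mean and dividing by the norm makes the measure insensitive to a gain and an offset in brightness.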
Histogram Tracking. Instead of looking at the spatial distribution of color (or gray) values in the image patch, it is also possible to forget about the spatial distribution and only consider a histogram of the colors within the image patch to be tracked. The mean shift tracker is a well-known example of a tracker in this class. An advantage of this tracker is that it is largely invariant under shape transformations.
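The invariance claim can be illustrated with a small sketch. This is not a mean shift tracker; it only shows the kind of histogram comparison such a tracker relies on. It assumes gray-value patches stored as lists of lists, a coarse 8-bin histogram, and the Bhattacharyya coefficient as the similarity measure (a common choice for comparing histograms). The function names are hypothetical.

```python
import math

def gray_histogram(patch, bins=8, max_value=256):
    """Normalized histogram of gray values; pixel positions are discarded."""
    counts = [0] * bins
    n = 0
    for row in patch:
        for v in row:
            counts[v * bins // max_value] += 1
            n += 1
    return [c / n for c in counts]

def bhattacharyya(p, q):
    """Similarity of two normalized histograms: 1 for identical, 0 for disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Assumed toy data: the same four gray values, rearranged by a 90 degree rotation.
patch = [[10, 200],
         [120, 60]]
rotated = [[120, 10],
           [60, 200]]
different = [[250, 250],
             [250, 250]]

h = gray_histogram(patch)
print(bhattacharyya(h, gray_histogram(rotated)))    # 1.0
print(bhattacharyya(h, gray_histogram(different)))  # 0.0
```

Because only the counts per bin matter, any rearrangement of the pixels within the patch leaves the histogram unchanged. The price is that two patches with very different spatial structure can also have the same histogram.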
CNN Based Tracking.