# 2. Images in Motion

Imagine a horse walking, i.e. moving, in front of a (pinhole) camera. Also imagine that you could look at the back plane of the camera, where the image is projected. You would then see the moving horse. At every moment in time $$t$$ there is an image $$f(x,y)$$ projected on the back plane. To indicate the time dependence we can represent the ‘image in motion’ as a function of three arguments: $$f(x,y,t)$$.

Conceptually we think of an image as a 2D function with the continuous plane $$\setR^2$$ as its domain. Analogously, an ‘image in motion’ is a 3D function defined on the domain $$\setR^2\times\setR$$: both the spatial coordinates $$x,y$$ and the time ‘coordinate’ $$t$$ are continuous.

Evidently, just as we need sampling to represent images with a finite amount of data, we also need to sample the time coordinate. A sampled ‘image in motion’ is most often called a video sequence. Each image in the sequence is called a frame.
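To make the sampling step concrete, here is a minimal sketch in Python with NumPy. The function `f` below is a made-up ‘image in motion’ (a Gaussian blob drifting to the right), and the helper `sample_video`, the unit-square image domain and the frame rate are all assumptions chosen for illustration; they do not come from the text above. The sketch samples $$f(x,y,t)$$ on a regular pixel grid and at regular time instants, which gives a 3D array with one 2D frame per time sample.

```python
import numpy as np

def f(x, y, t):
    """Hypothetical continuous 'image in motion': a Gaussian blob moving right."""
    cx = 0.2 + 0.1 * t   # blob centre drifts along x (assumed speed: 0.1 per second)
    cy = 0.5             # blob centre stays at a fixed height
    sigma = 0.05         # assumed blob size
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

def sample_video(f, width, height, n_frames, fps):
    """Sample f on a width x height grid over the unit square at n_frames instants."""
    xs = np.linspace(0.0, 1.0, width)    # sampled x coordinates
    ys = np.linspace(0.0, 1.0, height)   # sampled y coordinates
    ts = np.arange(n_frames) / fps       # sampled time instants in seconds
    # indexing="ij" orders the axes as (time, row, column)
    T, Y, X = np.meshgrid(ts, ys, xs, indexing="ij")
    return f(X, Y, T)

video = sample_video(f, width=64, height=48, n_frames=25, fps=25.0)
print(video.shape)   # (25, 48, 64): 25 frames of 48 rows by 64 columns
frame0 = video[0]    # a single frame is an ordinary 2D image
```

Storing time as the first axis is a design choice of this sketch: it makes `video[k]` the $$k$$-th frame, so the video can be processed frame by frame with ordinary image operations.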

Sampling images is possible because the human eye cannot resolve small details, i.e. when looking at a sampled image from a distance we don’t see the pixels anymore. The same is true for images in motion. If the human eye is presented with a rapid sequence of images, it cannot distinguish the individual frames anymore.

A video is thus nothing more than a sequence of images displayed on the screen in rapid succession.

• Instead of considering the movement of all points in the image, we are often interested only in the movement of an object depicted in the images of the sequence. This leads to motion tracking algorithms. For motion tracking we often assume that the position and shape of the object at time $$t_0$$ are given; the task is then to find the object in the images for $$t>t_0$$.
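The tracking task described above can be illustrated with a deliberately simple sketch. It assumes that the object keeps its appearance, moves only a few pixels between frames, and is described at $$t_0$$ by a rectangular patch (the template). The matching criterion used here, an exhaustive search for the smallest sum of squared differences (SSD), is just one basic choice made for illustration and not a method prescribed by the text; the function name `track_template` and the synthetic video are hypothetical as well.

```python
import numpy as np

def track_template(frame, template, prev_pos, radius):
    """Return the top-left (row, col) of the best SSD match near prev_pos."""
    h, w = template.shape
    r0, c0 = prev_pos
    best_pos, best_ssd = prev_pos, np.inf
    # Only search positions within `radius` pixels of the previous position
    # and keep the candidate patch fully inside the frame.
    for r in range(max(0, r0 - radius), min(frame.shape[0] - h, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(frame.shape[1] - w, c0 + radius) + 1):
            ssd = np.sum((frame[r:r + h, c:c + w] - template) ** 2)
            if ssd < best_ssd:
                best_pos, best_ssd = (r, c), ssd
    return best_pos

# Synthetic video (assumption): a 5x5 bright square moving one pixel right per frame.
n_frames, height, width = 10, 32, 32
video = np.zeros((n_frames, height, width))
for k in range(n_frames):
    video[k, 10:15, 5 + k:10 + k] = 1.0

# Position and shape of the object are given in the first frame (time t_0).
pos = (10, 5)
template = video[0, 10:15, 5:10].copy()

# Find the object in every later frame, starting from its last known position.
track = [pos]
for k in range(1, n_frames):
    pos = track_template(video[k], template, pos, radius=3)
    track.append(pos)

print(track)
```

In this sketch the template is taken once from the first frame and never updated, so it would fail as soon as the object changes shape or brightness; practical trackers need a more robust appearance model.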