Scale Space
===========


Most of the images we (or computers) look at are taken with optical
camera's projecting our 3D world onto a 2D light sensitive plane (the
retina for human eyes or the CCD or CMOS pixel plane for camera's). We
are all familiar with the fact that distant objects are projected at
smaller size. The human visual system thus has little a priori
knowledge about the scale/size an object will appear on the image.

.. _fig_sunflowers:

.. figure:: /images/sunflowers.jpg
   :width: 400
   :align: center

   Sunflowers. The size (scale) of the objects is dependent on the
   distance from the camera. A visual system should be prepared to
   'see' objects at all possible sizes.

The image in figure :ref:`Sunflowers <fig_sunflowers>` is a nice
illustration of this effect. The flowers in the front are easily 10
times larger then the flowers near the horizon. Instead of flowers we
could equally well look at faces at different distances (and thus
size) or at license plates of cars at different distances from the
camera.

The only sensible thing to do then is to be prepared for all possible
sizes. The image should be processed at all levels of scales
simultaneously. In a previous chapter we have already discussed the
local structure of images and described that the scale of local
details is characterized with the scale of the Gaussian smoothing
kernel with which the image is smoothed.

Smoothing an image at all possible scales $s>0$ leads to what is
called a **scale-space.**


Definition and Properties
-------------------------

It may seem that there are enumerous ways to construct a
scale-space. But surprisingly in case we set some simple requirements
there is only one scale-space construction possible.

We are looking for a family of image operators $\op L^s$ where $s$
refers to the scale such that given the 'image at zero scale' $f^0$ we
can construct the family of images $\op L^s f^0$ such that:

#. $\op L^s$ is a *linear* operator,
#. $\op L^s$ is *translation invariant* (all parts of the image are
   treated equally, i.e. there is no a priori preferred location),
#. $\op L^s$ is *rotational invariant* (all directions are treated
   equally, i.e. there is no a priori preferred direction),
#. $\op L^s$ is *separable by dimension*, and
#. $\op L^s$ is invariant under change of scale (starting with a scaled
   version of the zero scale image we should in principal obtain the
   same scale-space but then offsetted in scale).


.. sidebar:: **Jan Koenderink**

   .. image:: /images/koenderink_scsp.png
      :height: 300

   Prof. Jan Koenderink from Utrecht University formulated scale-space
   theory in the 1980's. The seminal paper is: J.J. Koenderink, *The
   Structure of Images*, Biological Cybernetics, 1984. From the abstract:

   .. epigraph::

      In practice the relevant details of images exist only over a
      restricted range of scale. Hence it is important to study the
      dependence of image structure on the level of resolution. It
      seems clear enough that visual perception treats images on
      several levels of resolution


It can be proved that the unique scale space operator
satisfying all the requirements is the Gaussian convolution:

.. math::

   f(\v x, s) = (f^0 \ast G^s)(\v x)
   	      
where $G^s$ is the 2D Gaussian kernel:

.. math::
   
   G^s(\v x) = \frac{1}{2\pi s^2} \exp\left(-\frac{\|\v x\|^2}{2 s^2}\right)

The function $f$ can be interpreted as a one parameter family of
images. For every scale $s>0$ there is an image $f(\cdot,s)$ that we
will refer to frequently as $f^s$.

The scale-space function $f$ is the solution of the diffusion equation:

.. math::

   f_s = s \, \nabla^2 f
       = s (f_{xx}+f_{yy})

with initial condition $f(\cdot,0)=f^0$. In fact this differential
equation was the starting point for Koenderink in deriving linear
scale space (see :ref:`here <scalespaceCausality>`). From a practical
point of view it is an important equation because it tells us that the
change in the function $f$ due to a small change in scale (going from
scale $s$ to scale $s+ds$) is proportional to the Laplacian
differential structure of $f$ at scale $s$. It shows that scale
derivatives may always be exchanged for spatial derivatives.

The Gaussian convolution forms a semi-group in the sense that:

.. math::
   G^s \ast G^t = G^\sqrt{s^2+t^2}

This implies that first convolving with a Gaussian at scale $s$
followed by convolving with a Gaussian at scale $t$ is equivalent with
one convolution at scale $\sqrt{s^2+t^2}$. 


Scale-Space in Practice
-----------------------

Every slice out of a scale-space can be obtained from the 'zero-scale'
image by a Gaussian convolution: $f(\cdot,s) = f_0 \ast G^s$. But what
comes out of the camera can't be the zero scale image. An image at
scale $s=0$ is a mathematical abstraction, observing an image *always*
is at some finite scale (the size of the measurement probe).

The image that comes from the camera often serves as the first image
in scale space and assuming $\Delta x=\Delta y = 1$ a zero scale of
$s_0=\half$ is a common choice (as it results in slightly overlapping
probes and thus some smoothness when considering neighboring samples).

To construct the scale space starting at the scale of the observed
image ($s_0$) we have (for $s>s_0$):

.. math::
   f(\v x, s) = f^0 \ast G^s = f^{s_0} \ast G^{\sqrt{s^2-s_0^2}}


Sampling Scale Space
--------------------

.. sidebar:: **Scale-space Sampling**
   
   .. image:: /images/scalespace_sampling.png
      :align: center
      :width: 250 

   For one slice (the 'zero-scale' slice) all the spatial samples are
   shown, and for one spatial sample (pixel) all the scale samples are
   shown.

Every 'slice' $f(\cdot,s)$ out of the scale-space is an image and the
*equidistant sampling* is used for all images:

.. math::
   x_i = i\,\Delta x, \quad y_j = j\,\Delta y

for $i=0,\ldots,n_x-1$ and $j=0,\ldots,n_y-1$.


Scale has to be sampled *logarithmically*:

.. math::
   s_k = \alpha^k \, s_0

for $k=0,\ldots,n_s-1$ (thus $\log s$ is sampled equidistantly). The
need for logarithmic scale sampling becomes clear if you compare
$f\ast G^1$ with $f\ast G^2$ and also compare $f\ast G^{11}$ with $f\ast
G^{12}$. For both pairs the absolute increase in scale is the same,
however the visual change in smoothing from $s=1$ to $s=2$ is much
more profound than the change in smoothing from $s=11$ to $s=12$.

The consequence of the semi-group property of the Gaussian convolution
is that a scale space can be build incrementally. We don't have to
calculate $f^{s_k}$ by convolving $f^{s_0}$ we can convolve a
previous slice in scale space to obtain the next one:

.. math::

   f^{s_k} = f^{s_{k-1}} \ast G^{\sqrt{s_k^2-s_{k-1}^2}}
   = f^{s_k} \ast G^{s_0 \alpha^{k-1}\sqrt{\alpha^2-1}}

Note that even for the incremental build of the scale space the scale
of the Gaussian kernel grows as the scale increases. Burt's pyramid
discussed in the next section is a way to work with a fixed scale
Gaussian kernel.


..
    Burt's Pyramid
    --------------

    In our definition of scale spaces we have assumed that the spatial
    resolution is equal for all scales. That is not necessary of
    course. The Gaussian convolution is a low-pass filter and thus high
    frequencies are effectively removed. According to the samplings
    theorem of Shannon-Nyquist we may then subsample the image without
    loss of information. Because the Gaussian filter is not a perfect
    low-pass filter there will be some loss but in practice it is
    negligible.

    Let our first image $f^{s_0}$ be observed at scale $s_0$ and let us
    assume that sampling was properly done. Then if we double the
    observation scale to $2 s_0$ we may subsample the image with a factor
    2 in both dimensions. In case we quadruple the scale we may subsample
    with a factor 4, etc. Subsampling gives an enormous reduction in
    number of samples and therefore will speed up image processing
    significantly.

    Subsampling an image, meaning that the sample distances are enlarged,
    is equivalent with scaling the image and then sample with the original
    sample distances. Let $\op C_a$ be the scaling operator: $(\op C_a
    f)(\v x)=f(\v x / a)$ then $\op C_{1/2} f$ reduces the size of the image
    with a factor 2 in both directions. Sampling $\op C_{1/2} f$ with respect
    to the standard sampling grid then effectively subsamples the image.

    It is not difficult to prove that:

    .. math::
    \op C_a (f \ast G^s) &= \op C_a f \ast \op C_a G^s \\
    &= \op C_a f \ast G^{a\,s}

    showing that when we have reduced the image size (and possibly
    subsampled) the scale of the Gaussian kernel has to be changed as
    well. 


    Scale Space and Beyond
    ----------------------