(Laplacian) Scale Space ======================= .. figure:: /images/blobs2d.png :align: right :figwidth: 50% **Two Gaussian Blobs.** The crux of the scale invariant feature transform is that Gaussian blobs in an image show up as extreme in the scale normalized Laplacian scale space $\ell(\v x,s)$ (see the section on Linear Scale Space): .. math:: \ell(\v x, s) = s^2 (f \ast (G^s_{xx} + G^s_{yy}))(\v x) If we have an image with a Gaussian blob of scale $s_b$ at position $\v x_0$ it is not hard to prove that we find an extremum in $\ell(\v x, s)$ at $\v x=\v x_0$ and $s=s_b$. This is true for an isolated blob. For blobs closer together or images that show blob like structures it remains true that we find extrema in the Laplacian scale space at the scale of the blob and at its position in space. Note that at large enough scale (i.e. after convolving with large scale Gaussian filters) almost everything in an image will start to look like Gaussian blobs. Below we give the proof (click on the bar to show the proof) for an isolated blob. Even for that simple case the proof is not difficult but quite lengthy and so we use Mathematica...). .. proof:proof:: Blob in Scale-Space We define the image $f$ showing a blob at position $(0,0)$ with scale $s_b$ as the Gaussian function .. math:: f(x, y) = G^{s_b}(x, y) = \frac{1}{s_b\sqrt{2\pi}} e^{-\frac{x^2+y^2}{2s_b^2}} In the Laplacian scale space we get .. math:: \ell(x, y, s) &= s^2\left( \pfrac{^2G^\sqrt{s^2+s_b^2}(x,y)}{x^2} + \pfrac{^2G^\sqrt{s^2+s_b^2}(x,y)}{y^2} \right)\\ &= s^2\left( G^\sqrt{s^2+s_b^2}_{xx}(x,y) + G^\sqrt{s^2+s_b^2}_{yy}(x,y) \right) The extrema in $\ell(x,y,s)$ are found by solving the equations $\ell_x(x,y,s)=0$, $\ell_y(x,y,s)=0$ and $\ell_s(x,y,s)=0$ for $x$, $y$ and $s$. This is a typical proof that I like to use a symbolic math program for. Below you find the Mathematica notebook for this: .. image:: /images/blobproof.png Yes indeed we have cheated a little here as we have assumed that the extremum is at $x=0$, $y=0$. Only assuming that $x$ and $y$ are real values we not only get the extremum at the origin but also a continuum of saddle points (see the Mathematical Tools part chapter on least squares estimators), i.e. points where all three derivatives are equal zero but the eigenvalues of the Hessian matrix are of mixed sign. Calculating Scale-Space ~~~~~~~~~~~~~~~~~~~~~~~ Given the Gaussian derivative convolution encoded in the Python function ``gD`` A linear scale space is nothing else than a stack of images that blurred with increasingly larger Gaussian kernels. Let $f^{s_0}$ be the starting image whose scale is assumed to be $s_0$ (where it is often assumed that $s_0=0.5$). From this image we then build the scale space: .. math:: f(\v x, s) = (f^{s_0} \ast G^{\sqrt{s^2-s_0^2}})(\v x) Because Gaussian smoothing commutes with calculating the derivative (any derivative) we have .. math:: \partial_{..} f(\v x, s) = (\partial_{..} f^{s_0} \ast G^\sqrt{s^2-s_0^2})(\v x) where $\partial_{..}$ stands for any spatial derivative. Evidently, as seen before, the more sensible way to implement this is: .. math:: \partial_{..} f(\v x, s) = (f^{s_0} \ast \partial_{..} G^\sqrt{s^2-s_0^2})(\v x) Scale should be sampled logarithmically, i.e. an equidistant sampling in the $\log(s)$ parameter. .. math:: s_i = s_0 \alpha^i and thus: .. math:: f(\v x, s_i) &= (f^{s_0} \ast G^\sqrt{s_i^2-s_0^2})(\v x) \\ &= (f^{s_0} \ast G^\sqrt{s_0^2\; \alpha^{2i}-s_0^2})(\v x) \\ &= (f^{s_0} \ast G^{s_0 \sqrt{ \alpha^{2i}-1}})(\v x) \\ In the scale invariant feature transform SIFT the keypoint are found as the extrema in the scale normalized Laplacian scale space: .. math:: \ell(\v x, s) = s^2 (f \ast (G^s_{xx} + G^s_{yy}))(\v x) This idea was pionered (and invented?) by Toni Lindenberg. It is easy to calculate that in case an image contains a Gaussian blob at scale $s_b$ there is an extremum in the Laplacian scale space at the center of the blob and at scale $s_b$. All blobs when smoothed with a Gaussian kernel are 'lookin like' Gaussian blobs and thus extrema in the Laplacian scale space are points that we may hope for to find in images in a translation, rotationally and scale invariant way. Finding the extrema is discussed in a subsequent section. When we also want to find the extrema at subpixel accuracy we need to calculate the first and second order derivatives of $\ell$ both spatial as well as scale derivatives. These corresponds with derivatives of $f$ up to order four. The original implementation of the SIFT algorithm approximates the Laplacian scale space as a difference of Gaussians (DoG) stack. Let $f$ be the Gaussian (zero order) scale space then it is not hard to prove that .. math:: \ell(\v x, s) \approx \frac{1}{\alpha - 1}\left( f(\v x, \alpha s) - f(\v x, s)\right) .. raw:: html