Linear Operators
----------------
Definition
..........
A linear operator is an operator that distributes over weighted sums,
i.e. let $f$ and $g$ be images and let $a$ and $b$ be scalars, then
$\op L$ is a linear operator in case:
.. math::
\op L ( a f + b g ) = a \op L f + b \op L g
It is easy to prove that if the above holds, the same is true for
weighted sums of $3,4,5,\ldots$ images, indeed for any countable set
of images (in fact linearity even requires that it holds for families
of images where the summation becomes an integration over the family).
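As a quick numerical illustration (a sketch only: the Gaussian filter
from Scipy serves as an example of a linear operator), the defining
property can be checked up to floating point rounding:

.. code-block:: python

   import numpy as np
   from scipy.ndimage import gaussian_filter

   rng = np.random.default_rng(0)
   f = rng.random((64, 64))        # two arbitrary 'images'
   g = rng.random((64, 64))
   a, b = 2.0, -0.5                # two arbitrary scalars

   L = lambda im: gaussian_filter(im, sigma=2.0)

   # L(a f + b g) should equal a L(f) + b L(g) up to rounding error
   print(np.allclose(L(a * f + b * g), a * L(f) + b * L(g)))   # True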
Combining linearity with translation invariance leads to a unique
characterization of linear, translation invariant operators: the
**convolutions.** We start with the definition of correlations and
convolutions.
Correlations and Convolutions
.............................
Correlation is a simple local image operator. A correlation takes an
image $F$ and a weight function $W$ and results in a new image $G$.
The weight function $W$ is often defined on a small subset of the
sample points of $F$ (it is a small image compared to $F$). Position
the reference point of the weights at location $\v i$, multiply the
weights with the pixel values of image $F$ and sum all these
products. This weighted sum is the value at location $\v i$ of the new
image $G$. Doing this for all pixels in image $F$ leads to the
complete new image $G$.
Let the discrete function $W[k,l]$ define the weights. Assuming weights
different from zero in the range $-K\leq k \leq K$ and $-L\leq l \leq
L$ the correlation of image $F$ can then be written as:
.. math::
G[i,j] = \sum_{k=-K}^K \sum_{l=-L}^L \, F[i+k,j+l] W[k,l]
With the assumption that $W[k,l]=0$ for the pixels that are not to be
considered in the weighted sum we can run the summations over the
entire image domain. We then arrive at the definition of the
**correlation** $F\star_c W$ of two discrete functions $F$ and $W$:
.. math::
   G[i,j] = (F\star_c W)[i,j] = \sum_{k=-\infty}^\infty
   \sum_{l=-\infty}^\infty \, F[i+k,j+l] W[k,l]
For images of arbitrary dimension, using multi-indices, we write:
.. math::
   G[\v i] = (F\star_c W)[\v i] = \sum_{\v k} F[\v i + \v k] W[\v k]
The weight function $W$ is also called the *correlation mask* or the
*correlation kernel*.
For reasons that will become clear only later we most often work with
the convolution operator instead of the correlation operator. The only
difference between the two operators is a minus sign. The
**convolution** of the discrete function $F$ with kernel $W$ is defined
as:
.. math::
G[\v i] = (F\star W)[\v i] = \sum_{\v k} F[\v i - \v k] W[\v k]
The convolution of $F$ with $W$ thus is the correlation of $F$ with
the mirrored version of $W$. The recipe to calculate the convolution
of $F$ with $W$ is:
#. first mirror the kernel in the origin to give $W^m$ ($W^m[\v
k]=W[-\v k]$, note that this is a point mirroring),
#. move the kernel to position $\v i$,
#. multiply the weights with the pixel values and sum all products,
#. assign the sum to $G[\v i]$.

The only difference with correlation thus is the first step, which is
missing in the correlation. Said otherwise: correlation is convolution
with the mirrored kernel, and convolution is correlation with the
mirrored kernel.
.. role:: red
.. container:: red
In :ref:`this section ` we show some
Python/Numpy/Scipy examples of convolutions. Please make sure you
understand these.
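As a minimal sketch of the relation between correlation and
convolution (using the `correlate` and `convolve` functions from
`scipy.ndimage`, which follow the definitions given above):

.. code-block:: python

   import numpy as np
   from scipy.ndimage import correlate, convolve

   rng = np.random.default_rng(1)
   F = rng.random((32, 32))
   W = rng.random((3, 3))       # a small, non-symmetric kernel

   # point mirror the kernel in the origin: W^m[k, l] = W[-k, -l]
   Wm = W[::-1, ::-1]

   # the convolution of F with W equals the correlation of F with
   # the mirrored kernel
   print(np.allclose(convolve(F, W), correlate(F, Wm)))   # True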
But if the two operators are almost equivalent, why introduce the
convolution at all? The reason becomes clear when we discuss the
properties of the convolution.

In the following two sections we first show that the convolution is
the unique linear translation invariant operator. You can skip these
sections in case the 'why' question is less relevant for you.
Translation Invariance
......................
Let $f$ be an image, the *translated* image $\op T_{\v t} f$ is
defined as:
.. math::
   (\op T_{\v t} f)(\v x) = f( \v x - \v t)
Any image operator $\op O$ is *translation invariant* in case:
.. math::
\op O ( \op T_{\v t} f ) = \op T_{\v t} ( \op O f )
or formally:
.. math::
\op O \op T_{\v t} = \op T_{\v t} \op O
i.e. a translation invariant operator commutes with translation, or
equivalently in words: the order of translation and the translation
invariant operator is irrelevant.
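A numerical illustration (a sketch only: the cyclic shift `np.roll`
stands in for the translation $\op T_{\v t}$, and the 'wrap' boundary
mode makes the image domain periodic so that translation invariance
holds exactly):

.. code-block:: python

   import numpy as np
   from scipy.ndimage import convolve

   rng = np.random.default_rng(2)
   f = rng.random((32, 32))
   w = rng.random((3, 3))

   # cyclic translation over vector t = (5, -3)
   T = lambda im: np.roll(im, shift=(5, -3), axis=(0, 1))
   # a convolution on a periodic domain as candidate operator O
   O = lambda im: convolve(im, w, mode='wrap')

   # O(T(f)) should equal T(O(f))
   print(np.allclose(O(T(f)), T(O(f))))   # True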
Linearity and Translation Invariance
....................................
Let $\op L$ be a linear operator that is translation invariant as
well, i.e. $\op L \op T_{\v t} = \op T_{\v t} \op L$. We will show
that these translation invariant linear operators are convolutions.
Any discrete function $F$ can be written as a weighted summation of
pulses. To that end we define the pulse function
.. math::
   \Delta[\v i] = \begin{cases}
   1 &: \v i = \v 0\\
   0 &: \text{otherwise}
   \end{cases}
Given the pulse function we have the following:
.. math::
F[\v i] = \sum_{\v j} F[\v j] \Delta[\v i - \v j] =
\sum_{\v j} F[\v j] (\op T_{\v j} \Delta)[\v i]
or equivalently:
.. math::
F = \sum_{\v j} F[\v j] \op T_{\v j} \Delta
where $\op T_{\v j} \Delta$ is the translated version of the delta
(pulse) function.
Next we apply a linear operator $\op L$ to $F$. This gives:
.. math::
\op L F = \op L \left( \sum_{\v j} F[\v j] \op T_{\v j} \Delta \right)
Due to the linearity we have:
.. math::
   \op L F = \sum_{\v j} F[\v j] \op L \op T_{\v j} \Delta
and because of translation invariance:
.. math::
\op L F = \sum_{\v j} F[\v j] \op T_{\v j} \op L \Delta
or equivalently, defining $W=\op L \Delta$:
.. math::
   (\op L F)[\v i] = \sum_{\v j} F[\v j] W[\v i - \v j] =
   \sum_{\v j} F[\v i - \v j] W[\v j]
and therefore:
.. math::
\op L F = F \star \op L \Delta
showing that any translation invariant linear operator is a
convolution with kernel $\op L\Delta$, a function which for obvious
reasons is called the *impulse response function*.
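The following sketch illustrates this result numerically: we probe a
linear translation invariant 'black box' with a pulse image and then
reproduce the operator by convolving with the recovered impulse
response. The uniform filter as black box and the periodic domain
(where the convolution is a pointwise product in the Fourier domain)
are assumptions made only for this illustration:

.. code-block:: python

   import numpy as np
   from scipy.ndimage import uniform_filter

   # a linear, translation invariant 'black box' L (here a 3x3 uniform
   # filter on a periodic domain, standing in for an unknown operator)
   L = lambda im: uniform_filter(im, size=3, mode='wrap')

   # the pulse image Delta: 1 at the origin, 0 elsewhere
   Delta = np.zeros((32, 32))
   Delta[0, 0] = 1.0

   # the impulse response W = L(Delta)
   W = L(Delta)

   # on a periodic domain the convolution F * W is a pointwise product
   # in the Fourier domain; it reproduces L exactly
   rng = np.random.default_rng(3)
   F = rng.random((32, 32))
   FW = np.real(np.fft.ifft2(np.fft.fft2(F) * np.fft.fft2(W)))
   print(np.allclose(L(F), FW))   # True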
The Continuous Case
...................
Let $f$ and $w$ be two images (functions) defined on a continuous
domain $\set D$. The convolution of $f$ and $w$ is denoted as $f\ast
w$ and is defined as:
.. math::
(f \ast w)(\v x) = \int_{\set D} f(\v x- \v y) w(\v y) d\v y
Doing image processing the right way, this is of course the starting
point when looking at linear operators: images are functions defined
on a continuous domain and the samples (pixels) are a 'mathematical
nuisance'. Nevertheless we started this section with the introduction
of the discrete convolution because it is easier to understand. But
please remember that what we really want is something that
approximates the convolution integral.
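As a one dimensional sketch of the integral we can approximate it with
a Riemann sum and compare with a known analytical result: the
convolution of two Gaussian functions is again a Gaussian function
whose variance is the sum of the two variances.

.. code-block:: python

   import numpy as np

   def gauss(x, sigma):
       # normalized Gaussian function (unit integral)
       return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

   x = np.linspace(-10, 10, 2001)   # symmetric grid, spacing 0.01
   dx = x[1] - x[0]

   # Riemann sum approximation of (f * w)(x) = int f(x - y) w(y) dy
   approx = np.convolve(gauss(x, 1.0), gauss(x, 2.0), mode='same') * dx

   # analytically f * w is a Gaussian with variance 1^2 + 2^2
   exact = gauss(x, np.sqrt(1.0**2 + 2.0**2))
   print(np.max(np.abs(approx - exact)))   # small discretization error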
Properties
..........
The convolution is **commutative**, i.e. there is no theoretical
distinction between the two functions involved.
.. math::
f \ast g = g \ast f \qquad F \star G = G \star F
The convolution is **associative**:
.. math::
(f \ast g ) \ast h = f \ast (g \ast h) \qquad
(F \star G ) \star H = F \star (G \star H)
Especially this last property has important practical
implications. It allows us to decompose a large kernel $g\ast h$ into
two smaller ones ($g$ and $h$). Convolving an image with the large
kernel can then be done as a sequence of two convolutions using the
smaller kernels.
Consider the kernel:$\newcommand{\nth}{\tfrac{1}{9}}$
.. math::
W = \begin{Bmatrix}
\nth & \nth & \nth\\
\nth & \underline{\nth} & \nth\\
\nth & \nth & \nth
\end{Bmatrix}
= \nth \begin{Bmatrix}
1 & 1 & 1\\
1 & \underline{1} & 1\\
1 & 1 & 1
\end{Bmatrix}
The above tabulation of a two dimensional discrete function is based
on the convention that points with value $0$ are not shown. Also by
convention we underline the origin (central pixel). A convolution with
the above kernel thus calculates the average values in all $3\times3$
neighborhoods in the image.
Convolving twice in succession with the above kernel is equivalent to
a single convolution with the kernel $W \star W$:
.. math::
W \star W = \tfrac{1}{81} \begin{Bmatrix}
1 & 2 & 3 & 2 & 1\\
2 & 4 & 6 & 4 & 2\\
3 & 6 & \underline{9} & 6 & 3\\
2 & 4 & 6 & 4 & 2\\
1 & 2 & 3 & 2 & 1
\end{Bmatrix}
Already in this simple case we see that implementing the convolution
with the large $5\times5$ kernel (25 multiplications/additions per
pixel) can be done more efficiently by convolving twice with the
smaller kernel (twice 9, i.e. 18 multiplications/additions per pixel).
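A quick numerical check (a sketch; the periodic 'wrap' boundary mode
is chosen because on a periodic domain associativity holds exactly,
see the remark below):

.. code-block:: python

   import numpy as np
   from scipy.ndimage import convolve

   W = np.full((3, 3), 1.0 / 9.0)       # the 3x3 averaging kernel

   # W * W has support 5x5; embed W in a zero padded 5x5 array so that
   # the full support of the result is retained
   WW = convolve(np.pad(W, 1), W, mode='constant')
   print(WW * 81)                       # the {1 2 3 ...} pattern (up to rounding)

   rng = np.random.default_rng(4)
   F = rng.random((64, 64))
   once = convolve(convolve(F, W, mode='wrap'), W, mode='wrap')
   twice = convolve(F, WW, mode='wrap')
   print(np.allclose(once, twice))      # True: (F*W)*W == F*(W*W)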
Both the above properties are equally valid for images defined on a
continuous domain and images defined on a discrete domain (sampled
images). The proof in the discrete domain is most often easier. Note
carefully that in proofs you will often use some change of variables
whose validity depends on the assumption of an infinite domain. Indeed
most properties of convolutions (and also morphological operators) are
dependent on the infinite domain combined with translation
invariance. In practice we never have an infinite domain and therefore
translation invariance is only approximately true. And this makes the
above properties also 'approximately' true or only true in special
cases. Be careful!
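To make this warning concrete, here is a sketch in which zero padding
(the 'constant' boundary mode) breaks associativity near the image
border, while the interior is unaffected:

.. code-block:: python

   import numpy as np
   from scipy.ndimage import convolve

   rng = np.random.default_rng(5)
   F = rng.random((16, 16))
   W = np.full((3, 3), 1.0 / 9.0)
   WW = convolve(np.pad(W, 1), W, mode='constant')   # the 5x5 kernel W * W

   # with zero padding the intermediate image is cropped to the image
   # domain before the second convolution, so associativity breaks down
   # near the border ...
   once = convolve(convolve(F, W, mode='constant'), W, mode='constant')
   twice = convolve(F, WW, mode='constant')
   print(np.allclose(once, twice))                            # False
   # ... while far enough from the border the two still agree
   print(np.allclose(once[2:-2, 2:-2], twice[2:-2, 2:-2]))    # True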
From Convolution Integrals to Convolution Sums
..............................................
Let $F$ be the sampled version of some image function $f$, i.e. $F=\op
S_B f$. We would like to calculate a sampled version of $f\ast
w$. Only in case the function $f$ is perfectly bandlimited can we
reconstruct $f$ from its samples $F$ by (sinc) interpolation. In most
cases we can only reconstruct $f$ approximately using interpolation:
.. math::
\hat f = \op I_B F = \sum_{\v k} F[\v k] \op T_{B\v k} \phi
where $\op I_B$ is some linear interpolator with respect to the
sampling grid generated by the column vectors in $B$. The convolution
of $\hat f$ and a kernel $w$ equals:
.. math::
\hat f \ast w &= \left( \sum_{\v k} F[\v k] \op T_{B\v k} \phi
\right) \ast w \\
&= \sum_{\v k} F[\v k] (\op T_{B\v k} \phi) \ast w\\
&= \sum_{\v k} F[\v k] \op T_{B\v k} (\phi \ast w)
The convolution $\hat f \ast w$ is built as a weighted summation of
translated versions of the 'convolved kernel' $\phi\ast w$. Note that
we can calculate $\hat f \ast w$ at any location $\v x$ in the
continuous domain. In case we sample $g = \hat f \ast w$ using the
same grid, i.e. in the points $B\v i$, we get:
.. math::
G[\v i] = g(B\v i) = (\hat f \ast w)(B\v i) =
\sum_{\v k} F[\v k] \op T_{B\v k} (\phi \ast w)(B\v i)
Writing $w_\phi$ for $\phi\ast w$ and $W_\phi[\v i]=w_\phi(B\v i)$:
.. math::
G[\v i] &= \sum_{\v k} F[\v k] \op T_{B\v k} w_\phi(B\v i)\\
&= \sum_{\v k} F[\v k] w_\phi(B\v i-B\v k)\\
&= \sum_{\v k} F[\v k] W_\phi[\v i-\v k]\\
&= (F \star W_\phi)[\v i]
Knowing that $G$ is an approximation of the sampling of $f \ast w$ we
may write:
.. math::
\op S_B\left( f \ast w \right) \approx
\op S_B f \star \op S_B\left( \phi \ast w \right)
Most image processing textbooks simply state that a convolution
integral in the continuous domain corresponds with a convolution sum
in the discrete domain and that we just have to sample both the image
and the kernel. This is indeed true in case $\op S_B(w)=\op
S_B(\phi\ast w)$, which holds in many important situations in
practice.
Looking at the problem from a signal processing point of view we
assume our image $f$ to be bandlimited (or at least prefiltered to
make it bandlimited before sampling). The corresponding ideal
interpolation kernel $\phi$ is the sinc kernel (a perfect low pass
filter). Then if the kernel $w$ is bandlimited as well, we have
$w=\phi\ast w$ and in this case $\op S_B\left( f \ast w \right) =
\op S_B f \star \op S_B w$.
Algorithms
..........
The simplest algorithm for convolution is a straightforward
implementation of the definition.
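The sketch below implements the convolution sum directly (assuming an
odd-sized kernel and zero padding outside the image domain); it is
didactic only, in practice one would use a library routine such as
`scipy.ndimage.convolve`:

.. code-block:: python

   import numpy as np
   from scipy.ndimage import convolve

   def convolve_naive(F, W):
       """Direct implementation of G[i,j] = sum_{k,l} F[i-k,j-l] W[k,l]
       with zero padding outside the image domain (odd-sized kernel)."""
       M, N = F.shape
       K, L = W.shape[0] // 2, W.shape[1] // 2
       G = np.zeros((M, N))
       for i in range(M):
           for j in range(N):
               s = 0.0
               for k in range(-K, K + 1):
                   for l in range(-L, L + 1):
                       ii, jj = i - k, j - l
                       if 0 <= ii < M and 0 <= jj < N:
                           # W is indexed relative to its origin (K, L)
                           s += F[ii, jj] * W[k + K, l + L]
               G[i, j] = s
       return G

   # check against the library implementation
   rng = np.random.default_rng(6)
   F = rng.random((8, 8))
   W = rng.random((3, 3))
   print(np.allclose(convolve_naive(F, W), convolve(F, W, mode='constant')))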