5.2. Linear Operators

5.2.1. Definition

A linear operator is an operator that distributes over weighted sums, i.e. let \(f\) and \(g\) be images and let \(a\) and \(b\) be scalars, then \(\op L\) is a linear operator in case:

\[\op L ( a f + b g ) = a \op L f + b \op L g\]

It is easy to prove that if the above holds for two images, it also holds for \(3,4,5,\ldots\), indeed for any countable set of images (in fact linearity even requires that it holds for any family of images where the summation becomes an integration over that family).
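As a quick numerical check we can verify this property for a simple smoothing operator. This is a minimal sketch; the choice of scipy.ndimage.uniform_filter as example linear operator and the test images are arbitrary:

    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(0)
    f = rng.random((64, 64))        # two arbitrary test images
    g = rng.random((64, 64))
    a, b = 2.0, -0.5                # arbitrary scalars

    # uniform_filter (local averaging) serves here as an example linear operator
    L = lambda im: ndimage.uniform_filter(im, size=3)

    # L(a f + b g) should equal a (L f) + b (L g) up to floating point error
    assert np.allclose(L(a * f + b * g), a * L(f) + b * L(g))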

Combining linearity with translation invariance leads to a unique characterization of linear, translation invariant operators: the convolutions. We start with the definition of correlations and convolutions.

5.2.2. Correlations and Convolutions

Correlation is a simple local image operator. A correlation takes an image \(F\) and a weight function \(W\) and results in a new image \(G\). The weight function \(W\) is often defined on a small subset of the sample points of \(F\) (it is a small image compared to \(F\)). Position the reference point of the weights at location \(\v i\), multiply the weights with the pixel values from image \(F\) and sum all these values. This weighted sum is the value at location \(\v i\) of the new image \(G\). Doing this for all pixels in the image \(F\) leads to the new image \(G\).

Let the discrete function \(W[k,l]\) define the weights. Assuming the weights differ from zero only in the range \(-K\leq k \leq K\) and \(-L\leq l \leq L\), the correlation of image \(F\) can then be written as:

\[G[i,j] = \sum_{k=-K}^K \sum_{l=-L}^L \, F[i+k,j+l] W[k,l]\]

With the assumption that \(W[k,l]=0\) for the pixels that are not to be considered in the weighted sum we can run the summations over the entire image domain. We then arrive at the definition of the correlation \(F\star_c W\) of two discrete functions \(F\) and \(W\):

\[G[i,j] = (F\star_c W)[i,j] = \sum_{k=-\infty}^\infty \sum_{l=-\infty}^\infty \, F[i+k,j+l] W[k,l]\]
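This summation translates almost literally into code. Below is a deliberately naive sketch (assuming zero values outside the image domain; in practice you would use scipy.ndimage.correlate, against which the result is verified at the end):

    import numpy as np
    from scipy import ndimage

    def correlate_naive(F, W):
        # G[i,j] = sum_{k,l} F[i+k, j+l] W[k,l], with the reference point of W
        # at its center, assuming zero values outside the domain of F
        K, L = W.shape[0] // 2, W.shape[1] // 2
        Fp = np.pad(F, ((K, K), (L, L)))          # zero padding
        G = np.zeros(F.shape)
        for i in range(F.shape[0]):
            for j in range(F.shape[1]):
                for k in range(-K, K + 1):
                    for l in range(-L, L + 1):
                        G[i, j] += Fp[i + K + k, j + L + l] * W[K + k, L + l]
        return G

    F = np.arange(25.0).reshape(5, 5)
    W = np.ones((3, 3)) / 9
    assert np.allclose(correlate_naive(F, W),
                       ndimage.correlate(F, W, mode='constant'))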

For images of arbitrary dimensions, using multi-indices we write:

\[G[\v i] = (F\star_c W)[\v i] = \sum_{\v k} F[\v i + \v k] W[\v k]\]

The weight function \(W\) is also called the correlation mask or the correlation kernel.

For reasons that will become clear only later we most often work with the convolution operator instead of the correlation operator. The only difference between the two operators is a minus sign. The convolution of the discrete function \(F\) with kernel \(W\) is defined as:

\[G[\v i] = (F\star W)[\v i] = \sum_{\v k} F[\v i - \v k] W[\v k]\]

The convolution of \(F\) with \(W\) thus is the correlation of \(F\) with the mirrored version of \(W\). The recipe to calculate the convolution of \(F\) with \(W\) is:

  1. first mirror the kernel in the origin to give \(W^m\) (\(W^m[\v k]=W[-\v k]\); note that this is a point mirroring),
  2. move the mirrored kernel to position \(\v i\),
  3. multiply the weights with the pixel values and sum them all,
  4. assign the sum to \(G[\v i]\).

The only difference with correlation thus is the first step, which is missing in the correlation. Said otherwise: correlation is convolution with the mirrored kernel, and convolution is correlation with the mirrored kernel.

In this section we show some Python/Numpy/Scipy examples of convolutions. Please make sure you understand these.
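For instance, the relation between convolution and correlation stated above is easily checked numerically (a sketch with an arbitrary asymmetric kernel):

    import numpy as np
    from scipy import ndimage

    F = np.random.default_rng(1).random((32, 32))
    W = np.array([[1.0, 2.0, 0.0],
                  [0.0, 3.0, 0.0],
                  [0.0, 0.0, 4.0]])    # an arbitrary asymmetric kernel

    Wm = W[::-1, ::-1]                 # the point mirrored kernel W^m

    # convolution with W equals correlation with the mirrored kernel
    assert np.allclose(ndimage.convolve(F, W, mode='constant'),
                       ndimage.correlate(F, Wm, mode='constant'))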

But if the two are almost equivalent, why introduce the convolution at all? The reason becomes clear when we discuss the properties of the convolution.

In the following two sections we first show that the convolution is the unique linear translation invariant operator. You can skip these in case the ‘why’ question is less relevant to you.

5.2.3. Translation Invariance

Let \(f\) be an image, the translated image \(\op T_{\v t} f\) is defined as:

\[(\op T_{\v t} f)(\v x) = f( \v x - \v t)\]

Any image operator \(\op O\) is translation invariant in case:

\[\op O ( \op T_{\v t} f ) = \op T_{\v t} ( \op O f )\]

or formally:

\[\op O \op T_{\v t} = \op T_{\v t} \op O\]

i.e. a translation invariant operator commutes with translation, or equivalently in words: the order of translation and the translation invariant operator is irrelevant.
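On a finite pixel grid, translation invariance holds exactly only if we make the domain wrap around at the borders. The sketch below (assuming periodic boundary conditions and a circular shift via np.roll) checks the commutation property for a convolution operator:

    import numpy as np
    from scipy import ndimage

    f = np.random.default_rng(2).random((32, 32))
    W = np.ones((3, 3)) / 9
    shift = (5, -3)                    # an arbitrary translation t

    T = lambda im: np.roll(im, shift, axis=(0, 1))          # translation T_t
    O = lambda im: ndimage.convolve(im, W, mode='wrap')     # a candidate operator

    # O(T_t f) == T_t(O f): the operator commutes with translation
    assert np.allclose(O(T(f)), T(O(f)))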

5.2.4. Linearity and Translation Invariance

Let \(\op L\) be a linear operator that is translation invariant as well, i.e. \(\op L \op T_{\v t} = \op T_{\v t} \op L\). We will show that these translation invariant linear operators are convolutions.

Any discrete function \(F\) can be written as a weighted summation of pulses. To that end we define the pulse function

\[\begin{split}\Delta[\v i] = \begin{cases} 1 &: \v i = 0\\ 0 &: \text{otherwise} \end{cases}\end{split}\]

Given the pulse function we have the following:

\[F[\v i] = \sum_{\v j} F[\v j] \Delta[\v i - \v j] = \sum_{\v j} F[\v j] (\op T_{\v j} \Delta)[\v i]\]

or equivalently:

\[F = \sum_{\v j} F[\v j] \op T_{\v j} \Delta\]

where \(\op T_{\v j} \Delta\) is the translated version of the delta (pulse) function.
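Numerically this decomposition is trivial but instructive (a sketch on a periodic domain, using np.roll as the translation operator):

    import numpy as np

    F = np.random.default_rng(3).random((8, 8))

    Delta = np.zeros(F.shape)
    Delta[0, 0] = 1.0                  # the pulse function

    # F = sum_j F[j] (T_j Delta)
    recon = sum(F[j] * np.roll(Delta, j, axis=(0, 1))
                for j in np.ndindex(*F.shape))
    assert np.allclose(recon, F)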

Next we apply a linear operator \(\op L\) to \(F\). This gives:

\[\op L F = \op L \left( \sum_{\v j} F[\v j] \op T_{\v j} \Delta \right)\]

Due to the linearity we have:

\[\op L F = \sum_{\v j} F[\v j] \op L \op T_{\v j} \Delta\]

and because of translation invariance:

\[\op L F = \sum_{\v j} F[\v j] \op T_{\v j} \op L \Delta\]

or equivalently, defining \(W=\op L \Delta\):

\[(\op L F)[\v i] = \sum_{\v j} F[\v j] W[\v i - \v j] = \sum_{\v j} F[\v i - \v j] W[\v j]\]

and therefore:

\[\op L F = F \star \op L \Delta\]

showing that any translation invariant linear operator is a convolution with kernel \(\op L\Delta\), a function that for obvious reasons is called the impulse response function.
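This immediately suggests a way to find the kernel of an unknown linear translation invariant operator: apply it to a pulse image. A sketch (scipy.ndimage.uniform_filter plays the role of the ‘unknown’ operator, an arbitrary choice, on a periodic domain):

    import numpy as np
    from scipy import ndimage

    L = lambda im: ndimage.uniform_filter(im, size=3, mode='wrap')

    # the impulse response W = L(Delta); the pulse is placed at the center
    # because ndimage puts the kernel origin at the center of the array
    Delta = np.zeros((9, 9))
    Delta[4, 4] = 1.0
    W = L(Delta)                       # the 3x3 averaging kernel, embedded in zeros

    # applying L is the same as convolving with its impulse response
    f = np.random.default_rng(4).random((9, 9))
    assert np.allclose(L(f), ndimage.convolve(f, W, mode='wrap'))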

5.2.5. The Continuous Case

Let \(f\) and \(w\) be two images (functions) defined on a continuous domain \(\set D\). The convolution of \(f\) and \(w\) is denoted as \(f\ast w\) and is defined as:

\[(f \ast w)(\v x) = \int_{\set D} f(\v x- \v y) w(\v y) d\v y\]

Doing image processing the right way, this is of course the starting point when looking at linear operators. Images are functions defined on a continuous domain and the samples (pixels) are a ‘mathematical nuisance’. Nevertheless we started this section with the introduction of the discrete convolution because it is easier to understand. But please remember that what we really want is something that approximates the convolution integral.

5.2.6. Properties

The convolution is commutative, i.e. there is no theoretical distinction between the two functions involved.

\[f \ast g = g \ast f \qquad F \star G = G \star F\]

The convolution is associative:

\[(f \ast g ) \ast h = f \ast (g \ast h) \qquad (F \star G ) \star H = F \star (G \star H)\]

Especially this last property has important practical implications. It allows us to decompose a large kernel \(g\ast h\) into two smaller ones (\(g\) and \(h\)). Convolving an image with the large kernel can then be done by a sequence of two convolutions using the smaller kernels.

Consider the kernel:\(\newcommand{\nth}{\tfrac{1}{9}}\)

\[\begin{split}W = \begin{Bmatrix} \nth & \nth & \nth\\ \nth & \underline{\nth} & \nth\\ \nth & \nth & \nth \end{Bmatrix} = \nth \begin{Bmatrix} 1 & 1 & 1\\ 1 & \underline{1} & 1\\ 1 & 1 & 1 \end{Bmatrix}\end{split}\]

The above tabulation of a two dimensional discrete function is based on the convention that points with value \(0\) are not shown. Also by convention we underline the origin (central pixel). A convolution with the above kernel thus calculates the average value in every \(3\times3\) neighborhood in the image.

Convolving twice in succession with the above kernel is equal to a single convolution with the kernel \(W \star W\):

\[\begin{split}W \star W = \tfrac{1}{81} \begin{Bmatrix} 1 & 2 & 3 & 2 & 1\\ 2 & 4 & 6 & 4 & 2\\ 3 & 6 & \underline{9} & 6 & 3\\ 2 & 4 & 6 & 4 & 2\\ 1 & 2 & 3 & 2 & 1 \end{Bmatrix}\end{split}\]

Already in this simple case we see that implementing the convolution with the large \(5\times5\) kernel (25 multiplications/additions per pixel) can be done more efficiently by convolving twice with the smaller kernel (twice 9 multiplications/additions per pixel).
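This decomposition is easily verified (a sketch using scipy.signal.convolve2d in its default ‘full’ mode, where associativity holds exactly because nothing is truncated at the borders):

    import numpy as np
    from scipy.signal import convolve2d

    F = np.random.default_rng(5).random((16, 16))
    W = np.ones((3, 3)) / 9

    WW = convolve2d(W, W)              # the 5x5 kernel W * W tabulated above
    assert np.allclose(81 * WW, [[1, 2, 3, 2, 1],
                                 [2, 4, 6, 4, 2],
                                 [3, 6, 9, 6, 3],
                                 [2, 4, 6, 4, 2],
                                 [1, 2, 3, 2, 1]])

    # convolving twice with W equals convolving once with W * W
    assert np.allclose(convolve2d(convolve2d(F, W), W),
                       convolve2d(F, WW))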

Both the above properties are equally valid for images defined on a continuous domain and images defined on a discrete domain (sampled images). The proof in the discrete domain is most often easier. Note carefully that in proofs you will often use some change of variables whose validity depends on the assumption of an infinite domain. Indeed most properties of convolutions (and also morphological operators) are dependent on the infinite domain combined with translation invariance. In practice we never have an infinite domain and therefore translation invariance is only approximately true. And this makes the above properties also ‘approximately’ true or only true in special cases. Be careful!

5.2.7. From Convolution Integrals to Convolution Sums

Let \(F\) be the sampled version of some image function \(f\), i.e. \(F=\op S_B f\). We would like to calculate a sampled version of \(f\ast w\). Only in case the function \(f\) is perfectly bandlimited can we reconstruct \(f\) from its samples \(F\) by (sinc) interpolation. In most cases we can only reconstruct \(f\) approximately using interpolation:

\[\hat f = \op I_B F = \sum_{\v k} F[\v k] \op T_{B\v k} \phi\]

where \(\op I_B\) is some linear interpolator with respect to the sampling grid generated by the column vectors in \(B\). The convolution of \(\hat f\) and a kernel \(w\) equals:

\[\begin{split}\hat f \ast w &= \left( \sum_{\v k} F[\v k] \op T_{B\v k} \phi \right) \ast w \\ &= \sum_{\v k} F[\v k] (\op T_{B\v k} \phi) \ast w\\ &= \sum_{\v k} F[\v k] \op T_{B\v k} (\phi \ast w)\end{split}\]

The approximation \(\hat f\) is built as a weighted summation of translated versions of the ‘interpolated kernel’ \(\phi\ast w\). Note that we can calculate \(\hat f\) at any location \(\v x\) in the continuous domain. In case we sample \(g = \hat f \ast w\) using the same grid, i.e. in the points \(B\v i\), we get:

\[G[\v i] = g(B\v i) = (\hat f \ast w)(B\v i) = \sum_{\v k} F[\v k] \op T_{B\v k} (\phi \ast w)(B\v i)\]

Writing \(w_\phi\) for \(\phi\ast w\) and \(W_\phi[\v i]=w_\phi(B\v i)\):

\[\begin{split}G[\v i] &= \sum_{\v k} F[\v k] \op T_{B\v k} w_\phi(B\v i)\\ &= \sum_{\v k} F[\v k] w_\phi(B\v i-B\v k)\\ &= \sum_{\v k} F[\v k] W_\phi[\v i-\v k]\\ &= (F \star W_\phi)[\v i]\end{split}\]

Knowing that \(G\) is an approximation of the sampling of \(f \ast w\), we may write:

\[\op S_B\left( f \ast w \right) \approx \op S_B f \star \op S_B\left( \phi \ast w \right)\]

Most image processing textbooks just state that a convolution integral in the continuous domain corresponds with a convolution sum in the discrete domain and that we just have to sample both the image and the kernel. This indeed is true in case \(\op S_B(w)=\op S_B(\phi\ast w)\), which holds in many situations of practical importance.

Looking at the problem from a signal processing point of view, we assume our image \(f\) to be bandlimited (or at least pre-filtered to make it bandlimited before sampling). The corresponding ideal interpolation kernel is the sinc kernel (a perfect low pass filter). If \(w\) is bandlimited as well, we have \(w=w\ast\phi\) and in this case \(\op S_B\left( f \ast w \right) = \op S_B f \star \op S_B w\). The lesson: the kernel should be bandlimited too.
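A Gaussian kernel of sufficient scale is bandlimited for all practical purposes, which is why simply sampling the kernel works so well in practice. A sketch (the scale and the truncation radius are arbitrary choices; scipy.ndimage.gaussian_filter, which itself convolves with a sampled Gaussian, serves as the reference):

    import numpy as np
    from scipy import ndimage

    sigma = 2.0
    r = int(4 * sigma)                 # truncate the kernel at 4 sigma
    x = np.arange(-r, r + 1)
    g1 = np.exp(-x**2 / (2 * sigma**2))
    g1 /= g1.sum()                     # sampled, normalized 1D Gaussian
    W = np.outer(g1, g1)               # separable 2D Gaussian kernel

    f = np.random.default_rng(6).random((64, 64))
    assert np.allclose(ndimage.convolve(f, W, mode='nearest'),
                       ndimage.gaussian_filter(f, sigma, mode='nearest'))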