5. Homogeneous Coordinates
The matrix
rotates a point \((x\; y)\T\) over an angle \(\phi\). The matrix
mirrors a point \((x\; y)\T\) in the vertical (\(y\)) axis. Mirroring and rotation are linear operators that can be represented with a matrix.
Can we also describe a translation with a matrix? The answer is: no, that is not if you kling to the description of a point in 2 dimensional space with a 2 element vector. A matrix describes a linear transformation and therefore the origin should be mapped onto the origin. For a translation this is evidently not true and thus a \(2\times2\) matrix cannot describe translations in a 2D plane.
5.1. Homogeneous coordinates in 2D space
Projective geometry in 2D deals with the geometrical transformation that preserve collinearity of points, i.e. given three points on a line these three points are transformed in such a way that they remain collinear. The line may change but the transformed points are again on a line. A translation is a collinearity preserving transformation but it is not a linear transformation and hence there is no \(2\times2\) matrix representing the translation.
The ‘tric’ is to make a 3 component vector out of 2 component vector. Let \((x\;y)\T\) be a vector then we define a homogeneous vector \((sx\;sy\;s)\T\) where \(s\not=0\). When making a homogeneous vector out of a classical vector we often take \(s=1\) meaning that we simply add an extra element equal 1 to the vector. Let \(\v x\) be a vector in \(\setR^2\) then its homogeneous representation is the vector
Given a homogeneous vector \(\hv x\) we may ask what the corresponding 2D vector \(\v x\) is: simple divide by the last element in \(\hv x\) and take the first two elements:
In case \(c=0\) we end up with a point at infinity.
To indicate that two homogeneous vectors \(\hv x\) and \(\hv y\) correspond with the same 2D vector mathematical equality is too strict a demand. The two homogeneous vectors represent the same 2D vector in case there is a scaling factor \(s\not=0\) such that \(s\hv x = \hv y\). We will denote this with:
5.1.1. Rigid Body Transformation
If we now look again at the translation of a vector \((x; y)\T\) over a vector \((t_1\;t_2)\T\) it is easy to prove that this can be written as the matrix vector product:
and the rotation over an angle \(\phi\) can be written as:
Combinations of rotations and translations are called rigid body transformations (for obvious reasons). The generic rigid body transformation can be written as:
The rigid body transformations preserve (relative) angles between vectors and they preserve length of vectors. Because angles are preserved, parallelness of lines is preserved.
Note that for a rigid body transformation the resulting scale factor (the third element in the resulting homogeneous vector) is always 1 due to the fact that the bottom row of the transformation matrix has 2 zeros in the first two positions.
5.1.2. Affine Transformations
An affine transform is of the form:
Note that we donote matrices working on vectors in homogeneous representation with a tilde on top. We already know that \((c\; f)\T\) defines a translation. The (sub)matrix
need not be a rotation matrix. In fact such an affine transform is known to preserve parallelness of lines but angles are not preserved.
5.1.3. Projective Transforms
The most generic case is defined with:
Note that we may choose to set \(P_{33}=1\) because the homogeneous transformation matrix \(P\) and \(a P\) for any \(a\not=0\) have the same interpretation in terms of two dimensional points. We write \(P\sim aP\) to denote this similarity. 1
Because \(g\) and \(h\) are not both equal zero the third component of a transformed vector need not be 1 and thus the interpretation in terms of a 2D vector does require the division by the third component.
A projective transform is guaranteed to preserve collinearity of points but nothing else.
5.1.4. Overview of 2D Transforms
In the table below the basic projective transforms in 2D are given. Please look at the Python code too. Play with the code to sharpen your understanding of the projective transforms encoded in \(3\times3\) homogeneous matrices.
The Degrees of Freedom in the table below indicate the number of parameters that describe the transform. For a translation the dof is 2, for a rotation 1 etc.
Transform of unit square |
Name |
Transformation matrix |
DoF |
---|---|---|---|
Translation |
\(\matvec{ccc}{1&0&t_1\\0&1&t_2\\0&0&1}\) |
2 |
|
Rotation |
\(\matvec{ccc}{\cos(\phi)&-\sin(\phi)&0\\ \sin(\phi)&\cos(\phi)&0\\0&0&1}\) |
1 |
|
Rigid Body |
\(\matvec{ccc}{\cos(\phi)&-\sin(\phi)&t_x\\ \sin(\phi)&\cos(\phi)&t_y\\0&0&1}\) |
3 |
|
Affine |
\(\matvec{ccc}{a&b&c\\d&e&f\\0&0&1}\) |
6 |
|
Projective Transform |
\(\matvec{ccc}{a&b&c\\d&e&f\\g&h&1}\) |
8 |
5.2. Lines in 2D projective space
Using homogeneous coordinates we cannot only describe points in 2D space but also lines in 2D space. The 3 component vector \(\hv l\) represents a line of all points \(\hv x\) (in homogeneous coordinates) such that:
in components we have:
In case \(l_2\not=0\) we can write:
showing the perhaps more familiar line representation \(y=ax+b\). In geometry the line representation \(\hv l\T\hv x=0\) is prefered as it can represent vertical lines as well.
Given two lines \(\hv l\) and \(\hv m\) the intersection \(\hv x\) of these lines is given by the vectorial cross product:
where \(\hv e_1, hv e_2\) and \(\hv e_3\) are the standard basis vectors in 3D space and \(|...|\) denotes the ‘determinant’.
We also have the dual property: two points define a line: let \(\hv x\) and \(\hv y\) be two points then the line passing through these points is given by:
An example may illustrate these properties. Consider the points \(\hv a=(3\;2\;1)\T\), \(\hv b=(1\;4\;1)\T\), \(\hv c=(0\;2\;1)\T\) and \(\hv d=(5\;4\;1)\T\). The line \(\hv l\) passes through the points \(\hv a\) and \(\hv b\) so:
This is indeed the line \(x+y=5\) as can be easily verified by making a plot. For the line \(\hv m\) through the points \(\hv c\) and \(\hv d\) we have:
The intersection \(\hv f\) of lines \(\hv l\) and \(\hv m\) is then given by:
Given a 2D geometrical transform represented with homogeneous matrix \(F\) that takes a point \(\hv x\) and produces a point \(\hv y = F\hv x\). Note that such a transform preserves colinearity of points, so transforming all points on a line \(\hv l\) produces another line \(\hv m\). But what transformation would that be?
Let \(\hv a\) and \(\hv b\) be two points on the line \(\hv l\), i.e.:
then:
Without proof we state that (see Wikipedia)
where \(F^{-T}\) is defined as \((F\T)\inv\) (can you prove that \((F\T)\inv=(F\inv)\T\)?). Convince yourself that this implies:
So in summary: when points transform with \(F\), lines transform with \(F^{-}\T\).
A simpler proof of this fact follows from the following reasoning. If a point \(\hv x\) is on line \(\hv l\) we must have \(\hv l\T \hv x=0\). When transforming all points as \(\hv y = F\hv x\) we can substitute \(\hv x=F\inv \hv y\) and get \(\hv l\T (F\inv \hv y) = ((F\inv)\T\hv l)\T\hv y=0\) for all \(\hv y\) on the line \(\hv m\) and thus \(\hv m = F^{-T}\hv l\).
5.3. Points at infinity
What also makes the homogeneous representation of points and lines in 2D projective space (represented with homogeneous vectors in 3D space…) special is that we can describe so called points at infinity.
Consider two parallel lines \(\hv l = (a\;b\;c)\T\) and \(\hv m = (a\;b\;d)\T\). Calculating the intersection we get:
Note that a normal to the line is given by the 2D vector \((a\;b)\T\) and the 2D point corresponding with homogeneous coordinates \((b\;-a\;0)\T\) is along the direction of the line at infinity.
5.4. Coordinate (Frame) Transforms
A point in 2D space can be represented with two numbers \(x\) and \(y\): the coordinates of the point with respect to a basis. Most often when we use the notation \(\v x=(x\;y)\T\) we imply that these coordinates are measured with respect to the standard basis, i.e.:
Now consider a rotation of the basis vectors over \(\phi\) radians. The rotation matrix \(R\) is given by:
The new basis vectors then become
with respect to this new rotated basis we represent the vector \(\v x\) (that is not changed in space) with coordinates \((x', y')\). We must have:
Thus calculating the new coordinates (with respect to the rotated basis) requires the use of the inverse of the basis transform:
In linear algebra when using a linear transformation, the origin is always mapped onto the origin. Using homogeneous coordinates we can represent translation with a linear operator as well and thus we may shift a coordinate frame in space.
Consider the standard frame and a point with coordinates \((x,y)\). Now consider a frame that is translated with respect to the standard frame such that the origin of the translated frame is at point with coordinates \((a,b)\) (with respect to the standard frame). The frame transform is:
We can check this by transforming the origin of the standard frame \((0,0)\), the first basis vector \((1,0)\) and the second basis vector \((0,1)\):
The coordinate transform is the inverse of the frame transform:
Consider the frame transform with \((a,b)=(5,3)\). If we want to know what the coordinates are in the translated frame of the point \((6,4)\) (coordinates with respect to the standard frame) we calculate:
Now consider the frame transform that is given by:
where \(R\) is a \(2\times2\) rotation matrix and \(\v t\) is a \(2\times1\) translation vector. The matrix \(F\) thus is of size \(3\times3\). The coordinate transform is the inverse:
5.5. Estimating Parameters
5.5.1. Estimating an Affine Transform using LSQ
An affine transform is described with:
where \((x,y)\) are the coordinates in the input image and \((x',y')\) are the coordinates of the point in the resulting image. Note that an affine transform is invertible except for the degenerate case that the entire 2D space is mapped unto a line or point.
For the affine transform we need to know how three points map to their transformed points (in our case we take 3 corners from a parallelogram in the input image). Let \((x_i,y_i)\) be a point in the input image with corresponding point \((x_i', y_i')\) in the output image.
The relation between between these points and the affine transform matrix can be rewritten in the following form:
The above is just for one point correspondence. For 3 point correspondences we can stack the \(x', y'\) values and also add two rows to the matrix on the right hand side. For three points we have:
or:
The \(6\times6\) matrix \(M\) is invertible (given three non collinear points) and thus:
In practice it is hard to define the three points with high accuracy, errors will be made. In that case the calculated transform matrix (represented now with the vector \(\v p\)) can be inaccurate. If we are able to find more points correspondences we might be able to obtain a more accurate transform. Say we have \(n\) point correspondences. This leads to a vector \(\v q\) that has \(2n\) elements and a \(M\) matrix that is \(2n\times6\) matrix. Then obviously we cannot simply invert the matrix \(M\) to obtain the transformation matrix characterized with parameter vector \(\v p\).
Instead we solve for a least squares solution for \(\v p\). For a detailed description we refer to Least Squares Estimators. Here we only give the solution:
Note that Numpy has a special function to solve an equation with one line of code:
p = lstsq(M,q)
5.5.2. Estimating a Projective Transform using the DLT
The perspective transform is the generic transform characterized with a 3x3 homogeneous matrix:
Note the scaling factor \(s\) in the above expression which is due to the normalization that is inherent when using homogeneous coordinates (and a transform for which elements \(g\) and \(h\) are not zero). This is also why in the second equation we write \(\sim\) instead of \(=\).
In this section we will show how to estimate the matrix \(P\) given a number of examples \(\hv x_i \sim P \hv X_i\) for \(i=1\ldots,n\). It is because of the scaling factor \(s\) (note that \(s\) is different in general for each of the examples) that an estimation procedure as described for an affine transform will not work. We will describe the standard method to estimate a projective transform that is known as the Direct Linear Transform.
The first equation above can be rewritten as:
Make sure you understand that this can be written as:
This is the crucial step in the use of the DLT for any purpose: in case your equation is linear in the parameters you have to estimate you are able to rewrite the equation into a form \(A\v p=0\) where \(A\) contains your known data (the examples) and \(\v p\) is a vector of all unknowns to be estimated.
Stacking 4 point correspondences we arrive at:
This is a homogeneous system of equations. It can be shown that in case the points are not colinear there is one non trivial solution for the vector \(\v p\).
In case we have more then 4 point correspondences (and we will when dealing with image mosaics) the matrix \(M\) is of size \(2n\times9\). Then, due to noise in the measurements, there is little chance that there is any null vector. We then set out to find a vector \(\v p\) that minimizes \(\|M \v p\|\) subject to the constraint that \(\|\v p\|=1\) (else the zero vector as trivial solution would suffice). So
Let \(UDV\T\) be the singular value decomposition of \(M\), then:
due to the fact that \(U\) is orthogonal and therefore preserves the norm. Writing \(\v q = V\T \v p\) we have:
because \(V\) is also orthogonal.
Remember that \(D\) is a ‘diagonal’ matrix with the sorted singular values on the diagonal. Convince yourself that the optimal \(\v q\) then is \(q = (0 0 \cdots 0 1)\T\). And thus \(\v p = V \v q\) which is the last column of \(V\).
Solving an overdetermined homogeneous set of equations with the singular value decomposition is often referred to as the SVD-trick.
5.6. Exercises
Rigid Body Transformations
The 2D rotation in homogeneous coordinates is defined with the matrix \(R_\phi\) and the translation is given by the matrix \(T_{\v t}\):
\[\begin{split}R_\phi = \matvec{ccc}{\cos(\phi)&-\sin(\phi)&0\\ \sin(\phi)&\cos(\phi)&0\\0&0&1}, \qquad T_{\v t} = \matvec{ccc}{1&0&t_1\\0&1&t_y\\0&0&1}\end{split}\]Calculate the transformation matrix where your first rotate and then translate, i.e. \(T_{\v{t}} R_{\phi}\).
Calculate the transformation matrix where you do the reverse: first translate and then rotate.
Is the order in which you do the operations important? Give a numerical example and make a drawing of point \((x\;y)\T\), the result after the first and the result after the second transformation.
Invertible transforms
Let \(T\) be a homogeneous \(3\times3\) matrix. The inverse geometrical transform can be found by inverting the matrix:
Consider a pure translation matrix. Obviously the inverse must be equal to the matrix with negative translation vector. Convince yourself that this indeed true by multiplying \(T_{\v t}\) with \(T_{\v t}^{-1} = T_{=\v t}\).
In case \(T\) is not invertible (i.e. singular), what would you expect to be the shape of the transform of the unit square?
Show that
\[\begin{split}\matvec{cc}{ R &\v t\\ \v 0\T & 1}\inv = \matvec{cc}{ R\T & -R\T \v t\\ \v 0\T & 1 }\end{split}\]
Number of point pairs needed to calculate transformation
Let \(T\) be a translation. In case we are given a point and its image (i.e. the result of the transform) we can completely determine the transformation. For a translation the DoF equals 2 and thus we need two equations (one for the x-coordinate and one for the y-coordinate) to calculate the translation parameters.
Calculate the number of corresponding point pairs needed to calculate the parameters of the transformations from the table.
Footnotes
- 1
It is not custom to also denote matrices working on homogeneous vectors with a tilde on top.