1.3. Camera Calibration I

1.3.1. Estimating the Camera Matrix

Our camera model projects a 3D point \(\hv X\) onto a 2D retina, resulting in the image point \(\hv x\):

\[\hv x \sim P \hv X\]

Note that mathematically \(\hv x = s P\hv X\) for some non-zero scalar \(s\), i.e. the vectors \(\hv x\) and \(P\hv X\) are parallel. Because both are ordinary 3-component vectors, we can express this with the cross product: \(\hv x \times P \hv X=\v 0\).

Let \(\hv p_1\T\), \(\hv p_2\T\) and \(\hv p_3\T\) denote the 3 rows of \(P\), so that the \(i\)-th element of \(P\hv X\) equals \(\hv X\T\hv p_i\), and let \(\hv x = (x\;y\;1)\T\). Calculating the cross product then leads to:

\[\begin{split}\hv x \times P \hv X = \matvec{c}{ y \hv X\T\hv p_3 - \hv X\T\hv p_2 \\ \hv X\T\hv p_1 - x \hv X\T\hv p_3 \\ x \hv X\T \hv p_2 - y \hv X\T \hv p_1} = \matvec{c}{0\\0\\0}\end{split}\]

Note that the third element equals \(-x\) times the first element minus \(y\) times the second, so it is a linear combination of the first two and we only need to consider the first two equations. Taking the second element and the negated first element, we can rewrite them as:

\[\begin{split}\matvec{ccc}{ \hv X\T & \v 0\T & -x \hv X\T \\ \v 0\T & \hv X\T & -y \hv X\T } \matvec{c}{\hv p_1\\ \hv p_2 \\ \hv p_3} = \matvec{ccc}{ \hv X\T & \v 0\T & -x \hv X\T \\ \v 0\T & \hv X\T & -y \hv X\T }\v p = \matvec{c}{0\\0}\end{split}\]

Note that \(\v p\) stacks the 3 rows of \(P\) on top of each other, forming a 12-component vector.
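As a numerical sanity check of these two equations, the sketch below (Python with NumPy, which is an assumption; the helper name `correspondence_rows`, the camera matrix and the 3D point are hypothetical illustration values) builds the two rows for one correspondence and verifies that they annihilate \(\v p\) when \((x,y)\) is the exact projection of \(\hv X\).

```python
import numpy as np

def correspondence_rows(Xh, x, y):
    """Return the 2x12 block [[X^T, 0^T, -x X^T], [0^T, X^T, -y X^T]].

    Xh is the homogeneous 3D point (X, Y, Z, 1); (x, y) is its image point.
    """
    zero = np.zeros(4)
    return np.vstack([np.concatenate([Xh, zero, -x * Xh]),
                      np.concatenate([zero, Xh, -y * Xh])])

# Hypothetical camera matrix and 3D point, for illustration only.
P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, 20.0],
              [0.0, 0.0, 1.0, 5.0]])
Xh = np.array([0.5, -1.0, 4.0, 1.0])

xh = P @ Xh                          # homogeneous image point P X
x, y = xh[0] / xh[2], xh[1] / xh[2]  # scale so that the image point is (x, y, 1)

p = P.reshape(-1)                    # row-major reshape stacks rows p_1, p_2, p_3
print(np.allclose(correspondence_rows(Xh, x, y) @ p, 0.0))        # both equations hold
print(np.allclose(np.cross(np.array([x, y, 1.0]), P @ Xh), 0.0))  # x cross P X vanishes
```

Both checks print `True`; `P.reshape(-1)` uses NumPy's default row-major order, which is exactly the stacking of the rows of \(P\) described above.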

The above equation is a homogeneous system of two linear equations. It is based on just one point correspondence \((X,Y,Z)\rightarrow(x,y)\). For \(n\) point correspondences we may stack the \(2n\) equations on top of each other:

\[\begin{split}\matvec{ccc}{ \hv X_1\T & \v 0\T & -x_1 \hv X_1\T \\ \v 0\T & \hv X_1\T & -y_1 \hv X_1\T \\ \vdots & \vdots & \vdots \\ \hv X_n\T & \v 0\T & -x_n \hv X_n\T \\ \v 0\T & \hv X_n\T & -y_n \hv X_n\T } \v p = \v 0 \\ A \v p = \v 0\end{split}\]
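Stacking the \(2n\) equations can be sketched in NumPy as follows. The function name `build_A` and the array layout (world points as an \(n\times3\) array, image points as an \(n\times2\) array) are assumptions made for illustration. Rows \(2i\) and \(2i+1\) belong to correspondence \(i\), matching the order in the equation above.

```python
import numpy as np

def build_A(world_pts, image_pts):
    """Stack the 2n x 12 matrix A from n correspondences (X, Y, Z) -> (x, y)."""
    world_pts = np.asarray(world_pts, dtype=float)   # shape (n, 3)
    image_pts = np.asarray(image_pts, dtype=float)   # shape (n, 2)
    n = world_pts.shape[0]
    Xh = np.hstack([world_pts, np.ones((n, 1))])     # homogeneous points, shape (n, 4)
    zeros = np.zeros((n, 4))
    x = image_pts[:, 0:1]                            # shape (n, 1) so it broadcasts over Xh
    y = image_pts[:, 1:2]
    A = np.empty((2 * n, 12))
    A[0::2] = np.hstack([Xh, zeros, -x * Xh])        # rows [X_i^T, 0^T, -x_i X_i^T]
    A[1::2] = np.hstack([zeros, Xh, -y * Xh])        # rows [0^T, X_i^T, -y_i X_i^T]
    return A
```

For exact (noise-free) projections, `build_A(world_pts, image_pts) @ P.reshape(-1)` is zero up to rounding errors.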

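The matrix \(A\) has \(2n\) rows and 12 columns. Because \(P\) is only defined up to scale, it has 11 degrees of freedom, so with 6 point correspondences (12 equations) we can find a non-trivial null vector \(\v p^\ast\), provided the 3D points are in general position (in particular, not all in one plane). This only works if there are no measurement errors, and because the 2D points are measured in images there will always be noise; with noisy data \(A\) generally has full rank and no non-trivial null vector exists. In practice, using more than the minimal number of point correspondences is preferred, and we search for the vector \(\v p\) that minimizes \(\|A\v p\|\) subject to the constraint \(\|\v p\|=1\). The ‘SVD-trick’ can be used here: the minimizer is the right singular vector of \(A\) belonging to its smallest singular value.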
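A minimal sketch of this estimation step, assuming NumPy (the names `build_A`, `estimate_P` and `same_up_to_scale`, the ground-truth camera and the 3D points are all hypothetical). It recovers a known camera matrix from noise-free projections of 8 non-coplanar points. Since \(P\) is only determined up to scale and sign, the comparison normalizes both matrices first.

```python
import numpy as np

def build_A(world_pts, image_pts):
    """Stack the 2n x 12 matrix A (rows [X^T, 0^T, -x X^T] and [0^T, X^T, -y X^T])."""
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    n = world_pts.shape[0]
    Xh = np.hstack([world_pts, np.ones((n, 1))])
    zeros = np.zeros((n, 4))
    x, y = image_pts[:, 0:1], image_pts[:, 1:2]
    A = np.empty((2 * n, 12))
    A[0::2] = np.hstack([Xh, zeros, -x * Xh])
    A[1::2] = np.hstack([zeros, Xh, -y * Xh])
    return A

def estimate_P(world_pts, image_pts):
    """Minimize ||A p|| subject to ||p|| = 1 and reshape p into a 3x4 matrix."""
    A = build_A(world_pts, image_pts)
    # numpy.linalg.svd returns singular values in decreasing order, so the last
    # row of Vt is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)   # unit-norm p; P is recovered up to scale and sign

def same_up_to_scale(P1, P2, atol=1e-6):
    """True if P1 and P2 agree after normalizing to unit norm and matching sign."""
    a = P1 / np.linalg.norm(P1)
    b = P2 / np.linalg.norm(P2)
    return np.allclose(a, b, atol=atol) or np.allclose(a, -b, atol=atol)

# Hypothetical ground-truth camera and non-coplanar 3D points, for illustration only.
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
world = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 2.0], [0.0, 1.0, 3.0],
                  [1.0, 1.0, 1.5], [-1.0, 0.5, 2.5], [0.5, -1.0, 4.0],
                  [2.0, 1.0, 3.5], [-0.5, -0.5, 5.0]])
Xh = np.hstack([world, np.ones((len(world), 1))])
proj = Xh @ P_true.T
image = proj[:, :2] / proj[:, 2:3]        # noise-free image points (x_i, y_i)

P_est = estimate_P(world, image)
print(same_up_to_scale(P_est, P_true))
```

With noisy image points the same code returns the least-squares solution in the algebraic sense described above. In practice the image and world coordinates are usually normalized (translated and scaled) before building \(A\) to improve numerical conditioning; this sketch omits that step.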

Given the optimal vector \(\v p\), we can reshape it into the \(3\times4\) camera matrix \(P\). With this calibrated camera matrix we may take each 3D model point \((X_i,Y_i,Z_i)\), project it onto the image plane (dividing by the third homogeneous coordinate) and obtain \((a_i, b_i)\). When we compare this ‘reprojected’ point \((a_i,b_i)\) with the ‘true’ measured point \((x_i,y_i)\), we expect the reprojection distance \(d_i=\sqrt{(x_i-a_i)^2 + (y_i-b_i)^2}\) to be small.
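The reprojection distances \(d_i\) can be computed with a few lines of NumPy. This is a sketch; the function name `reprojection_errors` and the array shapes are assumptions for illustration.

```python
import numpy as np

def reprojection_errors(P, world_pts, image_pts):
    """Distances d_i between measured points (x_i, y_i) and reprojections (a_i, b_i)."""
    world_pts = np.asarray(world_pts, dtype=float)   # shape (n, 3)
    image_pts = np.asarray(image_pts, dtype=float)   # shape (n, 2)
    Xh = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    proj = Xh @ P.T                                  # rows are P X_i, shape (n, 3)
    ab = proj[:, :2] / proj[:, 2:3]                  # (a_i, b_i): divide by third coordinate
    return np.sqrt(np.sum((image_pts - ab) ** 2, axis=1))
```

Because of the division by the third homogeneous coordinate, the result does not change when \(P\) is multiplied by any non-zero scalar, so it can be applied directly to the unit-norm matrix obtained from the SVD.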

This, however, is only a hope: finding the camera matrix \(P\) by minimizing \(\|A\v p\|\) subject to \(\|\v p\|=1\) does not guarantee that the sum of squared reprojection distances \(\sum_i d_i^2\) is minimal. Minimizing the sum of squared reprojection errors directly is a harder (and computationally more expensive) non-linear optimization problem, which is usually solved iteratively and only approximately. For this we refer to the literature.
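One way to see why the two criteria differ: with \(a_i = \hv X_i\T\hv p_1 / \hv X_i\T\hv p_3\), the first row of \(A\) for correspondence \(i\) applied to \(\v p\) gives \(\hv X_i\T\hv p_1 - x_i\hv X_i\T\hv p_3 = \hv X_i\T\hv p_3\,(a_i - x_i)\), and likewise the second row gives \(\hv X_i\T\hv p_3\,(b_i - y_i)\). The algebraic error of that point is therefore \(|\hv X_i\T\hv p_3|\, d_i\), so minimizing \(\|A\v p\|\) weights each reprojection distance by a point-dependent factor. The NumPy sketch below checks this identity; the camera matrix, points and offsets are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical camera and 3D points, for illustration only.
P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, 20.0],
              [0.0, 0.0, 1.0, 5.0]])
p = P.reshape(-1)
world = np.array([[0.0, 0.0, 1.0], [1.0, 0.5, 4.0], [-1.0, 2.0, 10.0]])
Xh = np.hstack([world, np.ones((len(world), 1))])
proj = Xh @ P.T                          # rows are P X_i; third column is X_i^T p_3
ab = proj[:, :2] / proj[:, 2:3]          # exact reprojections (a_i, b_i)

# Simulated measurements, each displaced by exactly 1 pixel, so every d_i = 1.
measured = ab + np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])

algebraic = []
for Xi, (x, y) in zip(Xh, measured):
    Ai = np.vstack([np.concatenate([Xi, np.zeros(4), -x * Xi]),
                    np.concatenate([np.zeros(4), Xi, -y * Xi])])
    algebraic.append(np.linalg.norm(Ai @ p))   # this point's contribution to ||A p||
algebraic = np.array(algebraic)
d = np.linalg.norm(measured - ab, axis=1)      # reprojection distances d_i

print(d)
print(algebraic)
print(np.allclose(algebraic, np.abs(proj[:, 2]) * d))
```

All three reprojection distances equal 1 (up to rounding), yet the algebraic residuals are approximately 6, 9 and 15, the values of \(\hv X_i\T\hv p_3 = Z_i + 5\) for this particular \(P\). A point with a large factor \(\hv X_i\T\hv p_3\) thus counts more heavily in \(\|A\v p\|\) than its reprojection distance alone would suggest.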