2.1. Triangulation

Consider two cameras looking at the same scene, both depicting a point \(\hv X\) in 3D space. Assume both cameras are calibrated, i.e. we know their camera matrices \(P\) and \(P'\), and that we have observed the corresponding image points \(\hv x\) and \(\hv x'\).

Note that \(P\) and \(P'\) encode the position of each camera in space. Given \(\hv x\) we can therefore (virtually) draw the ray from \(\hv O\), the center of the first camera, through \(\hv x\), interpreted as a 3D point on the retina of the first camera. In the same way we can construct the ray from \(\hv O'\) through \(\hv x'\). The point \(\hv X\) must lie at the intersection of both rays.

Although possible, this geometric construction of \(\hv X\) is error prone: because of measurement errors the two rays will in general not intersect in space, so there is no exact solution. In that case we have to settle for an approximate solution, e.g. the point that is closest to both rays.
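A minimal sketch of that approximate geometric solution (often called the midpoint method), assuming NumPy and finite cameras, i.e. the left \(3\times 3\) block \(M\) of \(P\) is invertible. The helper names `camera_ray` and `midpoint_triangulation` are hypothetical. The camera center follows from \(P\,(\v O\T, 1)\T = \v 0\), the ray through \((x, y)\) has direction \(M^{-1}(x, y, 1)\T\), and the closest points on the two rays are found by a small least-squares problem.

```python
import numpy as np


def camera_ray(P, x):
    """Back-project image point x = (x, y) through a 3x4 camera matrix P.

    Assumes a finite camera, i.e. the left 3x3 block M of P is invertible.
    Returns the camera center O and a direction d such that every point
    O + s*d (s != 0) projects onto x.
    """
    M, m = P[:, :3], P[:, 3]
    O = -np.linalg.solve(M, m)                           # P @ [O, 1] = M O + m = 0
    d = np.linalg.solve(M, np.array([x[0], x[1], 1.0]))  # M d = (x, y, 1)
    return O, d


def midpoint_triangulation(P1, x1, P2, x2):
    """Midpoint of the shortest segment between the two back-projected rays."""
    O1, d1 = camera_ray(P1, x1)
    O2, d2 = camera_ray(P2, x2)
    # Least squares for s, t in s*d1 - t*d2 = O2 - O1: the residual is
    # (O1 + s*d1) - (O2 + t*d2), so this finds the closest pair of points.
    A = np.column_stack([d1, -d2])
    s, t = np.linalg.lstsq(A, O2 - O1, rcond=None)[0]
    return 0.5 * ((O1 + s * d1) + (O2 + t * d2))


# Usage: two cameras one unit apart along the x-axis, noise-free observations.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([0.5, 0.2, 4.0])
print(np.allclose(midpoint_triangulation(P1, (0.125, 0.05), P2, (-0.125, 0.05)), X))  # True
```

With noise-free points the rays intersect and \(\hv X\) is recovered exactly; with noisy points the function returns the midpoint of the shortest segment connecting the two rays.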

It is much easier to come up with an algebraic solution. Writing the observed point as \(\hv x = (x, y, 1)\T\), we have:

\[\begin{split}\hv x \sim P \hv X = \begin{pmatrix} \v p_1\T\hv X \\ \v p_2\T\hv X \\ \v p_3\T\hv X \end{pmatrix}\end{split}\]

Again, as in the discussion of calibration, we use the fact that two homogeneous vectors are equal up to scale if and only if their cross product vanishes: \(\hv x \sim P\hv X \Leftrightarrow \hv x \times P\hv X = \v 0\). Hence

\[\begin{split}\hv x \times \begin{pmatrix} \v p_1\T\hv X \\ \v p_2\T\hv X \\ \v p_3\T\hv X \end{pmatrix} = \v 0\end{split}\]

Calculating the cross product leads to:

\[\begin{split}\begin{pmatrix} y \v p_3\T\hv X - \v p_2\T\hv X\\ -x \v p_3\T\hv X + \v p_1\T\hv X\\ x\v p_2\T\hv X - y \v p_1\T\hv X\end{pmatrix} = \v 0\end{split}\]
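As a quick numerical sanity check of this expansion (a sketch assuming NumPy; the function `cross_rows` and the random test values are hypothetical), it can be compared with `np.cross` for an arbitrary \(P\), \(\hv X\) and \((x, y)\). The identity holds for any image point; the components only all vanish when \((x, y)\) actually is the projection of \(\hv X\).

```python
import numpy as np


def cross_rows(P, X, x, y):
    """Components of (x, y, 1)^T x (P X), written as in the expanded vector above."""
    b = P @ X  # b[i] = p_{i+1}^T X
    return np.array([y * b[2] - b[1],
                     -x * b[2] + b[0],
                     x * b[1] - y * b[0]])


rng = np.random.default_rng(0)
P = rng.standard_normal((3, 4))  # arbitrary test camera
X = rng.standard_normal(4)       # arbitrary homogeneous 3D point
x, y = rng.standard_normal(2)    # arbitrary image coordinates
print(np.allclose(cross_rows(P, X, x, y), np.cross([x, y, 1.0], P @ X)))  # True
```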

The third component is a linear combination of the first two and therefore adds no extra constraint, so it can be left out. Swapping the two remaining rows and negating one of them (allowed because we are looking for a null vector) we arrive at:

\[\begin{split}\begin{pmatrix} x \v p_3\T - \v p_1\T\\ y \v p_3\T - \v p_2\T \end{pmatrix} \hv X = \v 0\end{split}\]
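That the dropped third component adds nothing is easy to verify: it equals \(-x\) times the first component minus \(y\) times the second,

\[\begin{split}x\v p_2\T\hv X - y \v p_1\T\hv X = -x\left(y \v p_3\T\hv X - \v p_2\T\hv X\right) - y\left(-x \v p_3\T\hv X + \v p_1\T\hv X\right),\end{split}\]

so it vanishes whenever the first two components do.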

The same analysis for the second camera, \(\hv x' \sim P'\hv X\) with \(\hv x' = (x', y', 1)\T\) and \(\v p'_i\T\) the rows of \(P'\), leads to:

\[\begin{split}\begin{pmatrix} x' \v p'_3\T - \v p'_1\T\\ y' \v p'_3\T - \v p'_2\T \end{pmatrix} \hv X = \v 0\end{split}\]

Stacking both we get:

\[\begin{split}\begin{pmatrix} x \v p_3\T - \v p_1\T\\ y \v p_3\T - \v p_2\T\\ x' \v p'_3\T - \v p'_1\T\\ y' \v p'_3\T - \v p'_2\T \end{pmatrix} \hv X = \v 0\end{split}\]

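This homogeneous system of equations can be solved for \(\hv X\) with the SVD trick: up to scale, \(\hv X\) is the right singular vector belonging to the smallest singular value of the \(4\times 4\) matrix. The method generalizes easily to the case where \(\hv X\) is seen by more than two calibrated cameras: every additional view contributes two more rows to the matrix.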
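A minimal sketch of this linear triangulation for \(n \ge 2\) views, assuming NumPy and image points given as \((x, y)\) with \(\hv x = (x, y, 1)\T\); the function name `triangulate_dlt` is hypothetical. `np.linalg.svd` returns the singular values in descending order, so the last row of \(V\T\) is the (approximate) null vector.

```python
import numpy as np


def triangulate_dlt(Ps, xs):
    """Linear triangulation of one point from n >= 2 calibrated views.

    Ps: sequence of 3x4 camera matrices (common world frame).
    xs: matching image points (x, y), one per camera.
    """
    rows = []
    for P, (x, y) in zip(Ps, xs):
        rows.append(x * P[2] - P[0])  # x p3^T - p1^T
        rows.append(y * P[2] - P[1])  # y p3^T - p2^T
    A = np.array(rows)                # shape (2n, 4)
    _, _, Vt = np.linalg.svd(A)       # singular values sorted in descending order
    X_h = Vt[-1]                      # null vector (up to scale) of A
    return X_h[:3] / X_h[3]           # dehomogenise; assumes X is not at infinity


# Usage: three cameras, each shifted by one unit, observing X = (0.5, 0.2, 4).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
P3 = np.hstack([np.eye(3), np.array([[0.0], [-1.0], [0.0]])])
xs = [(0.125, 0.05), (-0.125, 0.05), (0.125, -0.2)]
print(np.allclose(triangulate_dlt([P1, P2, P3], xs), [0.5, 0.2, 4.0]))  # True
```

Note that this linear solution minimises an algebraic error rather than the reprojection error in the images, so with noisy measurements it is best seen as a good starting estimate.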

The crucial assumptions in this direct triangulation are that (a) both cameras are calibrated with respect to a common world frame and (b) the corresponding points \(\hv x\) and \(\hv x'\) can be found in the two images.