5.1.4. Analytical Solution for Linear Regression

For the general case of linear regression we have the cost function:

\[J(\v \theta) = \frac{1}{2m} \| \hv X \v \theta - \v y \|^2\]

and the gradient of the cost function:

\[\frac{\partial J(\v\theta)}{\partial \v\theta} = \frac{1}{m} \hv X\T (\hv X \v\theta - \v y)\]
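As a concrete illustration, the cost and its gradient can be computed in a few lines of NumPy. This is only a sketch: the function names are our own, `X` is assumed to be the augmented design matrix (denoted \(\hv X\) above), `theta` the parameter vector and `y` the target vector.

```python
import numpy as np

def cost(X, theta, y):
    # J(theta) = 1/(2m) * ||X @ theta - y||^2
    m = len(y)
    r = X @ theta - y
    return (r @ r) / (2 * m)

def gradient(X, theta, y):
    # dJ/dtheta = 1/m * X^T (X @ theta - y)
    m = len(y)
    return X.T @ (X @ theta - y) / m
```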

We are looking for a minimum of the cost function. Remember from high school that a necessary condition for a minimum of a univariate function is that the derivative is equal to zero. For a multivariate function all partial derivatives must be equal to zero, i.e. we have to set the gradient to zero and solve for \(\v\theta\).

\[\hv X\T (\hv X \v\theta - \v y) = 0\]

Expanding gives the normal equations \(\hv X\T \hv X \v\theta = \hv X\T \v y\), and provided that \(\hv X\T \hv X\) is invertible this leads to:

\[\v\theta = (\hv X\T \hv X)\inv \hv X\T \v y\]
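The following sketch shows the normal-equation solution in NumPy, applied to made-up data for the line \(y = 1 + 2x\). The function name `fit_normal_equation` is hypothetical; note that we solve the linear system rather than forming the explicit inverse, which is the numerically preferable route.

```python
import numpy as np

def fit_normal_equation(X, y):
    # Solve the normal equations X^T X theta = X^T y;
    # np.linalg.solve avoids computing the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Illustrative (made-up) data for y = 1 + 2*x with a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)

X = np.column_stack([np.ones_like(x), x])  # prepend a column of ones for the intercept
theta = fit_normal_equation(X, y)
print(theta)  # close to [1.0, 2.0]
```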

It can be shown that this parameter vector is indeed the point where the cost function attains its global minimum: the cost function is convex with a unique minimum. We will not give a formal proof here. Note that convexity does not imply that the minimum can be calculated analytically. In another chapter we will discuss logistic regression as an example of an optimization problem that has a unique minimum but still requires a numerical technique to find it.

For examples of linear regression in these notes we will use the analytical solution. In practice, for real-life problems where the number of features tends to be large, the analytical solution is not always the best choice. See the discussion in Andrew Ng's Machine Learning course on Coursera.