5.1. Linear Regression

The canonical example of linear regression, fitting a straight line through data points, is a misleading one. It suggests that the name linear regression stems from the straight line that is fitted. This is not true. Linear regression is about fitting a parameterized function, the hypothesis \(h_{\v\theta}(\v x)\), that is linear in its parameters \(\v\theta\), to the data points \((\v x\ls i, y\ls i)\).
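
To make the distinction concrete, consider a small example of our own choosing: a quadratic hypothesis in a single feature \(x\),

\[
h_{\v\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2.
\]

The graph of this hypothesis is a parabola, not a straight line, yet fitting it is still linear regression because \(h_{\v\theta}\) depends linearly on the parameters \(\theta_0\), \(\theta_1\) and \(\theta_2\).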

Nevertheless, we too start with the canonical example, the straight-line fit, in the first section. Because of its simplicity we can easily visualize the workings of gradient descent optimization, which is central to most machine learning algorithms.
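
As a preview, here is a minimal sketch (our own, not the implementation developed later in this chapter) of batch gradient descent for the straight-line fit \(h_{\v\theta}(x) = \theta_0 + \theta_1 x\) with a quadratic cost; the learning rate and number of iterations are arbitrary choices:

```python
import numpy as np

def line_fit_gd(x, y, alpha=0.05, n_iter=2000):
    """Fit y ~ theta0 + theta1 * x with batch gradient descent.

    Minimal sketch using the quadratic cost
    J(theta) = 1/(2m) * sum_i (h(x_i) - y_i)^2.
    """
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iter):
        h = theta0 + theta1 * x            # predictions for all examples
        grad0 = (h - y).sum() / m          # dJ/dtheta0
        grad1 = ((h - y) * x).sum() / m    # dJ/dtheta1
        theta0 -= alpha * grad0            # simultaneous parameter update
        theta1 -= alpha * grad1
    return theta0, theta1

# Hypothetical noisy data around y = 1 + 2x, just to exercise the sketch.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
print(line_fit_gd(x, y))   # should end up close to (1, 2)
```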

Then we extend linear regression to deal with more than one feature value; this leads to multivariate linear regression.
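
In that multivariate setting the hypothesis takes the form (our notation here, with the convention \(x_0 = 1\) for the intercept term)

\[
h_{\v\theta}(\v x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \v\theta^T \v x,
\]

which is again linear in the parameters \(\v\theta\).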

When we have many features the hypothesis becomes more complex and we run the risk of overfitting: the model learns to adapt to the noise and natural variation in the data, leading to a small error on the learning examples but a large(r) error on examples not seen in the learning phase. To tackle this problem we will look at regularization.
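
A common way to regularize (L2 or ridge regularization, one option among several; the exact scaling conventions differ between texts) is to add a penalty on the size of the parameters to the quadratic cost:

\[
J(\v\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\v\theta}(\v x\ls i) - y\ls i \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2,
\]

where the hyperparameter \(\lambda \ge 0\) trades off fitting the data against keeping the parameters small.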

Solving a linear regression problem with a gradient descent procedure might seem like too much work, as an analytical solution is known.
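
For reference, that analytical solution is the normal equation (assuming \(X^T X\) is invertible; the notation here is our own): with \(X\) the matrix whose rows are the feature vectors \(\v x\ls i\) (including a constant 1 for the intercept) and \(\v y\) the vector of targets,

\[
\hat{\v\theta} = (X^T X)^{-1} X^T \v y.
\]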

Throughout our analysis of linear regression we will simply state that a quadratic cost function is to be used. But why is that? Is there a reason for it? In the last section of this chapter we will derive our cost function from basic principles.