5.1. Linear Regression

The canonical example of linear regression, fitting a straight line through data points, is a misleading one. It suggests that the name linear regression stems from the straight line that is fitted. This is not true. Linear regression is about fitting a parameterized function, the hypothesis \(h_{\v\theta}(\v x)\), that is linear in its parameters \(\v\theta\), to the data points \((\v x\ls i, y\ls i)\).
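
To make the distinction concrete, consider a small example of our own choosing: a quadratic hypothesis in a single feature \(x\),

\[
h_{\v\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2.
\]

The graph of this hypothesis is a parabola, not a straight line, yet fitting it is still linear regression because \(h_{\v\theta}\) depends linearly on the parameters \(\theta_0\), \(\theta_1\) and \(\theta_2\).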

Nevertheless, we too start with the canonical example, the straight-line fit, in the first section. Because of its simplicity we can easily visualize the workings of gradient descent optimization, which is central to most machine learning algorithms.
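
As a preview, here is a minimal sketch (our own, not the implementation developed later in this chapter) of batch gradient descent for the straight-line fit \(h_{\v\theta}(x) = \theta_0 + \theta_1 x\) with a quadratic cost; the learning rate and number of iterations are arbitrary choices:

```python
import numpy as np

def line_fit_gd(x, y, alpha=0.05, n_iter=2000):
    """Fit y ~ theta0 + theta1 * x with batch gradient descent.

    Minimal sketch using the quadratic cost
    J(theta) = 1/(2m) * sum_i (h(x_i) - y_i)^2.
    """
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iter):
        h = theta0 + theta1 * x            # predictions for all examples
        grad0 = (h - y).sum() / m          # dJ/dtheta0
        grad1 = ((h - y) * x).sum() / m    # dJ/dtheta1
        theta0 -= alpha * grad0            # simultaneous parameter update
        theta1 -= alpha * grad1
    return theta0, theta1

# Hypothetical noisy data around y = 1 + 2x, just to exercise the sketch.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
print(line_fit_gd(x, y))   # should end up close to (1, 2)
```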

Then we extend linear regression to deal with more than one feature value; this leads to multivariate linear regression.
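
In that multivariate setting the hypothesis takes the form (our notation here, with the convention \(x_0 = 1\) for the intercept term)

\[
h_{\v\theta}(\v x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \v\theta^T \v x,
\]

which is again linear in the parameters \(\v\theta\).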

When we have many features the hypothesis becomes more complex and we run the risk of overfitting: the model learns to adapt to the noise and natural variation in the data, leading to a small error on the learning examples but a large(r) error on examples not seen in the learning phase. To tackle this problem we will look at regularization.
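
A common way to regularize (L2 or ridge regularization, one option among several; the exact scaling conventions differ between texts) is to add a penalty on the size of the parameters to the quadratic cost:

\[
J(\v\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\v\theta}(\v x\ls i) - y\ls i \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2,
\]

where the hyperparameter \(\lambda \ge 0\) trades off fitting the data against keeping the parameters small.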

Solving a linear regression problem with a gradient descent procedure might seem like too much work, as an analytical solution is known.
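
For reference, that analytical solution is the normal equation (assuming \(X^T X\) is invertible; the notation here is our own): with \(X\) the matrix whose rows are the feature vectors \(\v x\ls i\) (including a constant 1 for the intercept) and \(\v y\) the vector of targets,

\[
\hat{\v\theta} = (X^T X)^{-1} X^T \v y.
\]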

Throughout our analysis of linear regression we will simply state that a quadratic cost function is to be used. But why is that? Is there a reason for it? In the last section of this chapter we will derive our cost function from basic principles.