Linear Regression
=================

The canonical example of linear regression, fitting a straight line through
data points, is a misleading one. It suggests that the name *linear*
regression stems from the straight line that is fitted. This is not true.
Linear regression is about fitting a parameterized function, the
**hypothesis** $h_{\v\theta}(\v x)$, **that is linear in its parameters**
$\v\theta$, to the data points $(\v x\ls i, y\ls i)$. A small sketch at the
end of this page illustrates this point.

Nevertheless we start with the canonical example, the straight line fit, in
the first section. Because of its simplicity we can easily visualize the
workings of **gradient descent optimization**, which is central to most
machine learning algorithms.

.. toctree::

   univariate_linear_regression

Then we extend linear regression to deal with more than one feature value,
which leads to multivariate linear regression.

.. toctree::

   multivariate_linear_regression
   extended_features

When we have a lot of features the hypothesis becomes more complex and we run
the risk of **overfitting**: the model learns to adapt to the noise and
natural variation in the data, leading to a small error on the learning
examples but a larger error on examples not seen in the learning phase. To
tackle this problem we will look at **regularization**.

.. toctree::

   regularization

Solving a linear regression problem with a gradient descent procedure might
seem like too much work, as an analytical solution is known.

.. toctree::

   analytical_solution

Throughout our analysis of linear regression we have simply stated that a
quadratic cost function had to be used. But why is that? Is there a reason
for it? In the last section of this chapter we will derive our cost function
from basic principles.

.. toctree::

   linear_regression_mle

.. toctree::

   exercises
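To illustrate that *linear* refers to the parameters and not to the shape of
the fitted curve, here is a minimal sketch, assuming only NumPy is available
(the data and variable names are invented for this illustration). Fitting a
parabola $h_{\v\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ gives a
curve in $x$, yet the model is linear in $\v\theta$ and therefore still a
linear regression problem, so the least squares solution discussed in the
analytical solution section applies unchanged.

.. code-block:: python

   # Minimal sketch: a model that is a curve in x but linear in theta.
   # We fit h_theta(x) = theta_0 + theta_1 * x + theta_2 * x**2 with
   # ordinary least squares; this is still linear regression.
   import numpy as np

   rng = np.random.default_rng(0)

   # synthetic data: a noisy parabola
   x = np.linspace(-1, 1, 50)
   y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)

   # design matrix with the extended features 1, x and x^2
   X = np.column_stack([np.ones_like(x), x, x**2])

   # least squares estimate of the parameter vector theta
   theta, *_ = np.linalg.lstsq(X, y, rcond=None)
   print(theta)  # approximately [1, 2, -3]

The fitted curve is a parabola, but the optimization problem solved above is
the same least squares problem as for the straight line fit; only the feature
vector changes.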