==============================
Maximum Likelihood Estimator
==============================

We have defined in a previous section that:

.. math::
   \P(Y=1\given \v X=\v x) = h_{\v\theta}(\v x)

and thus, for a two-class problem:

.. math::
   \P(Y=0\given \v X=\v x) = 1 - h_{\v\theta}(\v x)

We therefore have that :math:`Y\given\v X=\v x \sim \Bernoulli(h_{\v\theta}(\v x))`. The probability function then is:

.. math::
   \P(Y=y\given\v X=\v x) = \begin{cases}
      h_{\v\theta}(\v x) &: y=1\\
      1 - h_{\v\theta}(\v x) &: y=0
   \end{cases}

or equivalently:

.. math::
   \P(Y=y\given\v X=\v x) = \left(h_{\v\theta}(\v x)\right)^y \, \left(1-h_{\v\theta}(\v x)\right)^{1-y}

Note that the above expression uses the fact that $y$ is either zero or one: for $y=1$ the second factor equals one, and for $y=0$ the first factor does.

The training set $(\v x\ls i, y\ls i)$ for $i=1,\ldots,m$ can be considered the realization of $m$ i.i.d. random vectors and variables $(\v X\ls i, Y\ls i)$. Because of this independence, the probability of the entire training set factorizes:

.. math::
   \P(Y\ls 1=y\ls 1,\ldots,Y\ls m=y\ls m \given \v X\ls 1 = \v x\ls 1,\ldots, \v X\ls m = \v x\ls m) = \\
   \prod_{i=1}^{m} \P(Y\ls i=y\ls i\given \v X\ls i = \v x\ls i) \\
   = \prod_{i=1}^{m} \left(h_{\v\theta}(\v x\ls i)\right)^{y\ls i} \, \left(1-h_{\v\theta}(\v x\ls i)\right)^{1-y\ls i}

The above expression can also be interpreted as the **likelihood** of the data (the training set) as a function of the parameter vector $\v\theta$:

.. math::
   \ell(\v\theta) = \prod_{i=1}^{m} \left(h_{\v\theta}(\v x\ls i)\right)^{y\ls i} \, \left(1-h_{\v\theta}(\v x\ls i)\right)^{1-y\ls i}

The maximum likelihood estimator for $\v\theta$ is then given by:

.. math::
   \hat{\v\theta} = \arg\max_{\v\theta} \log \ell(\v\theta)

Here we maximize $\log\ell(\v\theta)$ instead of $\ell(\v\theta)$ itself: since the logarithm is monotonically increasing, both have the same maximizer, and the logarithm turns the product into a sum that is easier to differentiate.

Finding the optimal $\v\theta$ has to be done with a numerical technique; unlike the case of linear regression, there is no analytical solution for logistic regression. We thus have to calculate the gradient of $-\log\ell(\v\theta)$ (minimizing the negative log likelihood is equivalent to maximizing the log likelihood) and then iterate the gradient descent steps as we did for linear regression. You may skip the next subsection and take the gradient derivation for granted.
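
To make this concrete, below is a minimal sketch in Python/NumPy of the maximum likelihood fit by gradient descent. It assumes, as in the previous sections on logistic regression, that the hypothesis is the logistic (sigmoid) function of a linear combination, $h_{\v\theta}(\v x) = \sigma(\v\theta^T \v x)$, and it takes the gradient expression from the next subsection for granted. All function names (``sigmoid``, ``neg_log_likelihood``, ``fit``) are illustrative, not part of these notes.

.. code-block:: python

   import numpy as np

   def sigmoid(z):
       return 1.0 / (1.0 + np.exp(-z))

   def h(theta, X):
       # hypothesis h_theta(x) = sigmoid(theta^T x), evaluated for all
       # m examples at once; X is an (m, n) matrix, one example per row
       return sigmoid(X @ theta)

   def neg_log_likelihood(theta, X, y):
       # -log l(theta) = -sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]
       p = h(theta, X)
       return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

   def fit(X, y, alpha=0.01, iterations=5000):
       # gradient descent on -log l(theta); the gradient X^T (h - y)
       # is derived in the next subsection
       theta = np.zeros(X.shape[1])
       for _ in range(iterations):
           theta -= alpha * (X.T @ (h(theta, X) - y))
       return theta

   # usage on synthetic data: draw y ~ Bernoulli(sigmoid(X theta_true))
   rng = np.random.default_rng(0)
   X = np.c_[np.ones(100), rng.normal(size=(100, 2))]  # first column: intercept
   theta_true = np.array([-0.5, 2.0, -1.0])
   y = (rng.uniform(size=100) < sigmoid(X @ theta_true)).astype(float)

   theta_hat = fit(X, y)  # should roughly recover theta_true
   # the fitted theta should have a lower negative log likelihood than theta = 0
   print(neg_log_likelihood(np.zeros(3), X, y), neg_log_likelihood(theta_hat, X, y))

Note that the step size ``alpha`` must be chosen small enough for the descent to converge; in practice one would also guard ``np.log`` against probabilities of exactly zero or one.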