Maximum a Posteriori Classifier
===============================

The Bayes classifier was defined as:

.. math::
    \classify(\v x) = \arg\min_{\hat y} \sum_y L(\hat y, y)\, \P(Y=y\given \v X = \v x)

If we choose the zero-one loss $L(\hat y, y)=[\hat y\not= y]$ we get:

.. math::
    \classify(\v x) &= \arg\min_{\hat y} \sum_y [\hat y\not= y] \, \P(Y=y\given \v X = \v x)\\
    &= \arg\min_{\hat y} \sum_{y\not= \hat y} \P(Y=y\given \v X = \v x)\\
    &= \arg\min_{\hat y} \left( 1 - \P(Y=\hat y\given \v X = \v x) \right)\\
    &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)

where in the third step we used that the posterior probabilities sum to one: $\sum_y \P(Y=y\given \v X=\v x)=1$. I.e. the Bayes classifier for zero-one loss is equal to the maximum a posteriori (MAP) classifier.

Using Bayes' rule this can be rewritten for a discrete random vector $\v X$ as:

.. math::
    \classify(\v x) &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\\
    &= \arg\max_{\hat y} \frac{\P(\v X=\v x\given Y=\hat y)\,\P(Y=\hat y)}{\P(\v X=\v x)}

Because the evidence $\P(\v X=\v x)$ does not depend on $\hat y$ and is positive, we have:

.. math::
    \classify(\v x) = \arg\max_{\hat y} \P(\v X=\v x\given Y=\hat y)\,\P(Y=\hat y)

For a continuous random vector $\v X$ we get:

.. math::
    \classify(\v x) &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\\
    &= \arg\max_{\hat y} \frac{f_{\v X\given Y=\hat y}(\v x)\,\P(Y=\hat y)}{f_{\v X}(\v x)}

Again the evidence $f_{\v X}(\v x)$ is positive and independent of $\hat y$, and thus:

.. math::
    \classify(\v x) = \arg\max_{\hat y} f_{\v X\given Y=\hat y}(\v x)\,\P(Y=\hat y)
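To make the two MAP rules above concrete, the following is a minimal sketch in Python. The priors, the likelihood table for the discrete case and the Gaussian class-conditional densities for the continuous case are purely illustrative assumptions (as are the names ``map_classify`` and ``map_classify_continuous``); they are not part of the derivation above.

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    # Illustrative priors P(Y = y) for two classes.
    priors = {0: 0.6, 1: 0.4}

    # Discrete case: a single feature x taking the values 0, 1 or 2,
    # with an (assumed) likelihood table P(X = x | Y = y).
    likelihood = {
        0: np.array([0.7, 0.2, 0.1]),
        1: np.array([0.1, 0.3, 0.6]),
    }

    def map_classify(x):
        """Return arg max_y P(X = x | Y = y) P(Y = y)."""
        return max(priors, key=lambda y: likelihood[y][x] * priors[y])

    # Continuous case: replace the likelihood table by class-conditional
    # densities f_{X|Y=y}; here each class is assumed Gaussian.
    densities = {
        0: norm(loc=-1.0, scale=1.0),
        1: norm(loc=+2.0, scale=1.5),
    }

    def map_classify_continuous(x):
        """Return arg max_y f_{X|Y=y}(x) P(Y = y)."""
        return max(priors, key=lambda y: densities[y].pdf(x) * priors[y])

    print(map_classify(0), map_classify(2))                      # 0 1
    print(map_classify_continuous(0.0), map_classify_continuous(3.0))  # 0 1

Note that both functions implement the same rule: maximize the product of the class-conditional term (probability mass or density) and the prior; only the form of the class-conditional term differs.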