6.1.1. Maximum a Posteriori Classifier

The Bayes classifier was defined as:

\[\classify(\v x) = \arg\min_{\hat y} \sum_y L(\hat y, y)\, \P(Y=y\given \v X = \v x)\]

If we choose the zero-one loss \(L(\hat y, y)=[\hat y\not= y]\) we get:

\[\begin{split}\classify(\v x) &= \arg\min_{\hat y} \sum_y [\hat y\not= y] \, \P(Y=y\given \v X = \v x)\\ &= \arg\min_{\hat y} \sum_{y\not= \hat y} \P(Y=y\given \v X = \v x)\\ &= \arg\min_{\hat y} \left( 1 - \P(Y=\hat y\given \v X = \v x) \right)\\ &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\end{split}\]

That is, the Bayes classifier for zero-one loss equals the maximum a posteriori (MAP) classifier.
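
This equivalence is easy to verify numerically. The sketch below, using a made-up posterior over three classes, computes the expected zero-one loss for every candidate \(\hat y\) and checks that its minimizer coincides with the maximizer of the posterior:

```python
import numpy as np

# Toy posterior P(Y=y | X=x) over three classes (values are illustrative).
posterior = np.array([0.2, 0.5, 0.3])

# Zero-one loss matrix: L(yhat, y) = [yhat != y].
zero_one = 1.0 - np.eye(3)

# Expected loss for each yhat: sum_y L(yhat, y) P(Y=y | X=x).
expected_loss = zero_one @ posterior

print(np.argmin(expected_loss))  # 1: class with minimal expected loss
print(np.argmax(posterior))      # 1: class with maximal posterior
```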

Using Bayes' rule, this can be rewritten for a discrete random vector \(\v X\) as:

\[\begin{split}\classify(\v x) &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\\ &= \arg\max_{\hat y} \frac{\P(\v X=\v x\given Y=\hat y)\,\P(Y=\hat y)}{\P(\v X=\v x)}\end{split}\]

Because the evidence \(\P(\v X=\v x)\) is positive and does not depend on \(\hat y\), we have:

\[\classify(\v x) = \arg\max_{\hat y} \P(\v X=\v x\given Y=\hat y)\,\P(Y=\hat y)\]
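
A minimal sketch of this discrete MAP rule, assuming a single categorical feature; the training data, labels, and the estimation of \(\P(\v X=\v x\given Y=\hat y)\) and \(\P(Y=\hat y)\) by relative frequencies are all illustrative assumptions, not part of the text above:

```python
import numpy as np

# Made-up training data: one categorical feature with values {0, 1, 2}.
X = np.array([0, 0, 1, 1, 1, 2, 2, 0])  # feature values
Y = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # class labels

classes = np.unique(Y)

# Prior P(Y=y) and likelihood P(X=x | Y=y), estimated by counting.
prior = np.array([np.mean(Y == c) for c in classes])
likelihood = np.array([[np.mean(X[Y == c] == v) for v in range(3)]
                       for c in classes])

def classify(x):
    # arg max over yhat of P(X=x | Y=yhat) P(Y=yhat)
    return classes[np.argmax(likelihood[:, x] * prior)]

print(classify(0))  # 0: the likelihood-prior product is largest for class 0
```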

For a continuous random vector \(\v X\) we get:

\[\begin{split}\classify(\v x) &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\\ &= \arg\max_{\hat y} \frac{f_{\v X\given Y=\hat y}(\v x)\,\P(Y=\hat y)}{f_{\v X}(\v x)}\end{split}\]

Again, the evidence \(f_{\v X}(\v x)\) is positive and independent of \(\hat y\), and thus:

\[\classify(\v x) = \arg\max_{\hat y} f_{\v X\given Y=\hat y}(\v x)\,\P(Y=\hat y)\]
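
The continuous rule looks the same in code, with the class-conditional density \(f_{\v X\given Y=\hat y}\) in place of the conditional probability mass. The sketch below assumes (hypothetical) Gaussian class-conditional densities for a scalar feature; the means, standard deviations, and priors are made up:

```python
import numpy as np
from scipy.stats import norm

prior = np.array([0.3, 0.7])                        # P(Y=y)
means = np.array([0.0, 2.0])                        # class-conditional means
stds = np.array([1.0, 1.0])                         # class-conditional stds

def classify(x):
    # arg max over yhat of f_{X|Y=yhat}(x) P(Y=yhat)
    density = norm.pdf(x, loc=means, scale=stds)
    return np.argmax(density * prior)

print(classify(0.5))  # 0: the density at x=0.5 outweighs the smaller prior
```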