6.1.1. Maximum a Posteriori Classifier
The Bayes classifier was defined as:
\[\classify(\v x) = \arg\min_{\hat y} \sum_y L(\hat y, y)\, \P(Y=y\given \v X = \v x)\]
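To make the definition concrete, here is a minimal numerical sketch in Python with NumPy; the three-class posterior and the asymmetric loss matrix are invented values for illustration. Note that with an asymmetric loss the minimizer of the expected loss need not be the most probable class.

```python
import numpy as np

# Hypothetical posterior P(Y=y | X=x) over 3 classes for one observation x.
posterior = np.array([0.2, 0.5, 0.3])

# Hypothetical loss matrix: L[yhat, y] is the loss of predicting yhat
# when the true class is y (asymmetric on purpose).
L = np.array([[0.0, 1.0, 4.0],
              [2.0, 0.0, 6.0],
              [1.0, 2.0, 0.0]])

# Expected loss of each prediction: sum_y L[yhat, y] * P(Y=y | X=x).
expected_loss = L @ posterior

# The Bayes classifier picks the prediction with minimal expected loss.
print(np.argmin(expected_loss))   # -> 2, not the most probable class 1
```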
Choosing the zero-one loss \(L(\hat y, y)=[\hat y\not= y]\), we get:
\[\begin{split}\classify(\v x) &= \arg\min_{\hat y} \sum_y [\hat y\not= y] \, \P(Y=y\given \v X = \v x)\\
&= \arg\min_{\hat y} \sum_{y\not= \hat y} \P(Y=y\given \v X = \v x)\\
&= \arg\min_{\hat y} \left( 1 - \P(Y=\hat y\given \v X = \v x) \right)\\
&= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\end{split}\]
That is, the Bayes classifier for zero-one loss is equal to the maximum a posteriori (MAP) classifier.
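A small sketch confirming this numerically, reusing a made-up posterior as above; the step from the second to the third line of the derivation uses that the posterior probabilities sum to one.

```python
import numpy as np

posterior = np.array([0.2, 0.5, 0.3])    # hypothetical P(Y=y | X=x)

# Zero-one loss matrix: L01[yhat, y] = 1 if yhat != y else 0.
K = len(posterior)
L01 = 1.0 - np.eye(K)

expected_loss = L01 @ posterior          # equals 1 - posterior
yhat_bayes = np.argmin(expected_loss)    # Bayes classifier under zero-one loss
yhat_map = np.argmax(posterior)          # MAP classifier
print(yhat_bayes == yhat_map)            # -> True (both pick class 1)
```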
Using Bayes' rule, this can be rewritten for a discrete random vector \(\v X\) as:
\[\begin{split}\classify(\v x) &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\\
&= \arg\max_{\hat y} \frac{\P(\v X=\v x\given Y=\hat y)\,\P(Y=\hat y)}{\P(\v X=\v x)}\end{split}\]
Because the evidence \(\P(\v X=\v x)\) is positive and does not depend on \(\hat y\), we have:
\[\classify(\v x) = \arg\max_{\hat y} \P(\v X=\v x\given Y=\hat y)\,\P(Y=\hat y)\]
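As a sketch of the discrete case, assume a hypothetical likelihood table for \(\P(\v X=\v x\given Y=\hat y)\) and class priors \(\P(Y=\hat y)\); all numbers are invented for illustration.

```python
import numpy as np

# Hypothetical discrete setup: X takes 4 values, Y takes 2 classes.
# likelihood[y, x] = P(X=x | Y=y); each row sums to 1.
likelihood = np.array([[0.1, 0.4, 0.3, 0.2],   # class 0
                       [0.3, 0.1, 0.2, 0.4]])  # class 1
prior = np.array([0.6, 0.4])                   # P(Y=y)

def classify(x):
    # argmax over y of P(X=x | Y=y) * P(Y=y); the evidence P(X=x)
    # is dropped because it is positive and does not depend on y.
    return np.argmax(likelihood[:, x] * prior)

print([classify(x) for x in range(4)])         # -> [1, 0, 0, 1]
```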
For a continuous random vector \(\v X\) we get:
\[\begin{split}\classify(\v x) &= \arg\max_{\hat y} \P(Y=\hat y\given \v X = \v x)\\
&= \arg\max_{\hat y} \frac{f_{\v X\given Y=\hat y}(\v x)\,\P(Y=\hat y)}{f_{\v X}(\v x)}\end{split}\]
Again, the evidence \(f_{\v X}(\v x)\) is positive and independent of \(\hat y\), and thus:
\[\classify(\v x) = \arg\max_{\hat y} f_{\v X\given Y=\hat y}(\v x)\,\P(Y=\hat y)\]
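A minimal sketch of the continuous case, assuming hypothetical one-dimensional Gaussian class-conditional densities \(f_{\v X\given Y=\hat y}\); the means, standard deviations, and priors are invented.

```python
import numpy as np

# Hypothetical 1-D example: two classes with Gaussian class-conditional densities.
means = np.array([0.0, 2.0])
sigmas = np.array([1.0, 1.5])
prior = np.array([0.7, 0.3])   # P(Y=y)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def classify(x):
    # argmax over y of f_{X|Y=y}(x) * P(Y=y); the evidence f_X(x)
    # cancels because it is positive and independent of y.
    return np.argmax(gauss_pdf(x, means, sigmas) * prior)

print(classify(0.5), classify(3.0))   # -> 0 1
```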