Bayesian Classification
=======================

Let $\v X$ be the random (feature) vector $\v X=(X_1\cdots X_n)\T$ characterizing an object. The class of this object is characterized by the discrete random variable $Y$. The goal in classification is to come up with a classification function $\hat y = \classify(\v x)$ that assigns a class to a feature vector $\v x$.

To quantify the success of the classifier we introduce the **loss function** $L$ such that $L(\hat y, y)$ indicates the loss of an incorrect classification. The loss function is a mechanism to distinguish between the different errors a classifier can make. For a medical test it is often considered less of a problem if a healthy patient is incorrectly diagnosed with a disease than the opposite case, where a patient is declared healthy while in reality she is not.

In many classification problems the **zero-one loss** is used: $L(\hat y, y)=[\hat y\not=y]$, i.e. the loss is equal to 1 in case the classifier is wrong and zero in case it is right. The squared error $L(\hat y,y) = (\hat y-y)^2$ is also used as a loss function (for instance in neural nets) but is more often associated with *regression*.

The expected loss given feature vector $\v X = \v x$ for a classifier $\hat y = \classify(\v x)$ equals:

.. math::
   \mathcal L(\hat y;\v x) = \E(L(\hat y,Y)\given \v X=\v x) = \sum_y L(\hat y, y)\, \P(Y=y\given \v X = \v x)

The **Bayesian classifier** then finds the class $\hat y$ with minimal expected loss:

.. math::
   \classify(\v x) = \arg\min_{\hat y} \mathcal L(\hat y; \v x) = \arg\min_{\hat y} \sum_y L(\hat y, y)\, \P(Y=y\given \v X = \v x)

We will look in somewhat more detail at the zero-one loss Bayesian classifier. First we show that the zero-one loss function leads to the **Maximum A Posteriori (MAP)** classifier. Then we consider the **Naive Bayes classifier**. A small numerical sketch of the decision rule above follows the table of contents below.

.. toctree::

   MAPClassifier
   NaiveBayesClassifier
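To make the decision rule concrete, here is a minimal sketch in Python of a Bayesian classifier that works directly on given posterior probabilities $\P(Y=y\given \v X=\v x)$ and a loss matrix. The function name ``bayes_classify``, the posterior values and the loss matrices are illustrative assumptions, not taken from the text.

.. code-block:: python

   # Sketch of the Bayesian decision rule: pick the class with minimal
   # expected loss given the posterior P(Y=y | X=x). All numbers are
   # made up for illustration only.
   import numpy as np


   def bayes_classify(posterior, loss):
       """Return the class index with minimal expected loss.

       posterior : shape (n_classes,), P(Y=y | X=x) for each class y.
       loss      : shape (n_classes, n_classes), loss[yhat, y] is the loss
                   of predicting yhat when the true class is y.
       """
       # Expected loss of each candidate prediction yhat:
       #   L(yhat; x) = sum_y loss[yhat, y] * P(Y=y | X=x)
       expected_loss = loss @ posterior
       return int(np.argmin(expected_loss))


   # Hypothetical posterior for a two-class medical test:
   # class 0 = healthy, class 1 = ill
   posterior = np.array([0.7, 0.3])

   # Zero-one loss: every error costs the same, the rule picks the MAP class
   zero_one = 1 - np.eye(2)
   print(bayes_classify(posterior, zero_one))    # -> 0 (healthy)

   # Asymmetric loss: declaring an ill patient healthy is 10 times as
   # costly as the opposite error, so the decision flips to "ill"
   asymmetric = np.array([[0.0, 10.0],
                          [1.0,  0.0]])
   print(bayes_classify(posterior, asymmetric))  # -> 1 (ill)

Note that with the zero-one loss the rule simply selects the class with maximal posterior probability (the MAP classifier discussed next), while the asymmetric loss leads to a different decision for the very same posterior.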