6.2. Nearest Neighbor Classification

In nearest neighbor classification we start with a learning set of feature vectors \(\v x\ls i\) and associated class labels \(y\ls i\). We assume that each example \((\v x\ls i,y\ls i)\) in the learning set is a simultaneous realization of i.i.d. continuous random variables \(\v X_i\) and discrete random variables \(Y_i\) for \(i=1,\ldots,m\).

Bayesian classification assigns a feature vector \(\v x\) to class \(y\) such that

\[y = \arg\max_{y'} \P(Y=y'\given \v X=\v x)\]

We have already shown that this can be rewritten as:

\[y = \arg\max_{y'} f_{\v X\given Y=y'}(\v x) \P(Y=y')\]
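
For reference, the rewrite is just Bayes' rule; writing \(f_{\v X}\) for the marginal density of \(\v X\) (a notation assumed here), we have

\[\P(Y=y'\given \v X=\v x) = \frac{f_{\v X\given Y=y'}(\v x)\,\P(Y=y')}{f_{\v X}(\v x)}\]

and since the denominator does not depend on \(y'\), it can be dropped from the maximization.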

Generative classifiers estimate the class conditional density functions \(f_{\v X\given Y=y'}\) (and the class priors) from the learning set.

A nearest neighbor classifier is based on the same principle but without explicitly learning probability distributions. Instead, the entire learning set is memorized, and to classify an unknown \(\v x\) we simply assign it to the class of the closest example in the learning set. Evidently, where the class conditional density is high the learning set will contain many examples, so an unknown \(\v x\) in such a region is bound to lie close to one of them.
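
As a minimal sketch of this rule (assuming NumPy, Euclidean distance, and illustrative function and variable names not taken from the text):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """Assign x to the class of its closest example in the learning set.

    X_train : (m, d) array of learning feature vectors
    y_train : (m,) array of class labels
    x       : (d,) feature vector to classify
    """
    # Euclidean distances from x to every memorized example
    dists = np.linalg.norm(X_train - x, axis=1)
    # Index of the nearest neighbor decides the class
    nearest = np.argmin(dists)
    return y_train[nearest]
```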

We will discuss not only the simple nearest neighbor classifier but also the k-nearest neighbor classifier, where the resulting class is taken to be the most frequently occurring class among the k nearest neighbors. We then generalize the classifier to a weighted classifier that in principle uses all examples in the learning set.
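
A sketch of these two variants, under the same assumptions as above (Euclidean distance; the Gaussian weighting kernel below is an assumption for illustration, not the scheme developed later in the text):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Assign x to the most frequently occurring class among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest_k = np.argsort(dists)[:k]        # indices of the k closest examples
    votes = Counter(y_train[nearest_k])      # majority vote over their labels
    return votes.most_common(1)[0][0]

def weighted_nn_classify(X_train, y_train, x, h=1.0):
    """Weighted variant: every learning example votes with a weight that decays
    with its distance to x (here an assumed Gaussian kernel with bandwidth h)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    weights = np.exp(-dists**2 / (2 * h**2))
    classes = np.unique(y_train)
    # Sum the weights per class and return the class with the largest total
    scores = [weights[y_train == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```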