6.2.2. k-Nearest Neighbor Classifier
For the k-nearest neighbor classifier we do not consider only the closest element in the learning set; instead we consider the \(k\) closest elements. For a two-class problem we then assign an unknown feature vector to the class that has the majority among the \(k\) closest elements from the learning set.
The algorithm is simple:
1. Calculate the distance from the feature vector \(\v x\) to all vectors in the learning set:
\[d_i = \|\v x - \v x\ls i\|\]
2. Sort all distances and define the sorted index map \(I_s\) such that
\[\forall i\leq j: d_{I_s(i)} \leq d_{I_s(j)}\]
Note: \(I_s\) is what the argsort function in Numpy returns.
3. For \(i=1,\ldots,k\), count the occurrences of each class \(y\ls{I_s(i)}\). The majority class is the result of the classifier.
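A minimal NumPy sketch of these three steps could look as follows; the function name knn_classify and the arrays X_train and y_train are hypothetical and serve only as an illustration of the algorithm above.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """Classify feature vector x by majority vote among its k nearest
    neighbors in the learning set (X_train, y_train)."""
    # Step 1: distances from x to all vectors in the learning set
    d = np.linalg.norm(X_train - x, axis=1)
    # Step 2: argsort returns the sorted index map I_s
    I_s = np.argsort(d)
    # Step 3: majority vote among the k nearest neighbors
    nearest_labels = y_train[I_s[:k]]
    values, counts = np.unique(nearest_labels, return_counts=True)
    return values[np.argmax(counts)]

# Example with a toy learning set of two classes
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.2, 0.1]), X_train, y_train, k=3))  # -> 0
```

Note that for a two-class problem an odd \(k\) guarantees that the vote cannot end in a tie.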
The k-NN classifier can be interpreted as a voting algorithm: the \(k\) closest vectors \(\v x\ls i\) in the learning set vote (with equal voting strength) for their class \(y\ls i\). In the next section we consider a generic voting scheme where the voting strength is a function of the distance from the vector \(\v x\) to each of the vectors in the learning set.