6.2.2. k-Nearest Neighbor Classifier

For the k-nearest neighbor classifier we consider not only the closest element in the learning set but the \(k\) closest elements. For a two class problem we then assign an unknown feature vector to the class that has the majority among the \(k\) closest elements from the learning set.

The algorithm is simple (a NumPy sketch follows the list):

  1. First calculate the distances from the feature vector \(\v x\) to all vectors \(\v x\ls i\) in the learning set:

    \[d_i = \|\v x - \v x\ls i\|\]
  2. Sort all distances and determine the sorted index map \(I_s\) such that

    \[\forall i\leq j: d_{I_s(i)} \leq d_{I_s(j)}\]

    Note: \(I_s\) is what the argsort function in NumPy returns.

  3. Count the votes for each class among the labels \(y\ls{I_s(i)}\) for \(i=1,\ldots,k\). The majority class is the result of the classifier.
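
The three steps above translate almost directly into NumPy. Below is a minimal sketch; the function name `knn_classify` and the array names `X_learn` and `y_learn` are our own illustration, assuming the learning set is stored as an \((n, d)\) array of feature vectors together with an \((n,)\) array of class labels:

```python
import numpy as np

def knn_classify(x, X_learn, y_learn, k):
    """Classify feature vector x with the k-nearest neighbor rule."""
    # Step 1: distances d_i = ||x - x^(i)|| to all learning set vectors.
    d = np.linalg.norm(X_learn - x, axis=1)
    # Step 2: argsort returns the sorted index map I_s.
    I_s = np.argsort(d)
    # Step 3: majority vote among the labels of the k nearest vectors.
    classes, votes = np.unique(y_learn[I_s[:k]], return_counts=True)
    return classes[np.argmax(votes)]

# Example: two classes in the plane, classify a point near class 1.
X_learn = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                    [2.0, 2.0], [2.0, 3.0]])
y_learn = np.array([0, 0, 0, 1, 1])
print(knn_classify(np.array([1.8, 2.1]), X_learn, y_learn, k=3))  # -> 1
```

Note that `np.argmax` breaks a tie in the vote in favor of whichever class happens to come first; for a two class problem an odd \(k\) avoids ties altogether.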

The k-NN classifier can be interpreted as a voting algorithm: the \(k\) vectors \(\v x\ls i\) in the learning set that are closest to \(\v x\) each vote (with equal voting strength) for their own class \(y\ls i\). In the next section we consider a generic voting scheme in which the voting strength is a function of the distance from \(\v x\) to each of the vectors in the learning set.