k-NN Summary

From what we have learned, k-NN is easy to implement but requires feature scaling. It also has a few other peculiarities:

  1. k-NN does not require training.
    Unlike many other algorithms, k-NN does not learn anything during training. It only needs to store the coordinates of all training data points.
    But since all the calculations are performed at prediction time, predictions are slower than with many other algorithms.
  2. k-NN is computationally expensive.
    In its basic form, the model calculates the distance from the query to every training instance to find the neighbors, so it can get painfully slow on large datasets.
  3. Easy to add new training data.
    Since the model does not need to train, we can just add new training data points, and the predictions will adjust.
  4. The curse of dimensionality.
    Some algorithms really struggle when the number of dimensions (features) is large, and unfortunately k-NN is one of them. In high-dimensional space, the distances between pairs of points tend to become similar regardless of the actual feature values, so it becomes much harder to tell which instances are truly similar.
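
The first three points can be illustrated with a minimal sketch in Python with NumPy. This is not how any particular library implements k-NN; the class name BruteForceKNN, its add method, and the toy data are all hypothetical and chosen only for illustration. Note that "training" just stores the data, prediction measures the distance to every stored point, and appending new points changes the prediction without any retraining.

```python
import numpy as np
from collections import Counter


class BruteForceKNN:
    """Hypothetical minimal k-NN classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k
        self.X = None
        self.y = None

    def fit(self, X, y):
        # "Training" only stores the data; nothing is learned here.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def add(self, X_new, y_new):
        # New training points are simply appended; no retraining step.
        self.X = np.vstack([self.X, np.asarray(X_new, dtype=float)])
        self.y = np.concatenate([self.y, np.asarray(y_new)])
        return self

    def predict(self, X_query):
        predictions = []
        for q in np.asarray(X_query, dtype=float):
            # Euclidean distance from the query to every stored point.
            dists = np.linalg.norm(self.X - q, axis=1)
            nearest = np.argsort(dists)[: self.k]
            # Majority vote among the labels of the k nearest points.
            votes = Counter(self.y[nearest].tolist())
            predictions.append(votes.most_common(1)[0][0])
        return np.array(predictions)


# Toy data (assumed): three "a" points near the origin, one "b" point far away.
model = BruteForceKNN(k=3).fit(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0]],
    ["a", "a", "a", "b"],
)
query = [[4.0, 4.0]]
before = model.predict(query)[0]

# Add two more "b" points near the query and predict again.
model.add([[5.1, 4.9], [4.9, 5.2]], ["b", "b"])
after = model.predict(query)[0]

print(before, after)
```

The loop inside predict is where the cost comes from: every query touches every stored point, so the work grows with the size of the training set.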

So, here is a little summary of the k-NN algorithm:

Advantages                       Disadvantages
No training time                 Needs feature scaling
Easy to add new training data    Prediction time is high
                                 Doesn't work well with a large number of training instances
                                 Doesn't work well with a large number of features
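
The problem with a large number of features can be checked numerically. The sketch below is an illustration under assumed conditions (uniformly random points in a unit hypercube, a hypothetical helper named relative_contrast), not a result from the course. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance. As the number of features grows, this ratio tends to shrink, meaning all points start to look almost equally far away.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(n_features, n_points=500):
    # Uniform random points in the unit hypercube (an assumed toy dataset).
    X = rng.random((n_points, n_features))
    query = rng.random(n_features)
    dists = np.linalg.norm(X - query, axis=1)
    # (farthest - nearest) / nearest: values near 0 mean that
    # every point is roughly the same distance from the query.
    return (dists.max() - dists.min()) / dists.min()


contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>5} features: relative contrast = {c:.3f}")
```

When the contrast is small, the "nearest" neighbors are barely nearer than anything else, which is why neighbor votes become less informative in high dimensions.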
