What is k-NN?
Let's start our classification adventure with the simplest task - binary classification. Suppose we want to classify sweets as cookies/not cookies based on a single feature: their weight.
A simple way to predict the class of a new instance is to look at its closest neighbor. In our example, that means finding the sweet whose weight is closest to the weight of the new instance.
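To make this concrete, here is a minimal sketch of the 1-nearest-neighbor idea, assuming a toy dataset of hypothetical sweet weights (the values below are made up for illustration):

```python
import numpy as np

# Hypothetical weights in grams; 1 = cookie, 0 = not cookie
weights = np.array([12.0, 15.0, 35.0, 40.0, 14.0])
labels = np.array([1, 1, 0, 0, 1])

def predict_1nn(new_weight):
    # Pick the training sweet whose weight is closest to the new instance
    nearest = np.argmin(np.abs(weights - new_weight))
    return labels[nearest]

print(predict_1nn(12.5))  # 1 (closest weight is 12.0 -> cookie)
print(predict_1nn(38.0))  # 0 (closest weight is 38.0 -> not cookie)
```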
That is the idea behind k-Nearest Neighbors (k-NN) - we just look at the neighbors. The k-NN algorithm assumes that similar things exist in close proximity; in other words, similar things are near each other. The k in k-NN stands for the number of neighbors we consider when making a prediction.
In the example above, we considered only 1 neighbor, so it was 1-Nearest Neighbor. But usually, k is set to a larger number, since looking at only one neighbor can be unreliable:
If k (the number of neighbors) is greater than one, we choose the most frequent class in the neighborhood as the prediction. Here is an example of predicting two new instances with k=3:
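As a concrete sketch of this majority vote, the snippet below uses scikit-learn's KNeighborsClassifier with k=3; the training weights and the two new instances are hypothetical toy values, not data from the lesson:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: weight in grams; 1 = cookie, 0 = not cookie
X_train = np.array([[12.0], [14.0], [15.0], [35.0], [38.0], [40.0]])
y_train = np.array([1, 1, 1, 0, 0, 0])

# k = 3: each prediction is the majority class among the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

X_new = np.array([[13.0], [37.0]])
print(knn.predict(X_new))  # [1 0]
```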
As you can see, changing k can lead to different predictions.
Occasionally, k-NN produces a tie: multiple classes appear equally often among the nearest neighbors. Most libraries, including scikit-learn, break such ties by choosing the first class in their internal ordering - something to keep in mind, since it can subtly affect reproducibility and interpretation.
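The sketch below constructs one such tie with k=2. The toy data is made up, and the exact tie-breaking behavior may vary between library versions, so treat the output as illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two training points, one per class; k = 2 guarantees a tied vote
X_train = np.array([[10.0], [20.0]])
y_train = np.array([0, 1])

knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(X_train, y_train)

# 15.0 is equidistant from both neighbors, so each class gets one vote;
# the tie resolves to the first class in knn.classes_ (here, 0)
print(knn.predict(np.array([[15.0]])))  # [0]
```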