Classification with Python
k-NN with Multiple Features
You now understand how k-NN works when there is only one feature. Let's move on to a slightly more complex example with two features: weight and width.
Now we need to find neighbors with respect to both width and weight. But there is a small problem with this. Let's plot the sweets and see what's wrong:
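The plot itself is not shown here, but a minimal sketch of one, assuming a small made-up sweets dataset, could look like this:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sweets data: widths in one unit, weights in another
width = np.array([5, 6, 7, 8, 9, 10, 11, 12])
weight = np.array([12, 20, 27, 35, 41, 48, 56, 64])

plt.scatter(width, weight)
plt.xlabel("width")
plt.ylabel("weight")
plt.gca().set_aspect("equal")  # equal axis units make the scale mismatch visible
plt.show()
```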
You can see that weight ranges from 12 to 64, while the width is only between 5 and 12. Since the width's range is much smaller, the sweets look almost vertically aligned. And if we calculate distances now, they will be close to just the differences in weight, as if we never considered width at all :'(
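To see this numerically, here is a quick check on two hypothetical sweets: the Euclidean distance is almost exactly the weight difference alone.

```python
import numpy as np

# Two hypothetical sweets: [weight, width]
a = np.array([12.0, 5.0])
b = np.array([64.0, 12.0])

dist = np.linalg.norm(a - b)      # full Euclidean distance
weight_only = abs(a[0] - b[0])    # difference in weight alone

print(dist, weight_only)  # ~52.47 vs 52.0 -- nearly identical, width barely matters
```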
There is a solution, though – scaling the data.
Now both weight and width are on the same scale and centered around zero. This is achievable using the StandardScaler class from sklearn. Here is the syntax:
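The exact snippet is not reproduced here, but a minimal sketch, assuming a feature matrix X whose rows are hypothetical [weight, width] pairs:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: each row is one sweet, [weight, width]
X = np.array([[12.0, 5.0],
              [35.0, 8.0],
              [64.0, 12.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # per column: subtract the mean, divide by the standard deviation

print(X_scaled)  # both columns now have mean 0 and unit variance
```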
You should always scale the data before using k-Nearest Neighbors.
Note
StandardScaler makes the data centered around zero. Centering is not obligatory for k-NN and may even confuse you ("how can weight be negative?")... But this is just a way of presenting the data to a computer, and some other models do require centering, so it is better to use StandardScaler for scaling by default.
With the data scaled, we can now find the neighbors!
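As a sketch of the full workflow (KNeighborsClassifier and Pipeline are real sklearn classes; the sweets data and labels here are made up for illustration), chaining the scaler and the classifier guarantees that new sweets are scaled the same way as the training data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical sweets: [weight, width], with labels 0 and 1 for two kinds of sweets
X = np.array([[12.0, 5.0], [18.0, 6.0], [25.0, 7.0], [33.0, 8.0],
              [41.0, 9.0], [48.0, 10.0], [56.0, 11.0], [64.0, 12.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Scaling and k-NN chained together, applied in order on fit and predict
model = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X, y)

print(model.predict([[30.0, 8.5]]))  # classify a new sweet
```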
In the case of two-feature k-NN, the neighborhood is a circle containing the desired number of neighbors. With three features involved, we look for a sphere neighborhood. With more than three features, the shape of the neighborhood is more complex, so we cannot visualize it, but mathematically it works the same way.