Classification with Python
k-NN with Multiple Features
You now understand how k-NN works when there is only one feature. Let's move on to a slightly more complex example with two features: weight and width.
Now we need to find neighbors with respect to both width and weight. But there is a small problem with this. Let's plot the sweets and see what's wrong:
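The plot itself is not shown here, but a minimal sketch of one, assuming a small made-up sweets dataset, could look like this:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sweets data: widths in one unit, weights in another
width = np.array([5, 6, 7, 8, 9, 10, 11, 12])
weight = np.array([12, 20, 27, 35, 41, 48, 56, 64])

plt.scatter(width, weight)
plt.xlabel("width")
plt.ylabel("weight")
plt.gca().set_aspect("equal")  # equal axis units make the scale mismatch visible
plt.show()
```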
You can see that weight ranges from 12 to 64, while the width is only between 5 and 12. Since the width's range is much smaller, the sweets look almost vertically aligned. And if we calculate distances now, they will be close to just the differences in weight, as if we never considered width at all :'(
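To see this numerically, here is a quick check on two hypothetical sweets: the Euclidean distance is almost exactly the weight difference alone.

```python
import numpy as np

# Two hypothetical sweets: [weight, width]
a = np.array([12.0, 5.0])
b = np.array([64.0, 12.0])

dist = np.linalg.norm(a - b)      # full Euclidean distance
weight_only = abs(a[0] - b[0])    # difference in weight alone

print(dist, weight_only)  # ~52.47 vs 52.0 -- nearly identical, width barely matters
```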
There is a solution, though – scaling the data.
Now both weight and width are on the same scale and centered around zero. This is achievable using the StandardScaler class from sklearn. Here is the syntax:
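The exact snippet is not reproduced here, but a minimal sketch, assuming a feature matrix X whose rows are hypothetical [weight, width] pairs:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: each row is one sweet, [weight, width]
X = np.array([[12.0, 5.0],
              [35.0, 8.0],
              [64.0, 12.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # per column: subtract the mean, divide by the standard deviation

print(X_scaled)  # both columns now have mean 0 and unit variance
```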
You should always scale the data before using k-Nearest Neighbors.
Note
StandardScaler makes the data centered around zero. Centering is not obligatory for k-NN and may even confuse you ("how can weight be negative?")... But this is just a way of presenting the data to a computer, and some other models do require centering, so it is better to use StandardScaler for scaling by default.
With the data scaled, we can now find the neighbors!
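As a sketch of the full workflow (KNeighborsClassifier and Pipeline are real sklearn classes; the sweets data and labels here are made up for illustration), chaining the scaler and the classifier guarantees that new sweets are scaled the same way as the training data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical sweets: [weight, width], with labels 0 and 1 for two kinds of sweets
X = np.array([[12.0, 5.0], [18.0, 6.0], [25.0, 7.0], [33.0, 8.0],
              [41.0, 9.0], [48.0, 10.0], [56.0, 11.0], [64.0, 12.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Scaling and k-NN chained together, applied in order on fit and predict
model = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X, y)

print(model.predict([[30.0, 8.5]]))  # classify a new sweet
```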
In the case of two-feature k-NN, the neighborhood is a circle containing the desired number of neighbors. With three features involved, we look for a sphere neighborhood. With more than three features, the shape of the neighborhood is more complex, so we cannot visualize it, but mathematically it works the same way.