Classification with Python

k-NN with Multiple Features

You now understand how k-NN works with a single feature. Let's move on to a slightly more complex example that uses two features: weight and width.

In this case, we need to find neighbors based on both width and weight. But there's a small issue with that. Let's plot the sweets and see what goes wrong:
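The original dataset is not shown here, so the sketch below assumes a hypothetical sweets DataFrame whose Weight and Width columns match the ranges described next:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sweets data: weight runs from 12 to 64,
# while width only runs from 5 to 12
sweets = pd.DataFrame({
    'Weight': [12, 18, 25, 33, 41, 48, 56, 64],
    'Width':  [5, 6, 7, 8, 9, 10, 11, 12],
})

# Plot with equal axis scales so both features use the same unit length
plt.scatter(sweets['Weight'], sweets['Width'])
plt.xlabel('Weight')
plt.ylabel('Width')
plt.gca().set_aspect('equal')
plt.show()
```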

You can see that the weight ranges from 12 to 64, while the width is only between 5 and 12. Since the width's range is much smaller, the sweets appear almost vertically aligned. If we calculate distances now, they will primarily reflect differences in weight, as if we never considered width.

There is a solution, though: scaling the data.

After scaling, both weight and width are on the same scale and centered around zero. This can be achieved with the StandardScaler class from sklearn, which simply subtracts the sample mean and then divides the result by the sample standard deviation:

X_{scaled} = \frac{X - \bar{x}}{s}

StandardScaler centers the data around zero. Centering is not mandatory for k-NN, and it can be confusing at first (how can a weight be negative?), but it is simply a different way of presenting the data to the computer. Some models do require centering, so using StandardScaler for scaling by default is advisable.
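A minimal sketch of the scaling step, reusing the hypothetical sweets DataFrame from the plotting example above:

```python
from sklearn.preprocessing import StandardScaler

# fit_transform() learns each column's mean and standard deviation,
# then applies (x - mean) / std to every value
scaler = StandardScaler()
X_scaled = scaler.fit_transform(sweets[['Weight', 'Width']])

# After scaling, each column has mean ~0 and standard deviation ~1
print(X_scaled.mean(axis=0).round(2))
print(X_scaled.std(axis=0).round(2))
```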

In fact, you should always scale the data before using k-Nearest Neighbors. With the data scaled, we can now find the neighbors:
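As a sketch, the neighbors can be found with sklearn's NearestNeighbors class, continuing from the previous snippet; the query point below is a hypothetical new sweet:

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Fit on the scaled feature matrix from the previous snippet
knn = NearestNeighbors(n_neighbors=3)
knn.fit(X_scaled)

# A new sweet must be transformed with the SAME fitted scaler
new_sweet = scaler.transform(pd.DataFrame({'Weight': [30], 'Width': [7]}))
distances, indices = knn.kneighbors(new_sweet)
print(indices)  # row positions of the 3 nearest training sweets
```

Note that the new point is passed through the scaler fitted on the training data; fitting a fresh scaler on the query would put it on a different scale than the neighbors.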

In the case of two features, k-NN defines a circular neighborhood containing the desired number of neighbors. With three features, the neighborhood becomes a sphere. In higher dimensions, it is a hypersphere that can't be visualized, yet the underlying calculations remain unchanged.
