Learn Implementing k-NN | k-NN Classifier
Classification with Python

Implementing k-NN

KNeighborsClassifier

Implementing k-Nearest Neighbors is pretty straightforward: scikit-learn provides the KNeighborsClassifier class, and we only need to import and use it.

First, import the class and create a classifier object, setting the number of neighbors k with the n_neighbors parameter:

# Importing the class
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
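Parameters you do not pass keep scikit-learn's defaults. A quick way to inspect them is get_params(); the sketch below assumes a recent scikit-learn version, where the defaults are uniform vote weights and the Minkowski metric with p=2 (that is, Euclidean distance).

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)

# get_params() returns every constructor argument, including the defaults we did not set
params = knn.get_params()
print(params["n_neighbors"])           # the value we set: 3
print(params["weights"])               # 'uniform': each neighbor gets one equal vote
print(params["metric"], params["p"])   # 'minkowski' with p=2, i.e. Euclidean distance
```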

You need to feed it the training data using the .fit() method:

knn.fit(X_scaled, y)

And that's it! Now you can predict the classes of new instances, scaled the same way as the training data:

y_pred = knn.predict(X_new_scaled)
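To make the prediction step less of a black box, here is a minimal from-scratch sketch of what a uniform-weight, Euclidean k-NN does: measure the distance from the query to every training point, take the k closest, and return the majority label. The toy arrays and the knn_predict helper are hypothetical, made up for illustration; the sketch simply checks that it agrees with KNeighborsClassifier on this data.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical, already-scaled toy data: two features, binary labels
X_scaled = np.array([[-1.0, -1.0], [-0.8, -1.2], [-1.1, -0.7],
                     [1.0, 1.0], [0.9, 1.3], [1.2, 0.8]])
y = np.array([0, 0, 0, 1, 1, 1])
X_new_scaled = np.array([[-0.9, -0.9], [1.1, 1.0], [0.2, 0.1]])

def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: Euclidean distances, then a majority vote among the k closest points."""
    preds = []
    for x in X_query:
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # distance to every training point
        nearest = np.argsort(dists)[:k]                      # indices of the k closest points
        votes = Counter(y_train[nearest])                    # count labels among those neighbors
        preds.append(votes.most_common(1)[0][0])             # the most frequent label wins
    return np.array(preds)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_scaled, y)
print(knn.predict(X_new_scaled))                  # scikit-learn's prediction
print(knn_predict(X_scaled, y, X_new_scaled, k=3))  # the from-scratch version
```

This also shows why .fit() is cheap for k-NN: there is nothing to optimize, and all the work (computing distances) happens at prediction time.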

Scaling the data

However, remember that k-NN relies on distances, so the data must be scaled; otherwise, features with larger ranges dominate the distance. StandardScaler is commonly used for this purpose.
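To see why unscaled features are a problem for a distance-based method like k-NN, consider a small sketch with hypothetical age and income values (not from the course data). Before scaling, income differences in the hundreds of dollars swamp age differences of decades; after StandardScaler, both features contribute on a comparable scale and the nearest neighbor changes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 is age (years), column 1 is income (dollars)
X = np.array([[25, 50_000],
              [27, 51_000],
              [60, 50_500],
              [40, 80_000],
              [50, 20_000]], dtype=float)

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return np.sqrt(((a - b) ** 2).sum())

# Unscaled: person 0 vs person 1 (similar age) and vs person 2 (35 years older)
raw_01 = euclidean(X[0], X[1])
raw_02 = euclidean(X[0], X[2])
print(raw_01, raw_02)   # person 2 looks closer, only because of income

# Scaled: each column now has mean 0 and standard deviation 1
X_scaled = StandardScaler().fit_transform(X)
sc_01 = euclidean(X_scaled[0], X_scaled[1])
sc_02 = euclidean(X_scaled[0], X_scaled[2])
print(sc_01, sc_02)     # now person 1 is the closer one
```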

You must compute x̄ (the mean) and s (the standard deviation) of each feature only on the training set, using .fit() or .fit_transform(). Then use .transform() on the test set so that both sets are scaled identically:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
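Under the hood, StandardScaler applies (x − x̄) / s column by column. The sketch below, using small hypothetical arrays, checks that the statistics it stores (mean_ and scale_) match the training-set mean and standard deviation computed with NumPy, and that the test set is transformed with those same training statistics. Note that np.std uses ddof=0 by default, which matches StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test sets with two features
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
X_test = np.array([[2.5, 350.0], [5.0, 100.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns x̄ and s from X_train only
X_test_scaled = scaler.transform(X_test)        # reuses the training x̄ and s

# Manual version: x̄ and s come from the training set
x_bar = X_train.mean(axis=0)
s = X_train.std(axis=0)  # population standard deviation (ddof=0)
print(np.allclose(scaler.mean_, x_bar), np.allclose(scaler.scale_, s))
print(np.allclose(X_test_scaled, (X_test - x_bar) / s))
```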

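Scaling the test set with its own mean and standard deviation would map the same raw value to different scaled values in the training and test sets, which distorts the distances k-NN relies on and harms predictions.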
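Here is a small sketch of that mistake, assuming hypothetical training ratings and the two test users from the example (ratings 5 and 5, and 4.5 and 4). If a separate scaler is fitted on a two-row test set, each column always becomes +1 and −1, whatever the actual ratings are, so the information about how high the ratings were relative to the training users is lost.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training ratings (two features) and the two test users
X_train = np.array([[5.0, 4.0], [3.0, 3.5], [4.0, 5.0], [2.0, 2.5], [4.5, 4.5]])
X_test = np.array([[5.0, 5.0], [4.5, 4.0]])

# Correct: reuse the statistics learned from the training set
train_scaler = StandardScaler().fit(X_train)
good = train_scaler.transform(X_test)
print(good)

# Wrong: fit a separate scaler on the test set itself
test_scaler = StandardScaler().fit(X_test)
bad = test_scaler.transform(X_test)
print(bad)  # each column collapses to +1 and -1
```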

Example

We predict whether a person enjoys Star Wars VI based on their ratings of Episodes IV and V (from The Movies Dataset). After training, we test two users: one rated Episodes IV and V as 5 and 5, the other as 4.5 and 4.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/starwars_binary.csv')
# Dropping the target column and leaving only features as `X_train`
X_train = df.drop('StarWars6', axis=1)
# Storing target column as `y_train`, which contains 1 (liked SW 6) or 0 (didn't like SW 6)
y_train = df['StarWars6']
# Test set of two people
X_test = np.array([[5, 5], [4.5, 4]])

# Scaling the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Building a model and predicting new instances
knn = KNeighborsClassifier(n_neighbors=13).fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(y_pred)
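With the default uniform weights, predict_proba() reports the share of the k neighbors that belong to each class, so with n_neighbors=13 every probability is a multiple of 1/13. Because 13 is odd and there are only two classes, the vote can never tie. The sketch below uses synthetic, already-scaled data as a stand-in (an assumption, since the real dataset is downloaded from a URL).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the scaled ratings: 100 users, 2 features, noisy binary label
X_train = rng.normal(size=(100, 2))
y_train = (X_train.sum(axis=1) + rng.normal(scale=0.5, size=100) > 0).astype(int)
X_test = np.array([[1.0, 1.0], [0.1, -0.2]])

knn = KNeighborsClassifier(n_neighbors=13).fit(X_train, y_train)
proba = knn.predict_proba(X_test)  # columns follow knn.classes_, here [0, 1]
print(proba)
print(np.round(proba * 13))        # vote counts per class among the 13 neighbors
print(knn.predict(X_test))         # the class with more votes
```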

Which of the following class names from scikit-learn are used to implement the k-Nearest Neighbors classifier and to scale features when preparing data for k-NN?

Section 1. Chapter 4
