Implementing k-NN

KNeighborsClassifier

Implementing k-Nearest Neighbors is pretty straightforward. We only need to import and use the KNeighborsClassifier class.

Constructor:

  • KNeighborsClassifier(n_neighbors=5)
    • n_neighbors – the number of neighbors (k). The default value is 5.

Methods:

  • fit(X, y) – fits the model to the training data X, y;
  • predict(X) – predicts the class for each sample in X;
  • score(X, y) – returns the accuracy on the X, y set.

Once you have imported the class and created an object like this:

# Importing the class
from sklearn.neighbors import KNeighborsClassifier
# Creating a k-NN classifier that uses the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)

You need to feed it the training data using the .fit() method:

knn.fit(X_scaled, y)

And that's it! You can predict new values now.

y_pred = knn.predict(X_new_scaled)
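
You can also evaluate the model on a labeled set with .score(). A minimal sketch, assuming y_new is a hypothetical array holding the true classes for X_new_scaled (both names are only illustrative here):

# Accuracy on an already scaled, labeled set (y_new is hypothetical)
accuracy = knn.score(X_new_scaled, y_new)
print(accuracy)  # fraction of correctly classified samples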

Scaling the data

However, remember that k-NN relies on distances between points, so the data must be scaled. StandardScaler is commonly used for this purpose:

Constructor:

  • StandardScaler().

Methods:

  • fit(X) – calculates the mean \bar{x} and standard deviation s of each feature in X;
  • transform(X) – returns X_{scaled}, computed using the \bar{x} and s from .fit();
  • fit_transform(X) – .fit(X) followed by .transform(X).

You must compute \bar{x} and s only on the training set using .fit() or .fit_transform(). Then use .transform() on the test set so both sets are scaled identically:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training set, then scale it
X_test_scaled = scaler.transform(X_test)        # scale the test set with the same parameters

Using different scaling parameters for the train and test sets distorts the distances between points and harms predictions.
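
To see that .transform() reuses the training statistics, here is a small self-contained sketch with made-up ratings; mean_ and scale_ are the per-feature mean and standard deviation that StandardScaler stores after fitting:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up ratings, purely for illustration
X_train = np.array([[5.0, 4.0], [3.0, 2.0], [4.0, 3.0]])
X_test = np.array([[4.5, 4.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns the mean and std of X_train
X_test_scaled = scaler.transform(X_test)        # reuses that same mean and std

print(scaler.mean_)    # per-feature mean of the training set
print(scaler.scale_)   # per-feature standard deviation of the training set
print(X_test_scaled)   # same as (X_test - scaler.mean_) / scaler.scale_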

Example

We predict whether a person enjoys Star Wars VI using their ratings for Episodes IV and V (from The Movies Dataset). After training, we test two users: one rated IV/V as 5 and 5, the other as 4.5 and 4.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/starwars_binary.csv')

# Dropping the target column and leaving only features as `X_train`
X_train = df.drop('StarWars6', axis=1)
# Storing target column as `y_train`, which contains 1 (liked SW 6) or 0 (didn't like SW 6)
y_train = df['StarWars6']

# Test set of two people
X_test = np.array([[5, 5], [4.5, 4]])

# Scaling the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Building a model and predicting new instances
knn = KNeighborsClassifier(n_neighbors=13).fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(y_pred)
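Each element of y_pred is 1 if the model predicts that the corresponding person liked Episode VI and 0 otherwise. If you later want to classify additional people, transform their raw ratings with the same fitted scaler before calling .predict(). A minimal sketch continuing the example above (the ratings below are made up):

# One more (made-up) person's ratings for Episodes IV and V
X_more = np.array([[3, 4.5]])
X_more_scaled = scaler.transform(X_more)  # reuse the scaler fitted on the training set
print(knn.predict(X_more_scaled))         # 1 – predicted to like Episode VI, 0 – not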

