Learn Implementing k-NN | k-NN Classifier
Classification with Python

Implementing k-NN

KNeighborsClassifier

Implementing k-Nearest Neighbors is pretty straightforward. We only need to import and use the KNeighborsClassifier class.

Once you have imported the class and created an instance of it like this:

python
from sklearn.neighbors import KNeighborsClassifier # Import the class
knn = KNeighborsClassifier(n_neighbors=3)

You need to feed it the training data using the .fit() method:

python
knn.fit(X_scaled, y)

And that's it! You can predict new values now.

python
y_pred = knn.predict(X_new_scaled)
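To see these three steps run end to end, here is a minimal sketch on made-up data. The arrays below are hypothetical toy values (two well-separated clusters that are assumed to be scaled already), not the dataset used later in this chapter.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical, already-scaled toy data: two well-separated clusters
X_scaled = np.array([
    [-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1],   # class 0
    [1.0, 1.0], [1.1, 0.9], [0.8, 1.2],         # class 1
])
y = np.array([0, 0, 0, 1, 1, 1])

# Create the classifier and feed it the training data
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_scaled, y)

# Predict the class of two new (also already-scaled) instances
X_new_scaled = np.array([[-1.0, -0.9], [0.9, 1.0]])
y_pred = knn.predict(X_new_scaled)
print(y_pred)  # one predicted label per new instance
```

Each new instance gets the majority class among its 3 nearest training points, so the first one falls into class 0 and the second into class 1.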

Scaling the data

However, remember that the data must be scaled. Let's take a closer look at the StandardScaler, which is commonly used for scaling the data.
For each feature, StandardScaler subtracts the sample mean (x̄) and then divides the result by the sample standard deviation (s).
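The computation can be checked by hand. The sketch below uses a small hypothetical array `X` and compares StandardScaler's output with a manual (x - x̄) / s calculation. One detail worth knowing: scikit-learn computes the standard deviation with `ddof=0`, which is also NumPy's default for `.std()`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training features: 3 instances, 2 features on different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Scaling with StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same computation done manually, column by column
mean = X.mean(axis=0)   # per-feature mean
std = X.std(axis=0)     # per-feature standard deviation (ddof=0)
X_manual = (X - mean) / std

print(np.allclose(X_scaled, X_manual))  # both approaches should agree
print(scaler.mean_, scaler.scale_)      # the values the scaler stored during fitting
```

After fitting, the scaler keeps the mean and standard deviation in its `mean_` and `scale_` attributes, which is what lets it apply the same transformation to other data later.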

Note

If the terms sample mean and sample standard deviation sound unfamiliar, you can check out our Learning Statistics with Python course. However, an understanding of what those numbers are for is not mandatory, so you can just ignore the meanings of x̄ and s and move on :)

You should calculate x̄ and s on the training set (using either .fit() or .fit_transform()) and use the same x̄ and s to preprocess the data you are predicting on (using .transform()). Here is an example:

python
from sklearn.preprocessing import StandardScaler # import the class
# `X` is the training set's features
# `X_new` is the new instances' features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X) # calculates x̄ and s and scales the `X`
X_new_scaled = scaler.transform(X_new) # scales the `X_new` with x̄ and s calculated in the previous line

If you use different x̄ and s for the training set and the new instances, your predictions will likely be worse.
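A small sketch of why this matters, using hypothetical one-feature data: the same raw values end up at different scaled positions depending on whose x̄ and s are used, so a scaler refitted on the new instances no longer places them correctly relative to the training points.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one feature, five training instances, two new instances
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_new = np.array([[4.0], [5.0]])

# Correct: reuse the mean and standard deviation learned from the training set
scaler = StandardScaler().fit(X)
correct = scaler.transform(X_new)

# Incorrect: fit a second scaler on the new instances themselves
wrong = StandardScaler().fit_transform(X_new)

print(correct.ravel())  # both values lie above the training mean, so both are positive
print(wrong.ravel())    # the new instances are centered on their own mean instead
```

With the training statistics, a rating of 4 is above average. With the refitted scaler it becomes the lowest possible value, so its nearest neighbors in the training set would be entirely different ones.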

Example

So to perform a k-NN classification in Python, you need to use KNeighborsClassifier and StandardScaler. Your code will look like this:

python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load and preprocess `X`, `y`, and `X_new` here

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_new_scaled = scaler.transform(X_new)

knn = KNeighborsClassifier(n_neighbors=13).fit(X_scaled, y)
y_pred = knn.predict(X_new_scaled)
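As an optional alternative (not used in this chapter), scikit-learn's `make_pipeline` can bundle the scaler and the classifier into a single object, so the scaler is always fitted on the training data only. The sketch below runs on randomly generated, hypothetical data and checks that the pipeline gives the same predictions as the manual steps above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two Gaussian clusters with 30 instances each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
X_new = np.array([[0.0, 0.0], [3.0, 3.0]])

# Pipeline version: scaling and classification in one object
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=13))
model.fit(X, y)                  # fits the scaler on X, then k-NN on the scaled X
y_pred = model.predict(X_new)    # scales X_new with the stored statistics, then predicts

# Manual version, as in the code above
scaler = StandardScaler()
knn = KNeighborsClassifier(n_neighbors=13).fit(scaler.fit_transform(X), y)
y_manual = knn.predict(scaler.transform(X_new))

print(np.array_equal(y_pred, y_manual))  # the two approaches should match
```

The pipeline performs the same computations; it only removes the need to keep track of `X_scaled` and `X_new_scaled` by hand.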

Here is a simple example where we try to predict whether a person will like Star Wars VI based on their ratings for Star Wars IV and V. We will read the data from a URL.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import warnings

warnings.filterwarnings('ignore')


df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/starwars_binary.csv')
X = df[['StarWars4_rate', 'StarWars5_rate']] # Store feature columns as X
y = df['StarWars6'] # Store target column as y (contains 1 (liked SW 6) or 0 (didn't like SW 6))
X_new = np.array([[5, 5], [4.5, 4]]) # 2 instances to predict: ratings [5, 5] and [4.5, 4] for Star Wars IV and V
# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_new_scaled = scaler.transform(X_new)
# Build a model and predict new instances
knn = KNeighborsClassifier(n_neighbors=13).fit(X_scaled, y)
y_pred = knn.predict(X_new_scaled)
print(y_pred)

The data is taken from The Movies Dataset with extra preprocessing. A person is considered to like Star Wars VI if they rated it higher than 4 out of 5.
