Challenge: Choosing the Best K Value | k-NN Classifier
Classification with Python

Challenge: Choosing the Best K Value

As shown in the previous chapters, the model makes different predictions for different values of k (the number of neighbors).
When we build a model, we want to choose the k that leads to the best performance, and in the previous chapter we learned how to measure performance using cross-validation.
So an obvious approach is to run a loop over some range of k values, calculate the cross-validation score for each, and choose the k with the highest score. That is also the most frequently used approach, and sklearn provides a neat class for this task: GridSearchCV.
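Before turning to GridSearchCV, the manual loop described above can be sketched directly with cross_val_score. This is only an illustration: it uses a synthetic dataset from make_classification, not the course's Star Wars data, and the candidate k values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature dataset, just for illustration
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

# Try several k values and record each one's mean cross-validation accuracy
scores = {}
for k in [3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Pick the k with the highest mean score
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

GridSearchCV automates exactly this loop, and additionally refits the model with the winning parameters afterwards.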

The param_grid parameter takes a dictionary with parameter names as keys and lists (or ranges) of values to try as values. For example, to try the values 1-99 for n_neighbors, you would use:

python
param_grid = {'n_neighbors': range(1, 100)}

The .fit(X, y) method makes the GridSearchCV object search for the best parameters in param_grid and then re-train the model with those parameters on the whole dataset.
You can then get the highest score using the .best_score_ attribute and predict new values using the .predict() method.
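Putting these pieces together, the whole workflow might look like the sketch below. It again uses a synthetic dataset purely for illustration; the grid and cv value are arbitrary choices, not the exercise's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature dataset, just for illustration
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

param_grid = {'n_neighbors': range(1, 30)}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X, y)  # searches the grid, then refits the best model on all of X, y

print(grid_search.best_params_)  # the winning n_neighbors value
print(grid_search.best_score_)   # its mean cross-validation score
predictions = grid_search.predict(X[:3])  # predictions use the refit best model
```

Note that .predict() is delegated to the refit best estimator, so no extra training step is needed after .fit().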

Task


  1. Import the GridSearchCV class.
  2. Scale X using StandardScaler.
  3. Search for the best value of n_neighbors among [3, 9, 18, 27].
  4. Initialize and train a GridSearchCV object with 4-fold cross-validation.
  5. Print the score of the best model.

Solution

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import pandas as pd
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/starwars_binary.csv')
X = df[['StarWars4_rate', 'StarWars5_rate']] # Store feature columns as `X`
y = df['StarWars6'] # Store target column as `y`
# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Initialize a model
knn = KNeighborsClassifier()
# Define the grid of n_neighbors values to try
param_grid = {'n_neighbors': [3, 9, 18, 27]}
# Run a grid search with 4-fold cross-validation
grid_search = GridSearchCV(knn, param_grid, cv=4).fit(X_scaled, y)
# Print the best model and its cross-validation score
print(grid_search.best_estimator_)
print(grid_search.best_score_)


Section 1. Chapter 7

Starter code (fill in the blanks):
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import pandas as pd
from sklearn.model_selection import ___

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/starwars_binary.csv')
X = df[['StarWars4_rate', 'StarWars5_rate']] # Store feature columns as `X`
y = df['StarWars6'] # Store target column as `y`
# Scale the data
scaler = StandardScaler()
X_scaled = scaler.___(X)
# Initialize a model
knn = KNeighborsClassifier()
# Define the grid of n_neighbors values to try
param_grid = {'n_neighbors': ___}
# Run a grid search with 4-fold cross-validation
grid_search = GridSearchCV(knn, ___, cv=___).fit(X_scaled, y)
# Print the best model and its cross-validation score
print(grid_search.best_estimator_)
print(grid_search.___)
