Challenge: Tuning Hyperparameters with RandomizedSearchCV | Modeling
ML Introduction with scikit-learn


RandomizedSearchCV works like GridSearchCV, but instead of evaluating every combination of hyperparameters, it evaluates only a randomly sampled subset of them.

For example, this param_grid contains 10 × 2 × 5 = 100 combinations:

param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]}
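The count can be checked by enumerating the combinations directly. The following is a minimal sketch using only the standard library; the names all_combinations and n_combinations are illustrative and not part of scikit-learn.

```python
from itertools import product

param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
    'weights': ['distance', 'uniform'],
    'p': [1, 2, 3, 4, 5],
}

# Each combination takes one value from every list: 10 * 2 * 5 = 100
all_combinations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]
n_combinations = len(all_combinations)
print(n_combinations)
```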

GridSearchCV would evaluate all of them, which is time-consuming. With RandomizedSearchCV, you can evaluate only a randomly chosen subset of, say, 20 combinations. The best result it finds is usually slightly worse, but the search runs much faster.
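To make the idea of a random subset concrete, here is a simplified sketch that draws 20 distinct combinations from the 100 with the standard library. It only illustrates the sampling idea and is not scikit-learn's actual implementation; the seed 42 and the variable names are arbitrary choices.

```python
import random
from itertools import product

param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
    'weights': ['distance', 'uniform'],
    'p': [1, 2, 3, 4, 5],
}

# Enumerate the full grid of 100 combinations
all_combinations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]

# Draw 20 distinct combinations at random (sampling without replacement)
rng = random.Random(42)
sampled = rng.sample(all_combinations, k=20)
print(len(sampled), 'of', len(all_combinations))
```

Only the sampled combinations would then be cross-validated, which is where the time saving comes from.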

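You control the number of combinations to test with the n_iter argument (10 by default). Apart from that, you use RandomizedSearchCV the same way as GridSearchCV. Its second parameter is named param_distributions rather than param_grid, but when passed positionally it accepts the same dictionary.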
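The effect of n_iter can be seen in a short sketch. It assumes the built-in iris dataset as a stand-in for the course data, and n_iter=5 and random_state=0 are arbitrary values chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (assumption: any small classification dataset would do)
X, y = load_iris(return_X_y=True)

param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
    'weights': ['distance', 'uniform'],
    'p': [1, 2, 3, 4, 5],
}

# n_iter limits how many combinations are sampled and cross-validated;
# random_state makes the sampled subset reproducible
search = RandomizedSearchCV(KNeighborsClassifier(), param_grid,
                            n_iter=5, random_state=0)
search.fit(X, y)

# cv_results_['params'] holds one entry per combination that was tested
n_tested = len(search.cv_results_['params'])
print(n_tested)
print(search.best_params_)
```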

Task


Your task is to build a GridSearchCV and a RandomizedSearchCV that samples 20 combinations, then compare their results.

  1. Initialize the RandomizedSearchCV object. Pass the parameter grid and set the number of combinations to 20.
  2. Initialize the GridSearchCV object.
  3. Train both GridSearchCV and RandomizedSearchCV objects.
  4. Print the best estimator of grid.
  5. Print the best score of randomized.

Solution

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a model
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]
              }
model = KNeighborsClassifier()
# Initialize RandomizedSearchCV and GridSearchCV
randomized = RandomizedSearchCV(model, param_grid, n_iter=20)
grid = GridSearchCV(model, param_grid)
# Train both search objects. During training each one finds its best parameters
grid.fit(X, y)
randomized.fit(X, y)
# Print the best estimator and its cross-validation score
print('GridSearchCV:')
print(grid.best_estimator_)
print(grid.best_score_)
print('RandomizedSearchCV:')
print(randomized.best_estimator_)
print(randomized.best_score_)
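The comparison can also be made explicit by counting how many combinations each search evaluated. This is a sketch under stated assumptions: it uses the built-in iris dataset instead of the penguins file so it runs offline, and random_state=0 is an arbitrary seed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (assumption: replaces the penguins data for an offline run)
X, y = load_iris(return_X_y=True)

param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
    'weights': ['distance', 'uniform'],
    'p': [1, 2, 3, 4, 5],
}

grid = GridSearchCV(KNeighborsClassifier(), param_grid).fit(X, y)
randomized = RandomizedSearchCV(KNeighborsClassifier(), param_grid,
                                n_iter=20, random_state=0).fit(X, y)

# Number of combinations each search actually cross-validated
n_grid = len(grid.cv_results_['params'])
n_rand = len(randomized.cv_results_['params'])
print('GridSearchCV:', n_grid, 'combinations, best score', grid.best_score_)
print('RandomizedSearchCV:', n_rand, 'combinations, best score', randomized.best_score_)
```

Because the randomized search evaluates a subset of the same grid on the same default cross-validation splits, its best score here should not exceed the grid search's best score, though it may match it.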

Section 4. Chapter 8
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a model
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]
              }
model = KNeighborsClassifier()
# Initialize RandomizedSearchCV and GridSearchCV
randomized = ___
grid = ___
# Train both search objects. During training each one finds its best parameters
grid.___
randomized.___
# Print the best estimator and its cross-validation score
print('GridSearchCV:')
print(grid.___)
print(grid.best_score_)
print('RandomizedSearchCV:')
print(randomized.best_estimator_)
print(randomized.___)
