ML Introduction with scikit-learn

Challenge: Tuning Hyperparameters with RandomizedSearchCV

The idea behind RandomizedSearchCV is that it works the same way as GridSearchCV, but instead of trying all the combinations, it tries only a randomly sampled subset.

For example, this param_grid contains 100 combinations (10 × 2 × 5):

param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]}
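
As a quick sanity check (not part of the challenge), you can count the combinations with scikit-learn's ParameterGrid helper:

from sklearn.model_selection import ParameterGrid

param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]}
# 10 values * 2 values * 5 values = 100 combinations
print(len(ParameterGrid(param_grid)))  # prints 100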

GridSearchCV would try all of them, which is time-consuming. With RandomizedSearchCV, you can try only a randomly chosen subset of, say, 20 combinations. The result is usually slightly worse, but the search runs much faster.

You can control the number of combinations to be tested using the n_iter argument (set to 10 by default). Apart from that, working with it is the same as with GridSearchCV.
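
For example, here is a minimal sketch of using n_iter. The dataset and the random_state value are illustrative assumptions for the sketch, not part of the challenge:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy dataset used only to make the sketch runnable
X, y = load_iris(return_X_y=True)
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]}
model = KNeighborsClassifier()
# n_iter=20 samples 20 of the 100 combinations;
# random_state only fixes which combinations get sampled (illustrative choice)
randomized = RandomizedSearchCV(model, param_grid, n_iter=20, random_state=42)
randomized.fit(X, y)
print(randomized.best_params_, randomized.best_score_)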

Task


Your task is to build a GridSearchCV and a RandomizedSearchCV limited to 20 combinations, then compare the results.

  1. Initialize the RandomizedSearchCV object. Pass the parameter grid and set the number of sampled combinations to 20.
  2. Initialize the GridSearchCV object.
  3. Train both GridSearchCV and RandomizedSearchCV objects.
  4. Print the best estimator of grid.
  5. Print the best score of randomized.

Solution

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a model
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]}
model = KNeighborsClassifier()
# Initialize RandomizedSearchCV and GridSearchCV
randomized = RandomizedSearchCV(model, param_grid, n_iter=20)
grid = GridSearchCV(model, param_grid)
# Train both search objects. During training they find the best parameters
grid.fit(X, y)
randomized.fit(X, y)
# Print the best estimator and its cross-validation score
print('GridSearchCV:')
print(grid.best_estimator_)
print(grid.best_score_)
print('RandomizedSearchCV:')
print(randomized.best_estimator_)
print(randomized.best_score_)
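
Note that RandomizedSearchCV samples the combinations at random, so its best score can vary slightly between runs; passing a random_state to RandomizedSearchCV (optional here) makes the sampled subset reproducible.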


Starter code (fill in the blanks):

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a model
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 17, 20, 25],
              'weights': ['distance', 'uniform'],
              'p': [1, 2, 3, 4, 5]}
model = KNeighborsClassifier()
# Initialize RandomizedSearchCV and GridSearchCV
randomized = ___
grid = ___
# Train both search objects. During training they find the best parameters
grid.___
randomized.___
# Print the best estimator and its cross-validation score
print('GridSearchCV:')
print(grid.___)
print(grid.best_score_)
print('RandomizedSearchCV:')
print(randomized.best_estimator_)
print(randomized.___)
