GridSearchCV
Now it is time to improve the model's performance by identifying the most suitable hyperparameters. This process is known as hyperparameter tuning. The standard approach is to test different hyperparameter values, compute the cross-validation score for each, and select the value that produces the highest score. This can be done using the GridSearchCV class of the sklearn.model_selection module.
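The process described above can be done by hand before reaching for GridSearchCV. The sketch below loops over candidate values, computes the mean cross-validation score for each, and keeps the best one; it uses scikit-learn's built-in iris dataset as a stand-in for the penguins data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Iris is used here only as an illustrative stand-in dataset
X, y = load_iris(return_X_y=True)

scores = {}
for n in [1, 3, 5, 7]:
    model = KNeighborsClassifier(n_neighbors=n)
    # Mean cross-validation accuracy for this hyperparameter value
    scores[n] = cross_val_score(model, X, y, cv=5).mean()

# Select the value with the highest cross-validation score
best_n = max(scores, key=scores.get)
print(best_n, scores[best_n])
```

GridSearchCV automates exactly this loop, including the final refit on the full dataset.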
When creating a GridSearchCV object, provide the model and the parameter grid (param_grid), and optionally specify the scoring metric and the number of folds. The parameter grid is a dictionary mapping hyperparameter names to the values to test. For example:

param_grid = {'n_neighbors': [1, 3, 5, 7]}

This configuration evaluates the model with 1, 3, 5, and 7 neighbors.
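A grid can also list several hyperparameters at once, in which case GridSearchCV cross-validates every combination. Below is a minimal sketch with two KNN hyperparameters (4 × 2 = 8 candidates), again using the built-in iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Every combination of these values is evaluated: 4 * 2 = 8 candidates
param_grid = {
    'n_neighbors': [1, 3, 5, 7],
    'weights': ['uniform', 'distance'],
}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

X, y = load_iris(return_X_y=True)  # illustrative stand-in dataset
grid_search.fit(X, y)

# The winning combination of hyperparameters
print(grid_search.best_params_)
```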
After initializing GridSearchCV, train it with .fit(X, y).

- The best model (highest cross-validation score) can be accessed via .best_estimator_.
- The corresponding cross-validation score can be viewed through .best_score_.
```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a GridSearchCV object
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)
# Train the GridSearchCV object. During training it finds the best parameters
grid_search.fit(X, y)
# Print the best estimator and its cross-validation score
print(grid_search.best_estimator_)
print(grid_search.best_score_)
```
The next step is to take the best_estimator_ and train it on the entire dataset, since it has already been identified as having the best parameters. GridSearchCV performs this step automatically. As a result, the grid_search object itself becomes a trained model with the optimal parameters, and it can be used directly for prediction and evaluation through the .predict() and .score() methods.
```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a GridSearchCV object
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)
# Train the GridSearchCV object. During training it finds the best parameters
grid_search.fit(X, y)
# Evaluate grid_search on the training set.
# This is done only to show that .score() works; evaluating on the training set is not reliable
print(grid_search.score(X, y))
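For a reliable evaluation, the data should be split so that .predict() and .score() run on examples the grid search never saw. The sketch below shows this workflow on scikit-learn's built-in iris dataset as a stand-in for the penguins data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative stand-in dataset, held out into train and test parts
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

grid_search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 3, 5, 7, 9]})
grid_search.fit(X_train, y_train)

# grid_search now acts as the refitted best estimator
predictions = grid_search.predict(X_test)
accuracy = grid_search.score(X_test, y_test)  # accuracy on the held-out set
print(predictions[:5])
print(accuracy)
```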