Learn GridSearchCV | Modeling
ML Introduction with scikit-learn

GridSearchCV

Now it is time to improve the model's performance by identifying the most suitable hyperparameters.

This process is known as hyperparameter tuning. The standard approach is to test different hyperparameter values, compute the cross-validation score for each, and select the value that produces the highest score.

This process can be done using the GridSearchCV class of the sklearn.model_selection module.

When creating a GridSearchCV object, provide the model and the parameter grid (param_grid), and optionally specify the scoring metric and number of folds. The parameter grid is a dictionary of hyperparameter values to test. For example:

param_grid = {'n_neighbors': [1, 3, 5, 7]}

This configuration evaluates the model with 1, 3, 5, and 7 neighbors.
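The grid can also hold several hyperparameters at once, in which case GridSearchCV evaluates every combination of values. A minimal sketch of counting those combinations (the weights values are KNeighborsClassifier's documented options; ParameterGrid is the helper GridSearchCV uses internally to enumerate candidates):

```python
from sklearn.model_selection import ParameterGrid

# Two hyperparameters: GridSearchCV tries every combination,
# so 4 n_neighbors values x 2 weights values = 8 candidates
param_grid = {
    'n_neighbors': [1, 3, 5, 7],
    'weights': ['uniform', 'distance'],
}
print(len(ParameterGrid(param_grid)))  # 8
```

Keep in mind that the number of candidates grows multiplicatively with each added hyperparameter, and each candidate is cross-validated, so large grids can be slow to search.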

After initializing GridSearchCV, train it with .fit(X, y).

  • The best model (highest cross-validation score) can be accessed via .best_estimator_.
  • The corresponding cross-validation score can be viewed through .best_score_.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')

# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']

# Create the param_grid and initialize the GridSearchCV object
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)

# Train the GridSearchCV object. During training it finds the best parameters
grid_search.fit(X, y)

# Print the best estimator and its cross-validation score
print(grid_search.best_estimator_)
print(grid_search.best_score_)
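Besides .best_estimator_ and .best_score_, a fitted GridSearchCV object also exposes .best_params_ (the winning hyperparameter values as a dictionary) and .cv_results_ (per-candidate cross-validation details). A small sketch on scikit-learn's built-in iris dataset, used here instead of the penguins file so it runs offline:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
grid_search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 3, 5]})
grid_search.fit(X, y)

# best_params_ is a dict holding the winning hyperparameter values
print(grid_search.best_params_)

# cv_results_ stores the mean cross-validation score of every candidate
print(grid_search.cv_results_['mean_test_score'])
```

Inspecting .cv_results_ is useful when the top candidates score nearly the same, since a simpler model with a marginally lower score may be preferable.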

The final step is to retrain the best_estimator_ on the entire dataset, since its hyperparameters have already been selected. GridSearchCV performs this refit automatically (controlled by its refit parameter, which defaults to True).

As a result, the grid_search object itself becomes a trained model with the optimal parameters. It can be used directly for prediction and evaluation through the .predict() and .score() methods.

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')

# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']

# Create the param_grid and initialize the GridSearchCV object
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)

# Train the GridSearchCV object. During training it finds the best parameters
grid_search.fit(X, y)

# Evaluate grid_search on the training set
# This is done only to show that .score() works; evaluating on the training set is not reliable
print(grid_search.score(X, y))
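As noted earlier, the scoring metric and the number of folds are optional arguments. A sketch combining them with .predict(), again on the built-in iris dataset so it runs offline (the 'f1_weighted' scorer and cv=3 are illustrative choices, not the defaults):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# scoring selects the metric, cv the number of folds (both optional)
grid_search = GridSearchCV(
    KNeighborsClassifier(),
    {'n_neighbors': [3, 5, 7]},
    scoring='f1_weighted',
    cv=3,
)
grid_search.fit(X, y)

# grid_search predicts with the refit best model; no extra step needed
print(grid_search.predict(X[:5]))
```

When scoring is set, .best_score_ and .score() report that metric instead of the default accuracy.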

Once you have trained a GridSearchCV object, you can use it to make predictions with the .predict() method. Is this correct?

Select the correct answer


Section 4. Chapter 6

