GridSearchCV
Now it is time to improve the model's performance by identifying the most suitable hyperparameters. This process is known as hyperparameter tuning. The standard approach is to test different hyperparameter values, compute the cross-validation score for each, and select the value that produces the highest score. This can be done using the GridSearchCV class of the sklearn.model_selection module.
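The process described above can be done by hand before reaching for GridSearchCV. The sketch below loops over candidate values, computes the mean cross-validation score for each, and keeps the best one; it uses scikit-learn's built-in iris dataset as a stand-in for the penguins data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Iris is used here only as an illustrative stand-in dataset
X, y = load_iris(return_X_y=True)

scores = {}
for n in [1, 3, 5, 7]:
    model = KNeighborsClassifier(n_neighbors=n)
    # Mean cross-validation accuracy for this hyperparameter value
    scores[n] = cross_val_score(model, X, y, cv=5).mean()

# Select the value with the highest cross-validation score
best_n = max(scores, key=scores.get)
print(best_n, scores[best_n])
```

GridSearchCV automates exactly this loop, including the final refit on the full dataset.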
When creating a GridSearchCV object, provide the model and the parameter grid (param_grid), and optionally specify the scoring metric and the number of folds. The parameter grid is a dictionary mapping hyperparameter names to the values to test. For example:

param_grid = {'n_neighbors': [1, 3, 5, 7]}

This configuration evaluates the model with 1, 3, 5, and 7 neighbors.
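A grid can also list several hyperparameters at once, in which case GridSearchCV cross-validates every combination. Below is a minimal sketch with two KNN hyperparameters (4 × 2 = 8 candidates), again using the built-in iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Every combination of these values is evaluated: 4 * 2 = 8 candidates
param_grid = {
    'n_neighbors': [1, 3, 5, 7],
    'weights': ['uniform', 'distance'],
}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

X, y = load_iris(return_X_y=True)  # illustrative stand-in dataset
grid_search.fit(X, y)

# The winning combination of hyperparameters
print(grid_search.best_params_)
```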
After initializing GridSearchCV, train it with .fit(X, y).

- The best model (highest cross-validation score) can be accessed via .best_estimator_.
- The corresponding cross-validation score can be viewed through .best_score_.
```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a GridSearchCV object
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)
# Train the GridSearchCV object. During training it finds the best parameters
grid_search.fit(X, y)
# Print the best estimator and its cross-validation score
print(grid_search.best_estimator_)
print(grid_search.best_score_)
```
The next step is to take the best_estimator_ and train it on the entire dataset, since it has already been identified as having the best parameters. GridSearchCV performs this step automatically. As a result, the grid_search object itself becomes a trained model with the optimal parameters, and it can be used directly for prediction and evaluation through the .predict() and .score() methods.
```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Create the param_grid and initialize a GridSearchCV object
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)
# Train the GridSearchCV object. During training it finds the best parameters
grid_search.fit(X, y)
# Evaluate grid_search on the training set.
# This is done only to show that .score() works; evaluating on the training set is not reliable
print(grid_search.score(X, y))
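For a reliable evaluation, the data should be split so that .predict() and .score() run on examples the grid search never saw. The sketch below shows this workflow on scikit-learn's built-in iris dataset as a stand-in for the penguins data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative stand-in dataset, held out into train and test parts
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

grid_search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 3, 5, 7, 9]})
grid_search.fit(X_train, y_train)

# grid_search now acts as the refitted best estimator
predictions = grid_search.predict(X_test)
accuracy = grid_search.score(X_test, y_test)  # accuracy on the held-out set
print(predictions[:5])
print(accuracy)
```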