
Grid Search and Hyperparameter Tuning

Grid search is a systematic approach to hyperparameter tuning in machine learning, and GridSearchCV is scikit-learn's core utility for this task. The main purpose of GridSearchCV is to automate the process of searching over specified parameter values for an estimator, such as a classifier or regressor, to find the combination that yields the best cross-validated performance. You specify a parameter grid, essentially a dictionary mapping parameter names to lists of values to try, and GridSearchCV evaluates every possible combination using cross-validation. Because each candidate is scored across multiple folds rather than on a single validation split, the search is thorough and the risk of overfitting to any one split is reduced.
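To see what "every possible combination" costs in practice, the short sketch below uses scikit-learn's ParameterGrid helper to enumerate the candidates for the same grid used in the example further down. This is illustrative only; the candidate count is simply the product of the list lengths.

from sklearn.model_selection import ParameterGrid

# Same grid as in the example below: 3 C values x 2 kernels x 2 gamma settings
param_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__kernel': ['linear', 'rbf'],
    'svc__gamma': ['scale', 'auto']
}

# ParameterGrid expands the dictionary into every combination,
# which is the candidate set GridSearchCV will cross-validate
candidates = list(ParameterGrid(param_grid))
print(len(candidates))   # 12 candidates; with cv=5 that means 60 model fits
print(candidates[0])     # e.g. {'svc__C': 0.1, 'svc__gamma': 'scale', 'svc__kernel': 'linear'}

Because the grid is a Cartesian product, the number of fits grows multiplicatively with each parameter you add, which is worth keeping in mind when designing a grid.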

The typical workflow involves defining your estimator or pipeline, preparing the parameter grid, and then constructing a GridSearchCV object. After fitting, you can retrieve the best parameters and estimator found during the search. This approach is especially powerful when combined with pipelines, as it allows you to tune preprocessing steps and model hyperparameters simultaneously by referencing parameters with the double-underscore (__) notation: svc__C, for example, targets the C parameter of the pipeline step named svc.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load dataset
X, y = load_iris(return_X_y=True)

# Define a pipeline with preprocessing and model
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])

# Define parameter grid for grid search
param_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__kernel': ['linear', 'rbf'],
    'svc__gamma': ['scale', 'auto']
}

# Set up GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit the grid search to the data
grid.fit(X, y)

# Access the best parameters and score
print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy: {:.3f}".format(grid.best_score_))

GridSearchCV integrates seamlessly into the scikit-learn workflow. You can use it wherever you would use a regular estimator: fit it to your training data, predict on new samples, and score its performance. After fitting, you can access the best set of parameters found via the best_params_ attribute and the optimized estimator itself via best_estimator_; with the default refit=True, the best pipeline is refit on the whole training set, and calls to predict and score are delegated to this refitted model. This makes it easy to deploy the tuned model or analyze which parameter settings performed best, supporting reproducible and robust model selection in your projects.
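As a short sketch of this estimator-style usage (reusing pipe and param_grid from the example above, plus a hypothetical train/test split), the snippet below fits the search on training data only, then predicts and scores through the fitted search object:

from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out a test set so the tuned model is evaluated on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)

# The fitted search behaves like any estimator: predict and score delegate
# to the best pipeline, refit on all of X_train
print("Test accuracy:", grid.score(X_test, y_test))
print("First predictions:", grid.predict(X_test[:5]))

# best_estimator_ is the refitted winning pipeline; cv_results_ records
# the cross-validated score of every candidate for further analysis
best_pipeline = grid.best_estimator_
print("Mean CV score per candidate:", grid.cv_results_['mean_test_score'])

Scoring on a held-out split like this, rather than reporting best_score_ alone, gives a less optimistic estimate of how the tuned model will generalize.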

