Learn Grid Search and Hyperparameter Tuning | Model Selection and Evaluation Utilities
Mastering scikit-learn API and Workflows

Grid Search and Hyperparameter Tuning

Grid search is a systematic approach to hyperparameter tuning in machine learning, and GridSearchCV is scikit-learn’s core utility for this task. GridSearchCV automates the search over specified parameter values for an estimator, such as a classifier or regressor, to find the combination that yields the best cross-validated performance. You specify a parameter grid (a dictionary mapping parameter names to lists of values to try, or a list of such dictionaries), and GridSearchCV evaluates every combination using cross-validation. The search is exhaustive over the grid you supply, and averaging scores across folds reduces the risk of choosing parameters that merely overfit a single validation split. The winning cross-validated score is still an optimistic estimate, because it was used to select the parameters, so a final performance estimate should come from data held out of the search.
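To make "every combination" concrete, the following minimal sketch enumerates a grid with scikit-learn's ParameterGrid helper and counts the model fits a 5-fold search would need. The grid values are illustrative, and the fold count of 5 is an assumption chosen for the arithmetic.

```python
from sklearn.model_selection import ParameterGrid

# Illustrative grid: 3 values of C x 2 kernels x 2 gamma settings.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", "auto"],
}

# ParameterGrid expands the dictionary into one dict per combination.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# Assumed fold count; each candidate is fitted once per fold.
n_folds = 5
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)  # 12 60
print(candidates[0])
```

The cost grows multiplicatively with every parameter added to the grid, which is worth checking before launching a large search. With the default refit=True, GridSearchCV performs one additional fit of the best candidate on the full data passed to fit.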

The typical workflow involves defining your estimator or pipeline, preparing the parameter grid, and then constructing a GridSearchCV object. After fitting, you can retrieve the best parameters and the best estimator found during the search. This approach is especially useful with pipelines, because it lets you tune preprocessing steps and model hyperparameters in the same search: a parameter of a pipeline step is addressed with the double-underscore (__) notation, step name first and parameter name second, as in svc__C.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Load dataset
    X, y = load_iris(return_X_y=True)

    # Define a pipeline with preprocessing and model
    pipe = Pipeline([
        ('scaler', StandardScaler()),
        ('svc', SVC())
    ])

    # Define parameter grid for grid search
    # (keys use the <step name>__<parameter name> notation)
    param_grid = {
        'svc__C': [0.1, 1, 10],
        'svc__kernel': ['linear', 'rbf'],
        'svc__gamma': ['scale', 'auto']
    }

    # Set up GridSearchCV: 5-fold cross-validation, all CPU cores
    grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)

    # Fit the grid search to the data
    grid.fit(X, y)

    # Access the best parameters and score
    print("Best parameters:", grid.best_params_)
    print("Best cross-validated accuracy: {:.3f}".format(grid.best_score_))
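The double-underscore names in the grid above do not have to be memorized: a pipeline lists every tunable name through get_params(). The sketch below prints a few of them and then, as one possible variation, searches over the preprocessing step itself alongside a model hyperparameter. The choice of MinMaxScaler and "passthrough" (which skips the step) as alternatives is an assumption made for illustration, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Every tunable name is exposed by get_params(); nested parameters
# follow the pattern <step name>__<parameter name>.
tunable_names = sorted(pipe.get_params().keys())
print([name for name in tunable_names if "__" in name][:5])

# Illustrative grid: the "scaler" key replaces the whole step, while
# "svc__C" reaches into the SVC step.
param_grid = {
    "scaler": [StandardScaler(), MinMaxScaler(), "passthrough"],
    "svc__C": [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
```

Because the scaler is refit inside each cross-validation fold as part of the pipeline, the validation folds never influence the scaling statistics.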

GridSearchCV is itself an estimator, so it fits into the usual scikit-learn workflow: fit it to your training data, then call predict and score on new samples. After fitting, best_params_ holds the best parameter combination, best_score_ holds its mean cross-validated score, and best_estimator_ holds the estimator refit with those parameters on all the data passed to fit (available when refit=True, the default). The cv_results_ attribute records the scores of every combination, so you can analyze which parameter settings performed well instead of looking only at the winner.
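As a rough sketch of that workflow, the example below runs the search on a training split only and evaluates the tuned model on a held-out test split. The 25% test size, the random_state value, and the reduced grid are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a test set that the search never sees (assumed 25% split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# With refit=True (the default), the search object delegates predict
# and score to best_estimator_, refit on the whole training split.
best_model = search.best_estimator_
predictions = search.predict(X_test)
test_accuracy = search.score(X_test, y_test)
print("Held-out accuracy: {:.3f}".format(test_accuracy))

# cv_results_ holds one entry per combination; show the top three.
results = search.cv_results_
order = np.argsort(results["rank_test_score"])
for i in order[:3]:
    print(results["params"][i], round(results["mean_test_score"][i], 3))
```

Comparing best_score_ with the held-out accuracy gives a sense of how optimistic the cross-validated estimate was for this particular split.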


Section 4. Chapter 2
