Grid Search and Hyperparameter Tuning
Grid search is a systematic approach to hyperparameter tuning in machine learning, and GridSearchCV is scikit-learn’s core utility for this task. The main purpose of GridSearchCV is to automate the process of searching over specified parameter values for an estimator, such as a classifier or regressor, to find the combination that yields the best cross-validated performance. You specify a parameter grid—essentially a dictionary mapping parameter names to lists of values to try—and GridSearchCV evaluates every possible combination using cross-validation. The search is exhaustive over the grid you specify, and because each candidate is scored by averaging across cross-validation folds, the result is less prone to overfitting a single train/validation split.
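To make the combinatorics concrete, here is a minimal sketch using scikit-learn's ParameterGrid helper, which performs the same grid expansion that GridSearchCV does internally. The two-parameter toy grid below is purely illustrative and is not tied to the full example that follows.

from sklearn.model_selection import ParameterGrid

# A toy grid: 2 values of C times 2 kernel choices = 4 candidate settings.
toy_grid = {'C': [0.1, 1], 'kernel': ['linear', 'rbf']}

# ParameterGrid enumerates the Cartesian product of the listed values;
# GridSearchCV cross-validates every one of these candidates.
for candidate in ParameterGrid(toy_grid):
    print(candidate)
# Prints four dicts, e.g. {'C': 0.1, 'kernel': 'linear'}, one per combination.

With cv=5, each of those candidates is fitted and scored five times, so the total cost grows multiplicatively with the number of values per parameter.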
The typical workflow involves defining your estimator or pipeline, preparing the parameter grid, and then constructing a GridSearchCV object. After fitting, you can retrieve the best parameters and estimator found during the search. This approach is especially powerful when combined with pipelines, because it lets you tune preprocessing steps and model hyperparameters simultaneously: grid keys use a double-underscore (__) notation of the form step__parameter, so svc__C, for example, targets the C parameter of the pipeline step named 'svc', as in the example below.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load dataset
X, y = load_iris(return_X_y=True)

# Define a pipeline with preprocessing and model
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])

# Define parameter grid for grid search
param_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__kernel': ['linear', 'rbf'],
    'svc__gamma': ['scale', 'auto']
}

# Set up GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit the grid search to the data
grid.fit(X, y)

# Access the best parameters and score
print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy: {:.3f}".format(grid.best_score_))
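Beyond the single best result, the fitted search keeps a full log of every candidate in its cv_results_ attribute. The sketch below assumes the grid object fitted above and uses pandas, which the example itself does not import, to tabulate the mean and spread of each candidate's fold scores.

import pandas as pd

# One row per parameter combination, with the mean and standard deviation
# of the accuracy across the 5 cross-validation folds.
results = pd.DataFrame(grid.cv_results_)
cols = ['param_svc__C', 'param_svc__kernel', 'param_svc__gamma',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())

Comparing std_test_score alongside mean_test_score helps you judge whether the top-ranked candidates are genuinely better or merely within noise of one another.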
GridSearchCV integrates seamlessly into the scikit-learn workflow. You can use it wherever you would use a regular estimator: fit it to your training data, predict on new samples, and score its performance. After fitting, you can access the best set of parameters found via the best_params_ attribute, and the optimized estimator itself via best_estimator_. This makes it easy to deploy the tuned model or analyze which parameter settings performed best, supporting reproducible and robust model selection in your projects.
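A minimal sketch of that drop-in usage, assuming the pipe, param_grid, X, and y defined earlier: the search is fitted on a training split and then used directly, like any estimator, on held-out data. The split size and random_state below are arbitrary illustrative choices, not part of the original example.

from sklearn.model_selection import train_test_split

# Hold out a test set so the final evaluation is independent of the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
search.fit(X_train, y_train)

# With the default refit=True, the search object acts as the best pipeline
# refit on all of X_train, so predict() and score() work directly.
print("Held-out test accuracy: {:.3f}".format(search.score(X_test, y_test)))
print("Best parameters:", search.best_params_)

# The refit pipeline itself, ready for inspection or deployment.
best_model = search.best_estimator_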