Grid Search and Hyperparameter Tuning
Grid search is a systematic approach to hyperparameter tuning in machine learning, and GridSearchCV is scikit-learn’s core utility for this task. The main purpose of GridSearchCV is to automate the process of searching over specified parameter values for an estimator, such as a classifier or regressor, to find the combination that yields the best cross-validated performance. You specify a parameter grid—essentially a dictionary mapping parameter names to lists of values to try—and GridSearchCV evaluates every possible combination using cross-validation. The search is exhaustive over the grid you specify, and because each candidate is scored by averaging across cross-validation folds, the result is less prone to overfitting a single train/validation split.
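To make the combinatorics concrete, here is a minimal sketch using scikit-learn's ParameterGrid helper, which performs the same grid expansion that GridSearchCV does internally. The two-parameter toy grid below is purely illustrative and is not tied to the full example that follows.

from sklearn.model_selection import ParameterGrid

# A toy grid: 2 values of C times 2 kernel choices = 4 candidate settings.
toy_grid = {'C': [0.1, 1], 'kernel': ['linear', 'rbf']}

# ParameterGrid enumerates the Cartesian product of the listed values;
# GridSearchCV cross-validates every one of these candidates.
for candidate in ParameterGrid(toy_grid):
    print(candidate)
# Prints four dicts, e.g. {'C': 0.1, 'kernel': 'linear'}, one per combination.

With cv=5, each of those candidates is fitted and scored five times, so the total cost grows multiplicatively with the number of values per parameter.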
The typical workflow involves defining your estimator or pipeline, preparing the parameter grid, and then constructing a GridSearchCV object. After fitting, you can retrieve the best parameters and estimator found during the search. This approach is especially powerful when combined with pipelines, because it lets you tune preprocessing steps and model hyperparameters simultaneously: grid keys use a double-underscore (__) notation of the form step__parameter, so svc__C, for example, targets the C parameter of the pipeline step named 'svc', as in the example below.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load dataset
X, y = load_iris(return_X_y=True)

# Define a pipeline with preprocessing and model
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])

# Define parameter grid for grid search
param_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__kernel': ['linear', 'rbf'],
    'svc__gamma': ['scale', 'auto']
}

# Set up GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit the grid search to the data
grid.fit(X, y)

# Access the best parameters and score
print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy: {:.3f}".format(grid.best_score_))
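Beyond the single best result, the fitted search keeps a full log of every candidate in its cv_results_ attribute. The sketch below assumes the grid object fitted above and uses pandas, which the example itself does not import, to tabulate the mean and spread of each candidate's fold scores.

import pandas as pd

# One row per parameter combination, with the mean and standard deviation
# of the accuracy across the 5 cross-validation folds.
results = pd.DataFrame(grid.cv_results_)
cols = ['param_svc__C', 'param_svc__kernel', 'param_svc__gamma',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())

Comparing std_test_score alongside mean_test_score helps you judge whether the top-ranked candidates are genuinely better or merely within noise of one another.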
GridSearchCV integrates seamlessly into the scikit-learn workflow. You can use it wherever you would use a regular estimator: fit it to your training data, predict on new samples, and score its performance. After fitting, you can access the best set of parameters found via the best_params_ attribute, and the optimized estimator itself via best_estimator_. This makes it easy to deploy the tuned model or analyze which parameter settings performed best, supporting reproducible and robust model selection in your projects.
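A minimal sketch of that drop-in usage, assuming the pipe, param_grid, X, and y defined earlier: the search is fitted on a training split and then used directly, like any estimator, on held-out data. The split size and random_state below are arbitrary illustrative choices, not part of the original example.

from sklearn.model_selection import train_test_split

# Hold out a test set so the final evaluation is independent of the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
search.fit(X_train, y_train)

# With the default refit=True, the search object acts as the best pipeline
# refit on all of X_train, so predict() and score() work directly.
print("Held-out test accuracy: {:.3f}".format(search.score(X_test, y_test)))
print("Best parameters:", search.best_params_)

# The refit pipeline itself, ready for inspection or deployment.
best_model = search.best_estimator_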