Grid Search with GridSearchCV
Grid search is a systematic approach for hyperparameter tuning where you exhaustively evaluate every possible combination of specified hyperparameter values. This method ensures that you explore all options within your defined space, which can be especially useful when you want to avoid missing a potentially optimal configuration. In practice, grid search can quickly become tedious and computationally expensive if performed manually, especially as the number of hyperparameters and their candidate values increases. To address this, scikit-learn provides an automated tool called GridSearchCV that handles the exhaustive search and evaluation process efficiently.
Grid search is a method that evaluates all possible combinations of specified hyperparameter values. This approach ensures that every configuration in your parameter grid is considered during model tuning.
Parameter grid refers to a dictionary that specifies the hyperparameters and their candidate values for search. Each key in the dictionary is the name of a hyperparameter, and each value is a list of possible values to test during grid search.
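As a concrete illustration, a parameter grid for an SVC might look like the sketch below. The hyperparameter names (C, kernel, gamma) are real SVC parameters, but the candidate values are arbitrary examples rather than recommended settings.

```python
# Illustrative parameter grid for an SVC; the candidate values are
# arbitrary examples, not recommended settings.
param_grid = {
    "C": [0.1, 1, 10],            # regularization strength
    "kernel": ["linear", "rbf"],  # kernel type
    "gamma": ["scale", "auto"],   # kernel coefficient for the "rbf" kernel
}
```

This grid defines 3 × 2 × 2 = 12 combinations, each of which grid search would evaluate.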
Cross-validation is a technique for assessing model performance by splitting data into multiple train/test sets. This approach helps you obtain a more reliable estimate of how your model will perform on unseen data.
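A minimal sketch of the idea, using scikit-learn's cross_val_score with an SVC on a toy dataset; the fold count and dataset parameters are illustrative choices, not prescriptions:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy dataset; parameters chosen only for illustration
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# 5-fold cross-validation: each fold serves once as the held-out test set
scores = cross_val_score(SVC(), X, y, cv=5, scoring="accuracy")
print(scores)          # one accuracy value per fold
print(scores.mean())   # averaged estimate of generalization performance
```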
To automate grid search, use scikit-learn's GridSearchCV with a support vector classifier (SVC). Start by defining a parameter grid as a dictionary: each key is a hyperparameter name, and its value is a list of candidate values to test. GridSearchCV evaluates every combination of these values using cross-validation and automatically selects the best set of hyperparameters according to a scoring metric such as accuracy. The example below first shows the manual alternative, training and comparing a default and a hand-tuned Random Forest, and is followed by a sketch of the automated GridSearchCV workflow.
```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a more challenging dataset
X, y = make_moons(n_samples=1000, noise=0.35, random_state=42)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Default Random Forest
clf_default = RandomForestClassifier(random_state=42)
clf_default.fit(X_train, y_train)
acc_default = accuracy_score(y_test, clf_default.predict(X_test))
print(f"Default (n_estimators=100, max_depth=None): {acc_default:.3f}")

# Tuned Random Forest
clf_tuned = RandomForestClassifier(
    n_estimators=300, max_depth=6, min_samples_split=3,
    min_samples_leaf=2, random_state=42
)
clf_tuned.fit(X_train, y_train)
acc_tuned = accuracy_score(y_test, clf_tuned.predict(X_test))
print(f"Tuned (n_estimators=300, max_depth=6): {acc_tuned:.3f}")
print(f"Improvement: +{acc_tuned - acc_default:.3f}")
```
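The following is a sketch of how the same kind of tuning could be automated with GridSearchCV and an SVC, following the setup described above. The parameter grid values, the 5-fold setting, and the scoring choice are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same kind of dataset as above; values chosen only for illustration
X, y = make_moons(n_samples=1000, noise=0.35, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Candidate values are illustrative, not recommended defaults
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

# Evaluate every combination with 5-fold cross-validation on the training set
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best CV accuracy:", grid_search.best_score_)

# best_estimator_ holds the model refit on the full training set
# with the winning parameter combination
print("Test accuracy:", grid_search.best_estimator_.score(X_test, y_test))
```

Because refit=True by default, best_estimator_ is retrained on the entire training set using the best combination found, so it can be used directly for prediction on new data.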
By automating the search and evaluation process, GridSearchCV saves you from having to manually train and compare models for every parameter combination. This not only improves efficiency but also reduces the risk of human error in the tuning workflow.