Bayesian Optimization with BayesSearchCV
BayesSearchCV from the skopt library brings Bayesian optimization to hyperparameter tuning, using the same interface as GridSearchCV in scikit-learn. This means you can apply advanced optimization techniques without changing how you set up your search.
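To see what "same interface" means in practice, here is a minimal hedged sketch of BayesSearchCV as a drop-in replacement; the SVC estimator, the log-uniform range for C, and the toy dataset are illustrative choices, not part of the example further below:

```python
from skopt import BayesSearchCV
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Illustrative data; any classification dataset works the same way
X, y = make_classification(n_samples=200, random_state=0)

# Same constructor pattern as GridSearchCV: estimator + parameter space
opt = BayesSearchCV(
    SVC(),
    {"C": (1e-3, 1e3, "log-uniform")},  # skopt samples C on a log scale
    n_iter=10,  # number of parameter settings evaluated
    cv=3,
    random_state=0,
)
opt.fit(X, y)  # same fit / predict / best_params_ API as GridSearchCV
print(opt.best_params_, opt.best_score_)
```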
Comparing BayesSearchCV, GridSearchCV, and RandomizedSearchCV
When tuning hyperparameters, you can choose from several search strategies:
- GridSearchCV: tries every possible combination of hyperparameter values in a predefined grid; can be very slow and inefficient, especially with many parameters.
- RandomizedSearchCV: samples a fixed number of random combinations from the parameter space; faster than grid search, but may miss optimal regions if not enough samples are taken (see the sketch after this list).
- BayesSearchCV: uses Bayesian optimization to intelligently select the next set of hyperparameters based on previous results; explores promising areas first, making it much more efficient and requiring fewer evaluations to find good solutions.
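Since the code example below compares only grid and Bayesian search, here is a hedged sketch of the middle option, RandomizedSearchCV; the parameter ranges mirror the example below, and scipy's randint is used so integer values are sampled rather than enumerated:

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=1000, noise=0.35, random_state=42)

# Sample 16 random combinations instead of trying every grid point
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": randint(50, 300),  # integers drawn uniformly
        "max_depth": randint(3, 10),
    },
    n_iter=16,
    cv=5,
    random_state=42,
    n_jobs=-1,
)
random_search.fit(X, y)
print(random_search.best_params_)
```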
Bayesian optimization, as used in BayesSearchCV, is especially useful when you have limited computational resources or a large search space.
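For large search spaces, skopt also provides explicit dimension types (Real, Integer, Categorical) instead of plain tuples. A sketch, assuming a random forest estimator; the max_features and min_samples_split dimensions are illustrative additions, not part of the example below:

```python
from skopt.space import Real, Integer, Categorical

# Explicit skopt dimensions give finer control than plain tuples
search_spaces = {
    "n_estimators": Integer(50, 300),               # integer-valued range
    "max_depth": Integer(3, 10),
    "max_features": Categorical(["sqrt", "log2"]),  # discrete choices
    "min_samples_split": Real(0.01, 0.5, prior="log-uniform"),
}
```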
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from skopt import BayesSearchCV
import numpy as np
import random, time

# Set random seeds for reproducibility
random.seed(42)
np.random.seed(42)

# Generate nonlinear, noisy dataset
X, y = make_moons(n_samples=1000, noise=0.35, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# --- Grid Search (exhaustive) ---
param_grid = {
    "n_estimators": [50, 100, 200, 300],
    "max_depth": [3, 5, 10, None]
}

start_time = time.time()
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    n_jobs=-1
)
grid_search.fit(X_train, y_train)
grid_time = time.time() - start_time
grid_acc = accuracy_score(y_test, grid_search.predict(X_test))

# --- Bayes Search (efficient) ---
search_spaces = {
    "n_estimators": (50, 300),
    "max_depth": (3, 10)
}

start_time = time.time()
bayes_search = BayesSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    search_spaces=search_spaces,
    n_iter=16,  # number of iterations (evaluations)
    cv=5,
    n_jobs=-1,
    random_state=42
)
bayes_search.fit(X_train, y_train)
bayes_time = time.time() - start_time
bayes_acc = accuracy_score(y_test, bayes_search.predict(X_test))

# --- Results ---
print(f"GridSearchCV accuracy:  {grid_acc:.3f} "
      f"({len(grid_search.cv_results_['params'])} fits, {grid_time:.1f}s)")
print(f"BayesSearchCV accuracy: {bayes_acc:.3f} "
      f"({len(bayes_search.cv_results_['params'])} fits, {bayes_time:.1f}s)")
```
Bayesian search is especially effective for large, continuous hyperparameter spaces. By modeling the relationship between hyperparameters and performance, it efficiently explores promising regions of the search space, often requiring fewer iterations than random or grid search to find optimal values.
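One way to inspect this behavior, sketched under the assumption that bayes_search has already been fitted as in the example above: the standard scikit-learn attributes report the winning configuration, and skopt's plotting helpers can show how the best score improved per iteration.

```python
# Assumes bayes_search was fitted as in the example above
print(bayes_search.best_params_)  # best hyperparameters found
print(bayes_search.best_score_)   # best cross-validated score

# Optional: visualize convergence of the Bayesian search
from skopt.plots import plot_convergence
import matplotlib.pyplot as plt

# optimizer_results_ holds skopt's raw optimization trace(s)
plot_convergence(bayes_search.optimizer_results_[0])
plt.show()
```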