Advanced Tree-Based Models

Blending & Hybrid Models

Blending and stacking are two ensemble techniques that can further boost the predictive power of tree-based models such as CatBoost, XGBoost, and LightGBM. In blending, you combine the predictions of several different models—often trained with different algorithms or hyperparameters—by averaging or weighting their outputs. This approach leverages the strengths of each individual model and can help reduce overfitting by smoothing out their unique errors.

Simple stacking takes this idea a step further by training a new model (often called a meta-learner) on the outputs of the base models. This meta-learner tries to learn the best way to combine the base predictions. While stacking can be more powerful than blending, it is also more complex to set up and tune.
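To make simple stacking concrete, here is a minimal sketch using scikit-learn's StackingClassifier with CatBoost and LightGBM as base models and a logistic-regression meta-learner. The synthetic dataset, split, and default hyperparameters are illustrative assumptions, not part of this lesson's exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

# Illustrative synthetic data (assumption, not the lesson's dataset)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Base models whose predictions feed the meta-learner
base_models = [
    ("catboost", CatBoostClassifier(verbose=0, random_seed=42)),
    ("lightgbm", LGBMClassifier(random_state=42)),
]

# Logistic regression learns how to weight the base predictions;
# cv=5 trains the meta-learner on out-of-fold predictions to limit leakage
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

stack_pred = stack.predict_proba(X_val)[:, 1]
print("Stacked AUC:", roc_auc_score(y_val, stack_pred))
```

The cv argument is what separates stacking from naive blending here: the meta-learner never sees predictions made on the same rows the base models were fit on.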

For tree-based models, blending is particularly attractive because CatBoost, LightGBM, and XGBoost each have unique strengths and may capture different aspects of the data. By blending their predictions, you can often achieve more robust and accurate results, especially on challenging datasets or in competitions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

# Generate synthetic binary classification data
X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Train CatBoost
catboost_model = CatBoostClassifier(verbose=0, random_seed=42)
catboost_model.fit(X_train, y_train)
catboost_pred = catboost_model.predict_proba(X_val)[:, 1]

# Train LightGBM
lgbm_model = LGBMClassifier(random_state=42)
lgbm_model.fit(X_train, y_train)
lgbm_pred = lgbm_model.predict_proba(X_val)[:, 1]

# Simple blending: average the predictions
blend_pred = (catboost_pred + lgbm_pred) / 2

# Compute AUC for individual and blended predictions
catboost_auc = roc_auc_score(y_val, catboost_pred)
lgbm_auc = roc_auc_score(y_val, lgbm_pred)
blend_auc = roc_auc_score(y_val, blend_pred)

print("CatBoost AUC:", catboost_auc)
print("LightGBM AUC:", lgbm_auc)
print("Blended AUC:", blend_auc)
```
Note

Blending and stacking are most beneficial when your base models are diverse and make different types of errors. This diversity can come from using different algorithms, hyperparameters, or even training on different data subsets. However, blending or stacking similar models can sometimes provide little to no improvement.
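As a quick way to gauge diversity, you can compare the base models' validation predictions. The sketch below reuses catboost_pred and lgbm_pred from the blending example above, so it assumes that code has already been run.

```python
import numpy as np

# Highly correlated predicted probabilities suggest the models make
# similar errors, so averaging them is unlikely to help much
corr = np.corrcoef(catboost_pred, lgbm_pred)[0, 1]
print("Correlation of predicted probabilities:", corr)

# Disagreement rate of the hard (0/1) predictions is another quick check
disagreement = np.mean((catboost_pred > 0.5) != (lgbm_pred > 0.5))
print("Fraction of validation samples where the models disagree:", disagreement)
```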

Overfitting is a potential pitfall, especially if you blend on the same data used to train your base models or if your meta-learner is too complex. Always evaluate ensemble approaches on a separate validation set to ensure genuine improvement.
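One way to follow this advice is to choose the blend weight on a validation split and report the final score on a separate, untouched test split. The sketch below assumes synthetic data and default model settings purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

# Three splits keep weight selection separate from the final evaluation
X, y = make_classification(n_samples=3000, n_features=20, n_informative=15, random_state=42)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

catboost_model = CatBoostClassifier(verbose=0, random_seed=42).fit(X_train, y_train)
lgbm_model = LGBMClassifier(random_state=42).fit(X_train, y_train)

cat_val = catboost_model.predict_proba(X_val)[:, 1]
lgbm_val = lgbm_model.predict_proba(X_val)[:, 1]

# Pick the blend weight on the validation split only
weights = np.linspace(0, 1, 21)
aucs = [roc_auc_score(y_val, w * cat_val + (1 - w) * lgbm_val) for w in weights]
best_w = weights[int(np.argmax(aucs))]

# Report the blended score on the untouched test split
cat_test = catboost_model.predict_proba(X_test)[:, 1]
lgbm_test = lgbm_model.predict_proba(X_test)[:, 1]
blend_test = best_w * cat_test + (1 - best_w) * lgbm_test
print("Chosen CatBoost weight:", best_w)
print("Test AUC of weighted blend:", roc_auc_score(y_test, blend_test))
```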

Task


You are given a binary classification dataset. Your goal is to:

  1. Train three different gradient boosting models:
    • CatBoostClassifier;
    • XGBClassifier;
    • LGBMClassifier.
  2. Predict probabilities on the test set.
  3. Blend all three models using simple averaging of probabilities.
  4. Compute accuracy and store it in accuracy_value.
  5. Print dataset shapes, model types, and blended accuracy.

Use only the scikit-learn-compatible APIs of CatBoost, XGBoost, and LightGBM; no hyperparameter tuning, and no loops beyond the blending logic.
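The exercise supplies its own dataset and starter code, which are not shown here. As a rough, non-authoritative sketch of the workflow, with a synthetic dataset standing in for the real one:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Stand-in data; the exercise provides its own dataset
X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)

# Train the three gradient boosting models with default settings
catboost_model = CatBoostClassifier(verbose=0, random_seed=42)
xgb_model = XGBClassifier(random_state=42)
lgbm_model = LGBMClassifier(random_state=42)
catboost_model.fit(X_train, y_train)
xgb_model.fit(X_train, y_train)
lgbm_model.fit(X_train, y_train)
print("Models:", type(catboost_model).__name__, type(xgb_model).__name__, type(lgbm_model).__name__)

# Predict probabilities on the test set and blend by simple averaging
catboost_proba = catboost_model.predict_proba(X_test)[:, 1]
xgb_proba = xgb_model.predict_proba(X_test)[:, 1]
lgbm_proba = lgbm_model.predict_proba(X_test)[:, 1]
blend_proba = (catboost_proba + xgb_proba + lgbm_proba) / 3

# Convert averaged probabilities to class labels and compute accuracy
blend_labels = (blend_proba >= 0.5).astype(int)
accuracy_value = accuracy_score(y_test, blend_labels)
print("Blended accuracy:", accuracy_value)
```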

