Advanced Tree-Based Models

Blending & Hybrid Models

Blending and stacking are two ensemble techniques that can further boost the predictive power of tree-based models such as CatBoost, XGBoost, and LightGBM. In blending, you combine the predictions of several different models—often trained with different algorithms or hyperparameters—by averaging or weighting their outputs. This approach leverages the strengths of each individual model and can help reduce overfitting by smoothing out their unique errors.

Simple stacking takes this idea a step further by training a new model (often called a meta-learner) on the outputs of the base models. This meta-learner tries to learn the best way to combine the base predictions. While stacking can be more powerful than blending, it is also more complex to set up and tune.
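
As a minimal illustration of stacking, the sketch below trains CatBoost and LightGBM as base models and fits a logistic regression meta-learner on their predicted probabilities. The three-way split, the synthetic dataset, and all variable names are assumptions made for this example, not a fixed recipe.

# Minimal stacking sketch (splits, models, and names are illustrative assumptions)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=42)

# Three disjoint splits: base-model training, meta-learner training, final evaluation
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
X_meta, X_test, y_meta, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

# Train the base models on the first split
base_models = [
    CatBoostClassifier(verbose=0, random_seed=42),
    LGBMClassifier(random_state=42),
]
for model in base_models:
    model.fit(X_train, y_train)

# Base-model probabilities become the meta-learner's input features
meta_features = np.column_stack([m.predict_proba(X_meta)[:, 1] for m in base_models])
test_features = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

# The meta-learner learns how to combine the base predictions
meta_learner = LogisticRegression()
meta_learner.fit(meta_features, y_meta)

stacked_pred = meta_learner.predict_proba(test_features)[:, 1]
print("Stacked AUC:", roc_auc_score(y_test, stacked_pred))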

For tree-based models, blending is particularly attractive because CatBoost, LightGBM, and XGBoost each have unique strengths and may capture different aspects of the data. By blending their predictions, you can often achieve more robust and accurate results, especially on challenging datasets or in competitions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

# Generate synthetic binary classification data
X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Train CatBoost
catboost_model = CatBoostClassifier(verbose=0, random_seed=42)
catboost_model.fit(X_train, y_train)
catboost_pred = catboost_model.predict_proba(X_val)[:, 1]

# Train LightGBM
lgbm_model = LGBMClassifier(random_state=42)
lgbm_model.fit(X_train, y_train)
lgbm_pred = lgbm_model.predict_proba(X_val)[:, 1]

# Simple blending: average the predictions
blend_pred = (catboost_pred + lgbm_pred) / 2

# Compute AUC for individual and blended predictions
catboost_auc = roc_auc_score(y_val, catboost_pred)
lgbm_auc = roc_auc_score(y_val, lgbm_pred)
blend_auc = roc_auc_score(y_val, blend_pred)

print("CatBoost AUC:", catboost_auc)
print("LightGBM AUC:", lgbm_auc)
print("Blended AUC:", blend_auc)
Note

Blending and stacking are most beneficial when your base models are diverse and make different types of errors. This diversity can come from using different algorithms, hyperparameters, or even training on different data subsets. However, blending or stacking similar models can sometimes provide little to no improvement.

Overfitting is a potential pitfall, especially if you blend on the same data used to train your base models or if your meta-learner is too complex. Always evaluate ensemble approaches on a separate validation set to ensure genuine improvement.
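
The sketch below illustrates one way to respect this: a blending weight is chosen on a dedicated validation split, and the resulting blend is scored on a separate, untouched test split. The weight grid, split sizes, and synthetic data are illustrative assumptions.

# Sketch: choose the blend weight on a validation split, report on a separate test split
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=42)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

cat = CatBoostClassifier(verbose=0, random_seed=42).fit(X_train, y_train)
lgbm = LGBMClassifier(random_state=42).fit(X_train, y_train)

# Search a small weight grid on the validation split only
best_w, best_auc = 0.5, -np.inf
for w in np.linspace(0, 1, 11):
    val_blend = w * cat.predict_proba(X_val)[:, 1] + (1 - w) * lgbm.predict_proba(X_val)[:, 1]
    auc = roc_auc_score(y_val, val_blend)
    if auc > best_auc:
        best_w, best_auc = w, auc

# Evaluate the chosen blend on the untouched test split
test_blend = best_w * cat.predict_proba(X_test)[:, 1] + (1 - best_w) * lgbm.predict_proba(X_test)[:, 1]
print("Chosen weight:", best_w)
print("Validation AUC:", best_auc)
print("Test AUC:", roc_auc_score(y_test, test_blend))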

Task


You are given a binary classification dataset. Your goal is to:

  1. Train three different gradient boosting models:
    • CatBoostClassifier;
    • XGBClassifier;
    • LGBMClassifier.
  2. Predict probabilities on the test set.
  3. Blend all three models using simple averaging of probabilities.
  4. Compute accuracy and store it in accuracy_value.
  5. Print dataset shapes, model types, and blended accuracy.

Use CatBoost, XGBoost, and LightGBM through their sklearn-compatible APIs only. Do not tune hyperparameters, and do not add loops beyond the blending logic. One possible structure is sketched below.
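
Because the exercise's starter code is not shown here, the synthetic dataset, the split, and the variable names in this sketch are assumptions; adapt them to the data and names provided in the exercise.

# Sketch only: dataset, split, and names are assumptions made for illustration
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)

# Train the three gradient boosting models with default settings
cat_model = CatBoostClassifier(verbose=0, random_seed=42)
cat_model.fit(X_train, y_train)
xgb_model = XGBClassifier(random_state=42)
xgb_model.fit(X_train, y_train)
lgbm_model = LGBMClassifier(random_state=42)
lgbm_model.fit(X_train, y_train)
print("Models:", type(cat_model).__name__, type(xgb_model).__name__, type(lgbm_model).__name__)

# Predict probabilities of the positive class on the test set
cat_proba = cat_model.predict_proba(X_test)[:, 1]
xgb_proba = xgb_model.predict_proba(X_test)[:, 1]
lgbm_proba = lgbm_model.predict_proba(X_test)[:, 1]

# Blend: simple average of the three probability vectors, then threshold at 0.5
blend_proba = (cat_proba + xgb_proba + lgbm_proba) / 3
blend_labels = (blend_proba >= 0.5).astype(int)

accuracy_value = accuracy_score(y_test, blend_labels)
print("Blended accuracy:", accuracy_value)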

