Bagging Regressor | Commonly Used Bagging Models
Ensemble Learning

Bagging Regressor

Bagging Regressor creates an ensemble of multiple base regression models and combines their predictions to produce a final prediction. In Bagging Regressor, the base model is typically a regression algorithm, such as Decision Tree Regressor. The main idea behind Bagging Regressor is to reduce overfitting and improve the stability and accuracy of the predictions by averaging the predictions of multiple base models.

How does Bagging Regressor work?

  1. Bootstrap Sampling: The Bagging Regressor generates multiple subsets of the training data by randomly selecting samples with replacement. Each subset is called a bootstrap sample;
  2. Base Model Training: A separate base regression model (e.g., Decision Tree Regressor) is trained on each bootstrap sample. This creates multiple base models, each with its own variation due to the different subsets of data they were trained on;
  3. Aggregation of Predictions: The Bagging Regressor aggregates the predictions from all base models to produce the final output. For regression tasks, the predictions are typically averaged across the base models. This ensemble approach helps to reduce overfitting and improve overall model performance.

    Note

    Selecting samples with replacement is a concept often used in statistics and probability. It refers to a method of sampling data points or elements from a dataset or population where, after each selection, the selected item is put back into the dataset before the next selection. In other words, the same item can be chosen more than once in the sampling process.
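Sampling with replacement is easy to see in a few lines of code. Below is a minimal sketch using NumPy (the array values and the seed are illustrative): a bootstrap sample has the same size as the original data, but because each drawn item is "put back," some values may appear more than once while others are left out.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([10, 20, 30, 40, 50])

# Draw a bootstrap sample: same size as the original dataset,
# sampled with replacement, so duplicates can occur
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)
```

Running this several times with different seeds shows different bootstrap samples, which is exactly what gives each base model in the ensemble its own variation.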

Example of usage

The principle of using a Bagging Regressor in Python is the same as using a Bagging Classifier. The only difference is that the Bagging Regressor has no .predict_proba() method; the .predict() method is used to make predictions instead.

Note

If we don't specify the base model of the BaggingRegressor, a DecisionTreeRegressor will be used by default.

Code Description
In the provided code, we perform the following steps:

  • Load the California Housing Dataset:
  • We use fetch_california_housing from sklearn.datasets to load the California Housing dataset. This dataset contains features related to various aspects of housing in California, and the target variable is the median house value for California districts.
  • Split Data into Training and Testing Sets:
  • We split the dataset into training and testing sets using the train_test_split() function from the sklearn.model_selection module. The training set will be used to train the Bagging Regressor, and the testing set will be used to evaluate its performance.
  • Create Base Model:
  • We create a base model using the DecisionTreeRegressor class from the sklearn.tree module. The base model is a Decision Tree Regressor, which will be used as the weak learner within the Bagging Regressor.
  • Create Bagging Regressor:
  • We create an instance of the BaggingRegressor class with the Decision Tree Regressor as the base model. The BaggingRegressor will train multiple instances of the base model on different subsets of the training data.
  • Train the Bagging Regressor:
  • We train the Bagging Regressor on the training data using the .fit() method. The Bagging Regressor trains multiple instances of the base model independently, each on a different bootstrap sample of the training data, to create the ensemble.
  • Make Predictions on the Test Data:
  • After training the Bagging Regressor, we use it to make predictions on the test data with the .predict() method. The Bagging Regressor combines the predictions of all base models to make the final predictions.
  • Evaluate Performance using Mean Squared Error (MSE):
  • We calculate the mean squared error (MSE) between the predicted median house values and the actual target values using the mean_squared_error() function from the sklearn.metrics module. MSE is a common metric for evaluating regression models, and it gives us a measure of how well the Bagging Regressor is performing on the California Housing dataset.
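The steps above can be sketched as follows. This is a minimal reconstruction, not the course's exact code: the number of estimators, the test-set fraction, and the random seeds are illustrative choices, and the `estimator` parameter name assumes scikit-learn 1.2 or newer (older versions call it `base_estimator`).

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Load the California Housing dataset
X, y = fetch_california_housing(return_X_y=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create the base model: a Decision Tree Regressor
base_model = DecisionTreeRegressor(random_state=42)

# Create the Bagging Regressor with the decision tree as its base model
bagging = BaggingRegressor(
    estimator=base_model, n_estimators=10, random_state=42
)

# Train the ensemble on the training data
bagging.fit(X_train, y_train)

# Predictions are the average of the base models' predictions
y_pred = bagging.predict(X_test)

# Evaluate performance with mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"MSE: {mse:.3f}")
```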
    You can find the official documentation, with all the necessary information about implementing this model in Python, on the official scikit-learn website.

    How does BaggingRegressor aggregate the predictions of its base models?


    Section 2. Chapter 3