Course Content

Ensemble Learning

1. Basic Principles of Building Ensemble Models

What is Ensemble of Models?Bagging Models Boosting Models Stacking Models

2. Commonly Used Bagging Models

Bagging Classifier Challenge: Solving Task Using Bagging Classifier Bagging Regressor Challenge: Solving Task Using Bagging Regressor Random Forest Challenge: Determining Feature Importances Using Random Forest ExtraTrees

3. Commonly Used Boosting Models

AdaBoost Classifier Challenge: Solving Task Using AdaBoost Classifier Challenge: Solving Task Using AdaBoost Regressor Gradient Boosting XGBoost Challenge: Solving Task Using XGBoost

4. Commonly Used Stacking Models

Stacking Classifier Challenge: Solving Task Using Stacking Classifier Challenge: Solving Task Using Stacking Regressor Using Ensembles As Base Models Course Summary

Bagging Classifier

We have already considered the principle of the work of bagging ensemble. Now let's apply this knowledge and create a model that will provide classification using such an ensemble in Python:

Firstly, we import BaggingClassifier class that contains all the necessary tools to work with the bagging classifier.
Then, we create an instance of this class specifying the base model, the number of these models to create an ensemble, and n_jobs parameter.

Note

We have already mentioned that bagging models can be fitted using parallel computing. n_jobs=-1 means that we will use all available processors to train the model

Now we can use the .fit() and .predict() methods of BaggingClassifier to fit the model on available data and make predictions:

Now, let's talk about the base models of an ensemble.

What models can be used as base?

We can use absolutely any models designed to perform classification tasks as the base models of the ensemble (logistic regression, SVM, neural networks, etc.).

It is also important to note that when using the .fit() method, the ensemble will learn on different data subsamples by itself, so we don't need to specify additional parameters or manually control the learning process.

When we use the .predict() method, the hard voting technique creates a final prediction.
If we want to create a final prediction using soft voting, we must use the base estimator with the implemented .predict_proba() method instead of the .predict() method. As a result, we will get the vector corresponding to each sample from the test set containing aggregated probabilities belonging to a particular class (we will call it the probability matrix).

Note

If we don't specify the base model of BaggingClassifer, the Decision Tree Classifier will be used by default.

Example of usage

Let's solve some simple classification problems using a bagging ensemble with logistic regression as a base model:


              12345678910111213141516171819202122232425262728
            
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_clusters_per_class=1, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a base model (Logistic Regression)
base_model = LogisticRegression(random_state=42)

# Create a Bagging Classifier
bagging_model = BaggingClassifier(base_model, n_estimators=10, n_jobs=-1)

# Train the `BaggingClassifier`
bagging_model.fit(X_train, y_train)

# Make predictions on the test data
predictions = bagging_model.predict(X_test)

# Calculate F1 score
f1 = f1_score(y_test, predictions)
print(f'F1 score: {f1:.4f}')

Everything was clear?

Thanks for your feedback!

Section 2. Chapter 1