Bagging Classifier | Commonly Used Bagging Models
Ensemble Learning

# Bagging Classifier

We have already considered how a bagging ensemble works. Now let's apply this knowledge and build a model that performs classification with such an ensemble in Python.

Firstly, we import the `BaggingClassifier` class, which contains all the necessary tools to work with the bagging classifier.
Then, we create an instance of this class, specifying the base model, the number of base models in the ensemble, and the `n_jobs` parameter.

Note

We have already mentioned that bagging models can be fitted using parallel computing. `n_jobs=-1` means that all available processors will be used to train the model.

Now we can use the `.fit()` and `.predict()` methods of `BaggingClassifier` to fit the model on the available data and make predictions.
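A minimal sketch of fitting and predicting, using randomly generated arrays as stand-in training and test data (the data here is an assumption for illustration only):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X_train = rng.random((100, 4))               # 100 samples, 4 features
y_train = rng.integers(0, 2, size=100)       # binary labels
X_test = rng.random((10, 4))

# Default base model is a decision tree classifier
model = BaggingClassifier(n_estimators=10, n_jobs=-1)
model.fit(X_train, y_train)        # each base model fits its own bootstrap sample
predictions = model.predict(X_test)  # one aggregated prediction per test sample
```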

Now, let's talk about the base models of an ensemble.

## What models can be used as base?

We can use virtually any model designed for classification tasks as the base model of the ensemble (logistic regression, SVM, neural networks, etc.).

It is also important to note that the `.fit()` method draws the bootstrap subsamples for each base model itself, so we don't need to specify additional parameters or manually control the resampling process.

When we use the `.predict()` method, the base models' predictions are aggregated into a final class label (hard voting).
If we want the aggregated class probabilities instead (soft voting), we call the `.predict_proba()` method rather than `.predict()`; this requires a base estimator that implements `.predict_proba()`. The result is a probability matrix: one row per test sample, containing the averaged probability of belonging to each class.

Note

If we don't specify the base model of `BaggingClassifier`, a Decision Tree Classifier will be used by default.

## Example of usage

Let's solve a simple classification problem using a bagging ensemble with logistic regression as the base model.

Code Description

In the provided code, we performed the following steps:

1. **Generate synthetic data**: we used the `make_classification()` function from the `sklearn.datasets` module to create a synthetic dataset for a binary classification problem: 1000 samples, 10 features (5 of them informative), and 1 cluster per class. This gives us a simple classification problem for demonstration purposes.
2. **Split data into training and testing sets**: we split the synthetic data using the `train_test_split()` function from the `sklearn.model_selection` module. The training set is used to train the bagging ensemble, and the testing set to evaluate its performance.
3. **Create the base model**: we created a base model using the `LogisticRegression` class from the `sklearn.linear_model` module. This logistic regression classifier serves as the weak learner within the bagging ensemble.
4. **Create the bagging classifier**: we created a `BaggingClassifier` instance with the logistic regression model as its base estimator. The `BaggingClassifier` will train multiple instances of the base model on different subsets of the training data.
5. **Train the bagging classifier**: we trained the ensemble on the training data using the `.fit()` method. The base models are trained independently on different bootstrap subsamples (in parallel when `n_jobs` allows it).
6. **Make predictions and calculate the F1 score**: after training, we made predictions on the test data with the `.predict()` method and calculated the F1 score using the `f1_score()` function from the `sklearn.metrics` module.
You can find all the necessary information about implementing this model in Python in the official scikit-learn documentation for `BaggingClassifier`.

How can we use soft voting in `BaggingClassifier`?


Section 2. Chapter 1
