Bagging Models | Basic Principles of Building Ensemble Models
Ensemble Learning

Bagging Models

A bagging (bootstrap aggregating) model is an ensemble learning model that consists of base models of the same type and aggregates their results by voting. Voting here means that the ensemble outputs the value that the majority of weak learners vote for. We will discuss two types of voting in more detail: hard voting and soft voting.

Hard Voting

Suppose we solve a binary classification problem using some number of weak learners (for example, logistic regression or SVM). In this scenario, the final prediction is the class that receives the majority of votes from the weak learners.
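
As a minimal sketch of hard voting, the majority class can be computed by counting labels. The function name `hard_vote` and the toy votes are illustrative assumptions, not part of any library:

```python
from collections import Counter

def hard_vote(predictions):
    """Return the label predicted by the largest number of models.

    predictions: list of class labels, one per weak learner (hypothetical input).
    Ties are resolved in favor of the label that was encountered first.
    """
    return Counter(predictions).most_common(1)[0][0]

# Five hypothetical weak learners vote on one sample of a binary problem.
votes = [1, 0, 1, 1, 0]
final_prediction = hard_vote(votes)
print(final_prediction)
```

Using an odd number of learners in a binary problem avoids ties altogether.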

Why is this approach better than using a single model?

  1. First, consider a single simple model that produces the correct result 51 percent of the time. Such a result is only slightly better than random guessing.
  2. Now let's calculate the probability of getting the correct result using an ensemble of 1000 such weak models. With hard voting, this is the probability that more than 500 models give the correct result.

Assuming that the results of all weak learners are independent, we can use the Central Limit Theorem (CLT) to calculate the probability of getting the right answer (the Central Limit Theorem is covered in these chapters: Chapter 1, Chapter 2):

ξi is the result of binary classifier i: it equals 1 (a correct answer) with probability 0.51 and 0 with probability 0.49. Using the CLT, we can calculate the probability of getting more than 500 correct results among 1000 models as follows:

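$$
S = \sum_{i=1}^{1000} \xi_i, \qquad \mathbb{E}[S] = 1000 \cdot 0.51 = 510, \qquad \sigma_S = \sqrt{1000 \cdot 0.51 \cdot 0.49} \approx 15.81
$$

$$
P(S > 500) \approx 1 - \Phi\left(\frac{500 - 510}{15.81}\right) = \Phi(0.63) \approx 0.74
$$

Here Φ is the cumulative distribution function of the standard normal distribution.

What conclusion can we make according to these calculations?

The conclusion is striking: a single weak model gives the correct answer only 51% of the time, while 1000 such models aggregated by hard voting are correct about 74% of the time. The effect grows with the size of the ensemble: repeating the calculation for 10,000 models gives Φ(2) ≈ 0.98, that is, an accuracy of more than 90%!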
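
The majority-vote probability can also be checked numerically. The sketch below is an illustration under the same independence assumption. It computes the exact binomial tail and the CLT approximation (without continuity correction) for an ensemble of `n` learners that are each correct with probability `p`. The function names are hypothetical.

```python
import math

def majority_accuracy_exact(n, p):
    """Exact P(more than n/2 of n independent learners are correct), for 0 < p < 1.

    The binomial terms are evaluated in log space to avoid overflow for large n.
    """
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

def majority_accuracy_clt(n, p):
    """Normal (CLT) approximation of the same probability."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    z = (n / 2 - mean) / std
    normal_cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 1 - normal_cdf

for n in (1000, 10000):
    print(n, round(majority_accuracy_exact(n, 0.51), 3), round(majority_accuracy_clt(n, 0.51), 3))
```

For p = 0.51, both functions give a value between 0.7 and 0.8 for 1000 learners and a value above 0.97 for 10,000 learners. The small gap between them comes from approximating a discrete distribution with a continuous one.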

However, there is an important caveat. All the conclusions above hold only if the results of each model are independent of the results of the other models. In practice, this condition is hard to satisfy: models trained on the same data tend to make the same errors, so their results are strongly correlated rather than independent.

What is Bootstrap?

The bootstrap technique is used to mitigate this problem when training ensembles with the bagging method.
The core idea is to train each weak model not on the entire training set but on a random sample drawn from it (in the classical bootstrap, a sample of the same size as the training set, drawn with replacement). As a result, we obtain a set of models, each trained on a different subset of the data, whose errors are less correlated, although the models are not fully independent.
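
A minimal sketch of drawing such samples is shown below. It assumes the classical bootstrap (sampling with replacement, each sample as large as the training set). The helper `bootstrap_sample` and the toy data are hypothetical.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]

rng = random.Random(0)           # fixed seed so the run is reproducible
training_set = list(range(10))   # stand-in for 10 training examples

# One bootstrap sample per weak learner (three learners here).
samples = [bootstrap_sample(training_set, rng) for _ in range(3)]
for sample in samples:
    # Some examples repeat and some are left out, so every learner sees different data.
    print(sorted(sample), "distinct examples:", len(set(sample)))
```

Each weak learner would then be fitted on its own sample instead of on `training_set`.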

Soft Voting

Soft voting is an aggregation technique where the predictions of the base models are combined by taking into account the probabilities (confidence levels) assigned to each class label rather than just considering the majority vote.
Here's how soft voting works:

  1. Each base model in the ensemble assigns a probability to each possible class label for a given input sample. These probabilities represent the model's confidence in its predictions.
  2. To make a final prediction using soft voting, the ensemble takes the average of the predicted probabilities across all base models for each class label.
  3. The class label with the highest average probability is then chosen as the final prediction for the ensemble.
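
The three steps above can be sketched as follows. The function `soft_vote` and the probability values are illustrative assumptions. In practice the probabilities would come from the base models.

```python
def soft_vote(probabilities):
    """Average per-class probabilities over models and pick the most probable class.

    probabilities: one list per model, holding one probability per class.
    Returns (index of the predicted class, list of averaged probabilities).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averages = [sum(model[c] for model in probabilities) / n_models
                for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: averages[c])
    return predicted, averages

# Hypothetical outputs of three base models for one sample (classes 0 and 1).
probabilities = [
    [0.55, 0.45],
    [0.60, 0.40],
    [0.10, 0.90],
]
predicted_class, average_probabilities = soft_vote(probabilities)
print(predicted_class, average_probabilities)
```

In this toy input, two of the three models slightly prefer class 0, so hard voting would choose class 0. Soft voting chooses class 1 because the third model is far more confident.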

    Note

    Soft voting can be applied only to base estimators that have a .predict_proba() method.

Section 1. Chapter 2