Bagging Models

A bagging (bootstrap aggregating) model is an ensemble learning model that consists of base models of the same type and aggregates their results using voting. Voting, in this case, means that the ensemble returns the value that the majority of weak learners vote for. Below we discuss two types of voting in more detail: hard voting and soft voting.

Hard Voting

Suppose we solve a binary classification problem using some number of weak learners (for example, logistic regression or SVM). With hard voting, the final prediction is the class that receives the majority of votes from the weak learners.
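As a minimal sketch of this rule, the snippet below counts the labels predicted by the weak learners for one sample and returns the most frequent one. The function name `hard_vote` and the tie-breaking behaviour are assumptions made for illustration; they are not part of any particular library.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by the largest number of weak learners.

    `predictions` holds one predicted label per weak learner for a single
    sample. On a tie, the label that appears first wins (an arbitrary choice).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Five hypothetical weak learners vote on one sample: three say 1, two say 0.
votes = [1, 0, 1, 1, 0]
print(hard_vote(votes))
```

Using an odd number of weak learners in a binary problem avoids ties altogether.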

Why is this approach better than using a single model?

First, consider a single simple model that produces the correct result 51 percent of the time. Such a result is only slightly better than random guessing.
Let's calculate the probability of getting the correct result using an ensemble of 1000 of these weak models. If we use hard voting, this probability equals the probability that more than 500 models give the correct result.

Assuming that the results of all weak learners are independent, we can use the Central Limit Theorem (CLT) to approximate the probability of getting the right answer (the Central Limit Theorem is covered in these chapters: Chapter 1, Chapter 2).

Let ξi be the result of binary classifier i: it equals 1 with probability 0.51 and 0 with probability 0.49. The number of correct votes S = ξ1 + ξ2 + ... + ξ1000 then has mean 1000 · 0.51 = 510 and standard deviation √(1000 · 0.51 · 0.49) ≈ 15.8. Using the CLT, we can approximate the probability of getting more than 500 right results among 1000 models as follows:

P(S > 500) ≈ Φ((510 - 500) / 15.8) ≈ Φ(0.63) ≈ 0.74,

where Φ is the standard normal cumulative distribution function.
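The probability discussed above can be checked numerically. The sketch below computes it two ways under the same independence assumption: exactly, by summing binomial probabilities, and approximately, with the normal distribution suggested by the CLT. The function names are hypothetical, and the CLT version deliberately omits a continuity correction, so the two results differ slightly.

```python
import math
from statistics import NormalDist


def majority_prob_exact(n, p):
    """Exact P(more than n/2 of n independent voters are correct)."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # Binomial pmf computed in log space to avoid overflow for large n.
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * log_p + (n - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


def majority_prob_clt(n, p):
    """Normal (CLT) approximation of the same probability."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return 1.0 - NormalDist(mean, std).cdf(n / 2)


for n in (1000, 10000):
    exact = majority_prob_exact(n, 0.51)
    approx = majority_prob_clt(n, 0.51)
    print(f"n={n}: exact={exact:.4f}, CLT={approx:.4f}")
```

With p = 0.51, both estimates come out at roughly 0.73 for 1000 models and roughly 0.98 for 10,000 models.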

What conclusion can we draw from these calculations?

So we come to a notable conclusion: a single weak model gives the correct answer only 51% of the time, while 1000 such models aggregated with hard voting are correct about 74% of the time by this estimate. The effect grows with the size of the ensemble: the same approximation gives more than 90% once the ensemble has a little over 4,000 models, and about 98% for 10,000 models.

However, there is a significant nuance to consider. All the conclusions mentioned above hold true only if the results of each model are independent of the results of the other models. In practice, this condition is hard to satisfy. When we train multiple models on the same data, they tend to produce very similar outcomes and make the same mistakes, so their results are correlated rather than independent.
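A small simulation can illustrate how much the independence assumption matters. This is only a sketch under stated assumptions: the correlation mechanism (each model copies a shared outcome with probability `rho`), the function name, and the values 101 models and p = 0.6 are all chosen for illustration and do not come from the lesson.

```python
import random


def majority_accuracy(n_models, p, rho, trials, rng):
    """Estimate how often a majority of voters is correct.

    Each voter copies one shared outcome with probability `rho`; otherwise it
    draws its own outcome. Every outcome is correct with probability `p`.
    rho = 0 gives fully independent voters, rho = 1 gives identical voters.
    """
    correct_majorities = 0
    for _ in range(trials):
        shared_correct = rng.random() < p
        correct_votes = 0
        for _ in range(n_models):
            if rng.random() < rho:
                correct_votes += shared_correct
            else:
                correct_votes += rng.random() < p
        if correct_votes > n_models / 2:
            correct_majorities += 1
    return correct_majorities / trials


rng = random.Random(42)
independent = majority_accuracy(101, 0.6, 0.0, 2000, rng)
correlated = majority_accuracy(101, 0.6, 0.9, 2000, rng)
print(f"independent voters: {independent:.3f}")
print(f"highly correlated voters: {correlated:.3f}")
```

In this toy setup the independent ensemble is far more accurate than a single model, while the highly correlated ensemble stays close to the single-model accuracy of 0.6.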

What is Bootstrap?

The bootstrap technique is used to overcome this problem when training ensembles with the bagging method.
The core idea is to train each individual weak model not on the entire training set but on a random sample of the training data, typically drawn with replacement. As a result, we obtain a set of models, each trained on a different subset of the data, whose results are much less correlated and can be treated as approximately independent.
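The sampling step can be sketched as follows. The snippet assumes the classic bootstrap variant, in which each sample has the same size as the original data and is drawn with replacement; the helper name `bootstrap_sample` and the toy data are hypothetical.

```python
import random


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items uniformly at random, with replacement."""
    return rng.choices(dataset, k=len(dataset))


rng = random.Random(0)
data = list(range(10))  # stand-in for ten training rows

# One bootstrap sample per weak model; here, three models.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))
```

Because rows are drawn with replacement, some rows appear several times in a sample and others not at all, which is what makes the training sets differ from model to model.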

Soft Voting

Soft voting is an aggregation technique where the predictions of the base models are combined by taking into account the probabilities (confidence levels) assigned to each class label rather than just considering the majority vote.
Here's how soft voting works:

Each base model in the ensemble assigns a probability to every possible class label for a given input sample. These probabilities represent the model's confidence in its predictions.
To make a final prediction using soft voting, the ensemble takes the average of the predicted probabilities across all base models for each class label.
The class label with the highest average probability is then chosen as the final prediction for the ensemble.
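The three steps above can be sketched in a few lines. The function name `soft_vote` and the probability values are made up for illustration; in practice the probabilities would come from each base model.

```python
def soft_vote(probabilities):
    """Average class probabilities across models and pick the best class.

    probabilities[m][c] is the probability that model m assigns to class c
    for one input sample. Returns (winning_class_index, averaged_probabilities).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(model[c] for model in probabilities) / n_models
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return winner, averaged


# Three hypothetical models, two classes. Two models lean slightly towards
# class 0, while the third is very confident in class 1.
probs = [
    [0.55, 0.45],
    [0.60, 0.40],
    [0.10, 0.90],
]
winner, averaged = soft_vote(probs)
print(winner, [round(a, 3) for a in averaged])
```

In this example hard voting would choose class 0 (two votes against one), while soft voting chooses class 1 because the third model's high confidence outweighs the two marginal votes.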

Note

Soft voting can be applied only to base estimators that have a .predict_proba() method.


