Conteúdo do Curso
Ensemble Learning
1. Basic Principles of Building Ensemble Models
Ensemble Learning
Bagging Regressor
Bagging Regressor creates an ensemble of multiple base regression models and combines their predictions to produce a final prediction. In Bagging Regressor, the base model is typically a regression algorithm, such as Decision Tree Regressor. The main idea behind Bagging Regressor is to reduce overfitting and improve the stability and accuracy of the predictions by averaging the predictions of multiple base models.
How does Bagging Regressor work?
- Bootstrap Sampling: The Bagging Regressor generates multiple subsets of the training data by randomly selecting samples with replacements. Each subset is called a bootstrap sample;
- Base Model Training: A separate base regression model (e.g., Decision Tree Regressor) is trained on each bootstrap sample. This creates multiple base models, each with its own variation due to the different subsets of data they were trained on;
- Aggregation of Predictions: The Bagging Regressor aggregates the predictions from all base models to make predictions. In the case of regression tasks, the predictions are typically averaged across the base models to form the final prediction. This ensemble approach helps to reduce overfitting and improve the overall model performance.
Note
Selecting samples with replacement is a concept often used in statistics and probability. It refers to a method of sampling data points or elements from a dataset or population where, after each selection, the selected item is put back into the dataset before the next selection. In other words, the same item can be chosen more than once in the sampling process.
Example of usage
The principle of using a Bagging Regressor in Python is the same as using a Bagging Classifier. The only difference is that Bagging Regressor has no implementation for the .predict_proba()
method - the .predict()
method is used to create predictions instead.
Note
If we don't specify the base model of
AdaBoostRegressor
, theDecisionTreeRegressor
will be used by default.
Code Description
fetch_california_housing from sklearn.datasets
to load the California Housing dataset. This dataset contains features related to various aspects of housing in California, and the target variable is the median house value for California districts.
train_test_split()
function from sklearn.model_selection
module. The training set will be used to train the Bagging Regressor, and the testing set will be used to evaluate its performance.
DecisionTreeRegressor
class from sklearn.tree
module. The base model is a Decision Tree Regressor, which will be used as the weak learner within the Bagging Regressor.
BaggingRegressor
class with the Decision Tree Regressor as the base model. The BaggingRegressor
will train multiple instances of the base model on different subsets of the training data.
.fit()
method. The Bagging Regressor sequentially trains multiple instances of the base model on different subsets of the training data to create the ensemble.
.predict()
method. The Bagging Regressor combines the predictions of all base models to make the final predictions.
mean_squared_error()
function from sklearn.metrics
module. MSE is a common metric used to evaluate regression models, and it gives us a measure of how well the Bagging Regressor is performing on the California Housing dataset.
Tudo estava claro?
Conteúdo do Curso
Ensemble Learning
1. Basic Principles of Building Ensemble Models
Ensemble Learning
Bagging Regressor
Bagging Regressor creates an ensemble of multiple base regression models and combines their predictions to produce a final prediction. In Bagging Regressor, the base model is typically a regression algorithm, such as Decision Tree Regressor. The main idea behind Bagging Regressor is to reduce overfitting and improve the stability and accuracy of the predictions by averaging the predictions of multiple base models.
How does Bagging Regressor work?
- Bootstrap Sampling: The Bagging Regressor generates multiple subsets of the training data by randomly selecting samples with replacements. Each subset is called a bootstrap sample;
- Base Model Training: A separate base regression model (e.g., Decision Tree Regressor) is trained on each bootstrap sample. This creates multiple base models, each with its own variation due to the different subsets of data they were trained on;
- Aggregation of Predictions: The Bagging Regressor aggregates the predictions from all base models to make predictions. In the case of regression tasks, the predictions are typically averaged across the base models to form the final prediction. This ensemble approach helps to reduce overfitting and improve the overall model performance.
Note
Selecting samples with replacement is a concept often used in statistics and probability. It refers to a method of sampling data points or elements from a dataset or population where, after each selection, the selected item is put back into the dataset before the next selection. In other words, the same item can be chosen more than once in the sampling process.
Example of usage
The principle of using a Bagging Regressor in Python is the same as using a Bagging Classifier. The only difference is that Bagging Regressor has no implementation for the .predict_proba()
method - the .predict()
method is used to create predictions instead.
Note
If we don't specify the base model of
AdaBoostRegressor
, theDecisionTreeRegressor
will be used by default.
Code Description
fetch_california_housing from sklearn.datasets
to load the California Housing dataset. This dataset contains features related to various aspects of housing in California, and the target variable is the median house value for California districts.
train_test_split()
function from sklearn.model_selection
module. The training set will be used to train the Bagging Regressor, and the testing set will be used to evaluate its performance.
DecisionTreeRegressor
class from sklearn.tree
module. The base model is a Decision Tree Regressor, which will be used as the weak learner within the Bagging Regressor.
BaggingRegressor
class with the Decision Tree Regressor as the base model. The BaggingRegressor
will train multiple instances of the base model on different subsets of the training data.
.fit()
method. The Bagging Regressor sequentially trains multiple instances of the base model on different subsets of the training data to create the ensemble.
.predict()
method. The Bagging Regressor combines the predictions of all base models to make the final predictions.
mean_squared_error()
function from sklearn.metrics
module. MSE is a common metric used to evaluate regression models, and it gives us a measure of how well the Bagging Regressor is performing on the California Housing dataset.
Tudo estava claro?