Random Forest | Commonly Used Bagging Models
Ensemble Learning

Random Forest

Random Forest is a bagging ensemble algorithm that is used for both classification and regression tasks. The basic idea behind Random Forest is to create a "forest" of decision trees, where each tree is trained on a different subset of the data and provides its own prediction.

How does Random Forest work?

  1. Bootstrapping and Data Subset: Each tree in the forest is trained using a random subset drawn from the original dataset via bootstrapping. This process involves selecting data points with replacement, creating diverse subsets for each tree;

  2. Decision Tree Construction: An individual decision tree is built from each subset. The data is recursively divided using features and thresholds, forming binary splits that lead to leaf nodes containing predictions;

  3. Random Feature Selection: Within each tree, only a random subset of features is considered at each split. This randomness keeps any single feature from dominating the predictions and makes the trees more diverse;

  4. Prediction Aggregation: After training, every tree makes its own prediction for each data point. For classification, the predictions are combined by hard voting (majority class) or soft voting (averaged class probabilities); for regression, they are averaged to give the final outcome.
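The four steps above can be sketched in plain Python. This is only a toy illustration under simplifying assumptions, not how any library implements Random Forest: each "tree" is a depth-1 stump chosen by misclassification count, so the random feature subset is drawn once per stump (a stump has only one split), and the predictions are combined by hard voting. The function names (`bootstrap_rows`, `fit_stump`, `fit_forest`, `predict_forest`) and the tiny dataset are hypothetical and exist only for this example.

```python
import random
from collections import Counter


def bootstrap_rows(n_rows, rng):
    """Step 1: sample n_rows row indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, rows, features):
    """Steps 2-3: pick the (feature, threshold) split with the fewest
    misclassified bootstrap rows, looking only at the candidate features."""
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not right:  # a threshold at the maximum separates nothing
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every candidate feature is constant in this sample
        label = majority([y[i] for i in rows])
        return (features[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    """Follow the single split to one of the two leaf labels."""
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, n_candidate_features=1, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = bootstrap_rows(n_rows, rng)
        features = rng.sample(range(n_features), n_candidate_features)
        forest.append(fit_stump(X, y, rows, features))
    return forest


def predict_forest(forest, x):
    """Step 4: hard voting, the class predicted by most stumps wins."""
    return majority([predict_stump(stump, x) for stump in forest])


# Hypothetical toy data: two features, two well-separated classes.
X = [[1.0, 1.2], [1.5, 0.8], [2.0, 1.9], [2.5, 1.4],
     [6.0, 6.3], [6.5, 7.1], [7.0, 5.8], [7.5, 6.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [10.0, 10.0]))
```

A real forest grows much deeper trees and redraws the candidate features at every split, but the overall flow (resample, fit, vote) is the same.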

Notice an interesting property of Random Forest: each base model is trained not only on a random subset of the training set but also on random subsets of the features. This makes the base models less correlated with one another, which in turn tends to make the aggregated prediction more accurate.
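One way to see why weaker correlation between trees helps is the standard variance formula for an average of correlated estimators. The notation is an assumption introduced for this sketch and does not come from the lesson: $B$ is the number of trees, $T_b(x)$ is the prediction of tree $b$, each tree has variance $\sigma^2$, and any two trees have pairwise correlation $\rho$.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding more trees shrinks only the second term. The first term stays at $\rho\,\sigma^2$ no matter how large $B$ becomes, so lowering $\rho$ through random rows and random features is what reduces the variance of the ensemble further.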

Example

Let's solve a classification task using Random Forest on the Iris dataset:

# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, n_jobs=-1)

# Train the classifier on the training data
rf_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = rf_classifier.predict(X_test)

# Calculate the F1 score of the classifier
f1 = f1_score(y_test, y_pred, average='weighted')
print(f'F1 Score: {f1:.2f}')
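To connect a fitted model back to the aggregation step, the sketch below inspects the individual trees of a `RandomForestClassifier`. It assumes scikit-learn's documented behavior that the forest's class probabilities are the mean of the per-tree probabilities (soft voting). The variable names and the `random_state=42` on the classifier are choices made for this illustration, not part of the lesson's example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same data and split as in the lesson's example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# random_state is fixed here only to make this sketch reproducible
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# The fitted base models are stored in rf.estimators_
tree_probas = np.stack([tree.predict_proba(X_test) for tree in rf.estimators_])

# Soft voting: average the class probabilities over all trees
mean_tree_proba = tree_probas.mean(axis=0)
forest_proba = rf.predict_proba(X_test)

matches = np.allclose(forest_proba, mean_tree_proba)
print(type(rf.estimators_[0]).__name__, len(rf.estimators_), matches)
```

The averaged per-tree probabilities should agree with `rf.predict_proba` up to floating-point error, which is the soft-voting rule applied by hand.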
What model is used as a base model in Random Forest?



Section 2. Chapter 5