Random Forest | Commonly Used Bagging Models
Ensemble Learning

Random Forest

Random Forest is a bagging ensemble algorithm that is used for both classification and regression tasks. The basic idea behind Random Forest is to create a "forest" of decision trees, where each tree is trained on a different subset of the data and provides its own prediction.

How does Random Forest work?

  1. Bootstrapping: each tree in the forest is trained on a random subset of the original dataset drawn via bootstrapping, i.e., sampling data points with replacement. This creates a different, diverse subset for every tree.
  2. Decision Tree Construction: an individual decision tree is built on each subset. The data is recursively divided using features and thresholds, forming binary splits that lead to leaf nodes containing predictions.
  3. Random Feature Selection: at each split, only a random subset of features is considered. This randomness prevents a single strong feature from dominating the predictions and increases tree diversity.
  4. Prediction Aggregation: after training, every tree makes its own prediction for a data point. For classification, the final prediction is obtained by hard or soft voting; for regression, the trees' predictions are averaged.

Note an interesting property of Random Forest: each base model is trained not only on a random subset of the training samples, but also considers only a random subset of features when splitting. This makes the base models more independent of one another, which in turn makes the aggregated predictions more accurate.
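To make the four steps above concrete, here is a minimal from-scratch sketch of the bagging-plus-feature-subsampling idea. This is not scikit-learn's actual implementation; the helper names fit_forest and predict_forest are invented for illustration, and per-split feature subsampling is delegated to the max_features='sqrt' option of the base decision tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fit_forest(X, y, n_trees=25):
    """Step 1 + 2 + 3: train each tree on a bootstrap sample;
    max_features='sqrt' makes each split consider a random feature subset."""
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def predict_forest(forest, X):
    """Step 4: hard voting - each tree votes for a class, the majority wins."""
    votes = np.stack([tree.predict(X) for tree in forest])  # (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

# Toy usage: two clearly separated clusters
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
forest = fit_forest(X, y)
print(predict_forest(forest, X))
```

Because every tree sees a different bootstrap sample and different feature subsets at each split, the trees make partially independent errors, and voting averages those errors out.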

Example

Let's solve a classification task using Random Forest on the Iris dataset:

Code Description
  • Import libraries: import the necessary modules from scikit-learn:
    - load_iris: loads the Iris dataset.
    - train_test_split: splits the dataset into training and testing sets.
    - RandomForestClassifier: the classifier we'll be using, part of the ensemble module.
    - f1_score: the function used to evaluate the model.
  • Load and prepare data:
    - Load the Iris dataset using load_iris.
    - Extract the features into X and the target variable into y.
  • Train-test split:
    - Split the data into training and testing sets using train_test_split; test_size=0.2 specifies that 20% of the data will be used for testing.
  • Create and train the Random Forest classifier:
    - Create an instance of RandomForestClassifier with n_estimators=100 (the number of trees in the forest) and n_jobs=-1 (to train the trees in parallel on all available processors).
    - Train the classifier on the training data (features and target) with the .fit() method.
  • Make predictions:
    - Use the trained classifier to make predictions on the test data (X_test).
    - Store the predicted labels in y_pred.
  • Calculate the F1 score:
    - Compute the F1 score with the f1_score() function.
    - The average='weighted' parameter indicates that the per-class F1 scores are averaged, weighted by each class's number of samples.
You can find the official documentation with all the necessary information about implementing this model in Python on the scikit-learn website.
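Putting the steps described above together, the example might look like this (random_state is added here for reproducibility and is not mentioned in the walkthrough):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Load the Iris dataset: features into X, target labels into y
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 100 trees, trained in parallel on all available processors
model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out set and evaluate with a weighted F1 score
y_pred = model.predict(X_test)
print(f1_score(y_test, y_pred, average="weighted"))
```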

What model is used as a base model in Random Forest?


Section 2. Chapter 5