Random Forest is a bagging ensemble algorithm that is used for both classification and regression tasks. The basic idea behind Random Forest is to create a "forest" of decision trees, where each tree is trained on a different subset of the data and provides its own prediction.
How does Random Forest work?
- Bootstrapping and Data Subset: Each tree in the forest is trained using a random subset drawn from the original dataset via bootstrapping. This process involves selecting data points with replacement, creating diverse subsets for each tree.
- Decision Tree Construction: Each subset is used to build an individual decision tree. The data is recursively split on features and thresholds, forming binary splits that end in leaf nodes containing predictions.
- Random Feature Selection: At each split within a tree, only a random subset of features is considered. This randomness keeps any single feature from dominating the predictions and increases the diversity of the trees.
- Prediction Aggregation: After training, each tree makes its own prediction for every data point. For classification, the predictions are combined by hard or soft voting; for regression, they are averaged to produce the final outcome.
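The four steps above can be sketched by hand with plain decision trees. This is only an illustration, not how a library implements it: the number of trees, the seed, and all variable names are assumptions, and scikit-learn's DecisionTreeClassifier with max_features="sqrt" stands in for the random feature selection step.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n_trees = 25  # illustrative value
trees = []
for _ in range(n_trees):
    # Bootstrapping: draw len(X) row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # Random feature selection: with max_features="sqrt" the tree considers
    # only a random subset of the features at every split
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(0, 2**31 - 1)),
    )
    # Decision tree construction on the bootstrap sample
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Hard voting: every tree casts one vote per sample, the majority class wins
votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
hard_vote = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

# Soft voting: average the class probabilities of all trees, then take the argmax
probs = np.mean([tree.predict_proba(X) for tree in trees], axis=0)
soft_vote = probs.argmax(axis=1)

print("Hard and soft voting agree on", int((hard_vote == soft_vote).sum()), "of", len(X), "samples")
```

For regression, the voting step would be replaced by a plain average of the trees' numeric predictions.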
Note an interesting property of Random Forest: each base model is trained not only on a random subset of the training set but also on random subsets of the features. This makes the base models more independent of one another, which typically leads to more accurate final predictions.
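In scikit-learn these two sources of randomness correspond to the bootstrap and max_features parameters of RandomForestClassifier. The sketch below fits a deliberately small forest and inspects its trees; the forest size and the seed are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=5,       # small forest, only so the trees are easy to inspect
    bootstrap=True,       # each tree is trained on a bootstrap sample of the rows
    max_features="sqrt",  # each split considers a random subset of the features
    random_state=0,       # assumed seed, for reproducibility
)
forest.fit(X, y)

# Index of the feature used at the root split of each tree, and each tree's size
root_features = [int(t.tree_.feature[0]) for t in forest.estimators_]
node_counts = [int(t.tree_.node_count) for t in forest.estimators_]
print("Root split features:", root_features)
print("Nodes per tree:", node_counts)
```

Differences in the root features or tree sizes show that the base models do not all learn the same structure.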
Let's solve a classification task using Random Forest on the Iris dataset:
- load_iris: loads the Iris dataset.
- train_test_split: splits the dataset into training and testing sets.
- RandomForestClassifier: the classifier we'll be using, which is part of the ensemble module.
- f1_score: calculates the F1 score for model evaluation.
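- Extract the features into X and the target variable into a separate variable.
- Split the data into training and testing sets; test_size=0.2 specifies that 20% of the data will be used for testing.
- Create the classifier with n_estimators=100 (number of trees in the forest) and n_jobs=-1 (to train the model using all processors in parallel).
- Train the classifier using the training data (features and target).
- Store the predicted labels.
- Evaluate the model with f1_score. The average='weighted' parameter indicates that the F1 score should be computed for each class and then averaged, weighted by the number of true samples in each class.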
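A minimal sketch of how these steps might fit together is shown below. Only X, test_size=0.2, n_estimators=100, n_jobs=-1 and average='weighted' come from the text; the other variable names (y, clf, y_pred and the train/test names), the use of fit and predict, and random_state=42 are assumptions added to make the example complete and reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Load the Iris dataset and separate the features from the target
iris = load_iris()
X = iris.data
y = iris.target  # the name y is an assumption

# Hold out 20% of the data for testing (random_state is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 100 trees, trained in parallel on all available processors
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)

# Train on the training features and target
clf.fit(X_train, y_train)

# Predict labels for the held-out test set
y_pred = clf.predict(X_test)

# Per-class F1 scores averaged with weights proportional to class support
f1 = f1_score(y_test, y_pred, average='weighted')
print(f'Weighted F1 score: {f1:.4f}')
```

Fixing random_state makes the split and the forest repeatable between runs; without it the reported score can vary slightly.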