Random Forest Summary | Random Forest
Classification with Python

Random Forest Summary

Let's look at the main characteristics of Random Forest:

  1. Little data preparation is required.
    Since a Random Forest is an ensemble of Decision Trees, it needs the same minimal preprocessing as a Decision Tree. In particular, features do not need to be scaled, because each split only compares a single feature against a threshold;
  2. Provides feature importances.
    Just like a Decision Tree, a Random Forest provides feature importances, which you can access through the .feature_importances_ attribute;
  3. Random Forest is relatively slow.
    A Random Forest trains many Decision Trees (100 by default), so training can become quite slow on large datasets. Prediction is also slower than with a single tree, because every new instance must pass through all the trees, and this cost grows with the number of trees;
  4. Handles datasets with many features well.
    Because only a random subset of features is considered at each split, Random Forest's training time does not suffer much from a large number of features. The model also tends to ignore useless features: at each Decision Node, a more informative feature is usually among the candidates and gets chosen instead. Useless features therefore do little harm unless they make up most of the features, in which case many splits are left choosing among useless features only;
  5. Suitable for complex tasks.
    A single Decision Tree can build complex decision boundaries, but they are jagged rather than smooth, and the tree is very prone to overfitting. A Random Forest averages the votes of many trees, which produces smoother decision boundaries that generalize better, so it is much less likely to overfit. Unlike a single Decision Tree, Random Forest is also stable: it does not change drastically after small changes to the dataset or hyperparameters.
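The claim in point 1 that feature scaling is unnecessary is easy to check. The sketch below assumes scikit-learn and its bundled Iris dataset (neither is named in the text above): it trains two forests with the same random_state, one on raw features and one on standardized features, and measures how often their predictions agree.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and the same random_state, so bootstrap samples
# and per-split feature subsets are drawn identically for both forests.
forest_raw = RandomForestClassifier(random_state=42).fit(X, y)
forest_scaled = RandomForestClassifier(random_state=42).fit(X_scaled, y)

# Fraction of samples on which the two forests predict the same class.
agreement = np.mean(forest_raw.predict(X) == forest_scaled.predict(X_scaled))
print(f"Prediction agreement: {agreement:.2%}")
```

Standardization shifts and stretches each feature without reordering its values, so every split should separate the same samples and the agreement should come out at, or extremely close to, 100%.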
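For point 2, here is a minimal sketch, assuming scikit-learn's RandomForestClassifier and the Iris dataset, that reads .feature_importances_ and lists the features from most to least important. In scikit-learn these scores are impurity-based: each feature's impurity decrease is averaged over the trees and normalized so the scores sum to 1.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(random_state=42).fit(iris.data, iris.target)

# One importance score per feature; for a fitted forest they sum to 1.
importances = forest.feature_importances_

# Print features from most to least important.
for name, score in sorted(zip(iris.feature_names, importances),
                          key=lambda pair: pair[1], reverse=True):
    print(f"{name:<20} {score:.3f}")
```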
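Point 3 says that a prediction must pass through every tree. The sketch below, assuming scikit-learn (whose RandomForestClassifier does use n_estimators=100 by default), makes this visible by rebuilding the forest's predict_proba output by hand as the mean of the per-tree probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators is left at its default (100 in recent scikit-learn versions).
forest = RandomForestClassifier(random_state=42).fit(X, y)
n_trees = len(forest.estimators_)
print("Trees in the forest:", n_trees)

# Predicting means running every sample through every tree:
# the forest's class probabilities are the mean of the per-tree probabilities.
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
manual_proba = per_tree.mean(axis=0)

print("Matches forest.predict_proba:",
      np.allclose(manual_proba, forest.predict_proba(X)))
```

Both training and prediction time grow roughly in proportion to n_estimators, so using fewer trees, or setting n_jobs=-1 to spread the work over all CPU cores, is the usual way to speed a forest up.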
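To illustrate point 4, the sketch below assumes scikit-learn and a synthetic dataset from make_classification with 5 informative features followed by 45 pure-noise ones (the sizes are arbitrary, chosen only for illustration). It sets max_features="sqrt", the classifier's default, so each split looks at only about 7 randomly chosen features, and then compares the average importance of the informative and the noise features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 5 informative features followed by 45 pure-noise features.
# With shuffle=False, make_classification keeps the informative ones first.
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features="sqrt": each split considers about sqrt(50), i.e. ~7, features.
forest = RandomForestClassifier(max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

importances = forest.feature_importances_
informative_mean = importances[:5].mean()
noise_mean = importances[5:].mean()
test_accuracy = forest.score(X_test, y_test)

print(f"Mean importance, informative features: {informative_mean:.3f}")
print(f"Mean importance, noise features:       {noise_mean:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

On data like this, the informative features should receive clearly higher average importance, and test accuracy should stay well above chance despite the 45 useless columns.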
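Point 5 can be explored with scikit-learn's make_moons, a noisy two-class dataset that a single tree easily overfits. The sketch below (the dataset, sample sizes, and the 80% subsamples used for the stability check are illustrative assumptions) compares the train and test accuracy of a fully grown Decision Tree and a Random Forest, then refits each model on two different random subsets of the training data and counts how many test predictions change.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Two interleaving half-circles with noise: a shape a single tree easily overfits.
X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# A fully grown tree memorizes the training set; a large train/test gap
# signals overfitting.
tree_train = tree.score(X_train, y_train)
tree_test = tree.score(X_test, y_test)
forest_train = forest.score(X_train, y_train)
forest_test = forest.score(X_test, y_test)
print(f"Decision Tree: train {tree_train:.3f}, test {tree_test:.3f}")
print(f"Random Forest: train {forest_train:.3f}, test {forest_test:.3f}")


def prediction_change(model_cls, seed=0):
    """Fraction of test predictions that differ when the model is refit
    on two different random 80% subsets of the training data."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(2):
        idx = rng.choice(n, size=int(0.8 * n), replace=False)
        model = model_cls(random_state=0).fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds[0] != preds[1])


# Same seed, so both model types see the same two subsets.
tree_change = prediction_change(DecisionTreeClassifier)
forest_change = prediction_change(RandomForestClassifier)
print(f"Predictions changed after refitting: tree {tree_change:.1%}, "
      f"forest {forest_change:.1%}")
```

Typically the tree reaches perfect training accuracy with a noticeably lower test score, while the forest generalizes better and flips fewer predictions between the two refits; the exact numbers depend on the random seeds.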

Here is a brief summary:

| Advantages | Disadvantages |
|---|---|
| Less prone to overfitting | Relatively slow |
| Handles datasets with many features well | Hard to interpret |
| Stable | |
| No feature scaling required | |
| Provides feature importances | |
| Usually robust to outliers | |
| Suitable for complex tasks | |

Section 4. Chapter 4