What is Random Forest

Random Forest is an algorithm widely used for classification and regression problems. It builds many different Decision Trees and combines their predictions: a majority vote for classification, and an average for regression.
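To make the two aggregation rules concrete, here is a minimal sketch in plain Python. The tree outputs are made-up values used only for illustration, and the helper names majority_vote and average_prediction are hypothetical, not part of any library.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the label predicted by the largest number of trees."""
    return Counter(labels).most_common(1)[0][0]


def average_prediction(values):
    """Return the mean of the trees' numeric predictions."""
    return mean(values)


# Hypothetical outputs of individual trees for a single input (illustrative only)
tree_class_votes = ["cat", "dog", "cat", "cat", "dog"]
tree_regression_outputs = [2.0, 3.0, 4.0]

forest_class = majority_vote(tree_class_votes)
forest_value = average_prediction(tree_regression_outputs)

print(forest_class)  # cat
print(forest_value)  # 3.0
```

Library implementations may aggregate slightly differently. For example, scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities rather than counting hard votes.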

Instead of relying on a single best tree, Random Forest builds many trees that are individually weaker. Why would we build trees that we know are worse?
Well, suppose you have a complex task, and you give it to a professor, an expert in the field. You can trust the professor's answer, but even an expert is human and can make mistakes. If you instead gave the task to 100 good students and chose the most frequent answer, the result might well be more trustworthy.
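The analogy can be checked with a little arithmetic. The sketch below assumes a yes/no question, students who answer independently of each other, and accuracy figures (90% for the professor, 70% per student) that are invented for illustration. The helper majority_correct_probability is hypothetical.

```python
from math import comb


def majority_correct_probability(n_voters, p_correct):
    """Probability that a strict majority of independent voters is right.

    Assumes a binary question; an exact tie counts as a wrong answer.
    """
    threshold = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(threshold, n_voters + 1)
    )


professor_accuracy = 0.90  # assumed value, for illustration only
student_accuracy = 0.70    # assumed value, for illustration only

students_majority = majority_correct_probability(100, student_accuracy)

print(f"Professor alone:          {professor_accuracy:.4f}")
print(f"Majority of 100 students: {students_majority:.4f}")
```

Under these assumptions the students' majority is right far more often than the professor. The independence assumption is doing most of the work here: if all students copied from one another, the vote would be no better than a single student.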

In practice, combining many weaker Decision Trees into one strong Random Forest works very well, and on large datasets it often outperforms even a carefully tuned single Decision Tree.
The decision boundary of a Random Forest is smoother and generalizes to new data better than that of a single Decision Tree, so Random Forest is much less prone to overfitting.
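One way to see this effect is to train both models on the same data and compare their accuracy on a held-out test set. The sketch below uses scikit-learn with a synthetic dataset (make_moons). The dataset, noise level, and hyperparameters are assumptions chosen for illustration, and the single tree here is left untuned, so the comparison is only a rough one.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset with noise (assumed setup, not from the course)
X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# A single, fully grown Decision Tree
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A Random Forest made of 100 trees
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(
    X_train, y_train
)

tree_train_acc = tree.score(X_train, y_train)
tree_test_acc = tree.score(X_test, y_test)
forest_train_acc = forest.score(X_train, y_train)
forest_test_acc = forest.score(X_test, y_test)

print(f"Tree   train/test accuracy: {tree_train_acc:.3f} / {tree_test_acc:.3f}")
print(f"Forest train/test accuracy: {forest_train_acc:.3f} / {forest_test_acc:.3f}")
```

A large gap between training and test accuracy is a sign of overfitting. On noisy data like this, the single tree typically shows a wider gap than the forest, though the exact numbers depend on the data and the random seed.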

However, accuracy will not improve if we combine many models that all make the same mistakes. For the ensemble to work, the models should be as different from each other as possible, so that they make different mistakes and the errors of one model are outvoted by the others.
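A small simulation illustrates why diversity matters. It compares a majority vote over 25 models that always give the same answer with a vote over 25 models whose mistakes are independent. The per-model accuracy of 70%, the number of models, and the helper majority_accuracy are assumptions made for this sketch. Real trees in a forest fall somewhere between these two extremes.

```python
import random


def majority_accuracy(n_models, p_correct, identical, n_trials=20_000, seed=0):
    """Estimate how often a majority vote is right on a binary question."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        if identical:
            # All models give the same answer, so one draw decides for all
            correct_votes = n_models if rng.random() < p_correct else 0
        else:
            # Each model is right or wrong independently of the others
            correct_votes = sum(rng.random() < p_correct for _ in range(n_models))
        if correct_votes > n_models / 2:
            wins += 1
    return wins / n_trials


identical_acc = majority_accuracy(25, 0.7, identical=True)
independent_acc = majority_accuracy(25, 0.7, identical=False)

print(f"25 identical models:   {identical_acc:.3f}")
print(f"25 independent models: {independent_acc:.3f}")
```

With identical models the vote stays close to the accuracy of a single model, while independent mistakes tend to cancel out and push the ensemble's accuracy well above it.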

The next chapter will shed some light on why the Forest is Random and how we produce many different models using only the Decision Tree algorithm.

Section 4. Chapter 1

Classification with Python
