Summary | Comparing Models
Classification with Python

Summary

Let's sum it all up! We learned four algorithms: k-NN, Logistic Regression, Decision Tree, and Random Forest. Each has its own pros and cons, which are covered at the end of its section.
The following visualization shows how each algorithm performs on several synthetic datasets.

In the visualization, the deeper the color, the more confident the model is in its prediction. Notice that a different model performs best on each dataset. It is hard to tell in advance which model will work best, so the most reliable approach is to try them all. This is the practical takeaway of the No Free Lunch Theorem: no single algorithm is the best choice for every problem.
However, in some cases, what you know about an algorithm can tell you in advance that it is not suited for the task.
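The "try them all" advice can be sketched with cross-validation. This is a minimal illustration, not the code behind the visualization: the three synthetic datasets, the hyperparameters, and the use of `StandardScaler` for k-NN and Logistic Regression are all assumptions made for the example.

```python
from sklearn.datasets import make_circles, make_classification, make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed synthetic datasets; the ones in the visualization may differ.
datasets = {
    "moons": make_moons(n_samples=300, noise=0.3, random_state=0),
    "circles": make_circles(n_samples=300, noise=0.2, factor=0.5, random_state=0),
    "linear": make_classification(
        n_samples=300, n_features=2, n_informative=2, n_redundant=0,
        n_clusters_per_class=1, random_state=0,
    ),
}

# The four algorithms from the course, with illustrative settings.
models = {
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validation accuracy for every (dataset, model) pair.
scores = {}
for data_name, (X, y) in datasets.items():
    for model_name, model in models.items():
        scores[(data_name, model_name)] = cross_val_score(model, X, y, cv=5).mean()

# Report the best-scoring model on each dataset.
for data_name in datasets:
    best = max(models, key=lambda name: scores[(data_name, name)])
    print(f"{data_name}: best = {best} ({scores[(data_name, best)]:.3f})")
```

The winner can change with the dataset, the noise level, and the hyperparameters, which is the reason for comparing the models empirically.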

For example, this is the case with Logistic Regression (without PolynomialFeatures), which we know produces a linear Decision Boundary. Looking at the complexity of the second dataset in the image, we could tell in advance that it would not work well there.
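A small sketch of this effect, assuming a `make_circles` dataset as a stand-in for a problem that is not linearly separable (the actual second dataset in the image may be different). It compares plain Logistic Regression with the same model after adding degree-2 `PolynomialFeatures`.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Two concentric circles: no straight line separates the classes.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.5, random_state=0)

# Plain Logistic Regression: linear decision boundary.
linear_model = make_pipeline(StandardScaler(), LogisticRegression())

# Same model on squared and interaction terms: the boundary can now curve.
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LogisticRegression(),
)

linear_score = cross_val_score(linear_model, X, y, cv=5).mean()
poly_score = cross_val_score(poly_model, X, y, cv=5).mean()

print(f"Linear boundary:     {linear_score:.3f}")
print(f"With degree-2 terms: {poly_score:.3f}")
```

On data like this, the linear model should score close to chance, while the polynomial version can separate the classes because a circle is a quadratic function of the two features.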
As another example, if your task requires lightning-fast predictions (e.g., making real-time predictions in an app), then k-NN is a poor choice, because it has to compute distances to the training instances at prediction time. So is a Random Forest with many Decision Trees. You can decrease n_estimators to get an acceptable speed, but the quality of the predictions may suffer.
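One way to check prediction speed is simply to time `predict`. The sketch below is illustrative only: the dataset size, the `n_estimators` values, and the helper `time_predict` are assumptions, and the timings depend heavily on the hardware and the data.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Assumed dataset: 2000 instances with 20 features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)


def time_predict(model, X, repeats=3):
    """Return the fastest of several timed predict() calls, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        model.predict(X)
        best = min(best, time.perf_counter() - start)
    return best


timings = {}
forests = {}

# Random Forests of different sizes: more trees means more work per prediction.
for n in (10, 100, 300):
    forests[n] = RandomForestClassifier(n_estimators=n, random_state=0).fit(X, y)
    timings[f"Random Forest ({n} trees)"] = time_predict(forests[n], X)

# k-NN does most of its work at prediction time.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
timings["k-NN"] = time_predict(knn, X)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Comparing the accuracy of each forest alongside its timing shows the trade-off between speed and prediction quality for your own data.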

The following table summarizes the preprocessing each model requires before training and how much slower the model becomes as the number of features or instances grows.

Which model uses multiple decision trees to make a prediction?

Section 5. Chapter 4