Classification with Python
What is a Decision Tree?
For many real-life problems, we can build a Decision Tree. In a Decision Tree, we ask a question (Decision Node) and, based on the answer, we either arrive at a decision (Leaf Node) or ask further questions (more Decision Nodes), and so on.
Here is an example: a 'duck or not a duck' test.
It turns out that if we apply the same logic to the training data, we get one of the most important Machine Learning algorithms, which can be used for both regression and classification. In this course, we will focus on classification. The following video shows how it works.
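If you want to try this in code, below is a minimal sketch using scikit-learn's DecisionTreeClassifier. The feature values and labels are made up purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: two features per data point, e.g. [weight, length];
# labels: 1 = 'duck', 0 = 'not a duck'
X = [[1.2, 30], [1.0, 28], [3.5, 60], [4.0, 65], [1.1, 29], [3.8, 62]]
y = [1, 1, 0, 0, 1, 0]

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)

# A new data point close to the 'duck' examples is classified as a duck
print(clf.predict([[1.3, 31]]))  # -> [1]
```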
Note
In the video above, 'Classes' shows the number of instances (also called samples) of each class at a node. For example, the Root Node holds all the instances (4 'cookies', 4 'not cookies'), while the Leaf Node on the left holds only 3 'not cookies'.
With each Decision Node, we try to split the training data so that the data points of each class are separated into their own Leaf Nodes.
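To see these per-node class counts for a tree you have trained yourself, scikit-learn can draw the fitted tree. Here is a small sketch with a made-up one-feature dataset (the 'cookie' labels are just for illustration); each node in the plot shows its split and its class counts ('value'):

```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical one-feature dataset: 0 = 'not cookie', 1 = 'cookie'
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=42).fit(X, y)

# Draw the tree; 'value' in each node is the count of instances per class
plot_tree(clf, class_names=['not cookie', 'cookie'], filled=True)
plt.show()
```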
A Decision Tree also handles multiclass classification easily:
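As a quick illustration, here is a sketch using scikit-learn's built-in Iris dataset, which has three classes (chosen purely as an example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Iris: 3 classes, 4 features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# A single tree typically separates the three classes well
print(accuracy_score(y_test, clf.predict(X_test)))
```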
A Decision Tree can also handle classification with multiple features: each Decision Node can then split the data using any one of the features. Here is a video with an example:
Note
In the video above, the training set is scaled using StandardScaler. This is not necessary for the Decision Tree: it will perform just as well on the unscaled data. However, scaling improves the performance of many other algorithms (for example, those based on distances or gradient descent), so it's a good idea to include scaling in your preprocessing by default.
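If you want to check this yourself, here is a small sketch (again using the Iris data just as an example) that trains the same tree on raw and on standardized features; the test accuracies should be essentially identical, because tree splits only compare feature values against thresholds:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same tree, with and without scaling
raw = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(),
                       DecisionTreeClassifier(random_state=42)).fit(X_train, y_train)

# The two accuracies usually match (up to ties in the splits)
print(raw.score(X_test, y_test), scaled.score(X_test, y_test))
```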