Decision Tree Summary
Let's now look at some of the Decision Tree's peculiarities.

- **Interpretability**. Unlike most Machine Learning algorithms, Decision Trees are easy to visualize and interpret (see the first sketch after this list);
- **No data preparation required**. A Decision Tree requires little to no data preparation: it needs no scaling or normalization, can handle missing values, and does not suffer much from outliers;
- **Provides feature importances**. While training, a Decision Tree calculates feature importances that represent how impactful each feature was in forming the tree. You can get them using the `.feature_importances_` attribute (see the second sketch after this list);
- **Computational complexity**. Suppose m is the number of features and n is the number of training instances. The training complexity of a Decision Tree is O(n·m·log(n)), so training is quite fast unless the dataset is very large. The prediction complexity is O(log(n)), so predictions are fast;
- **Not suitable for large datasets**. Although Decision Trees may work great on small datasets, they usually do not work well on large ones. A Random Forest is preferable for large datasets;
- **Decision Trees are unstable**. Small changes in the hyperparameters or the data may produce a very different tree (see the last sketch after this list). Although this is a disadvantage for a single tree, it will benefit us in a Random Forest, as you will see in the next section.
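To illustrate the interpretability point, here is a minimal sketch of visualizing a fitted tree with scikit-learn's `plot_tree`. The Iris dataset, `max_depth=3`, and `random_state=42` are illustrative assumptions, not values from the course:

```python
# Minimal sketch: visualizing a fitted Decision Tree.
# Iris, max_depth=3, and random_state=42 are illustrative choices.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(iris.data, iris.target)

plt.figure(figsize=(10, 6))
# Each node shows the split rule, the number of samples, and class counts
plot_tree(tree, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()
```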
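And here is a short sketch of reading `.feature_importances_` after training, again using Iris purely as a stand-in dataset:

```python
# Minimal sketch: reading feature importances from a fitted tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(random_state=42)
tree.fit(iris.data, iris.target)

# .feature_importances_ is an array that sums to 1; larger values mean
# the feature drove more of the tree's splits
for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```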
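Finally, a sketch of the instability point: refitting on two slightly different 90% subsamples of a noisy synthetic dataset (a hypothetical example built with `make_classification`) can produce trees with different splits:

```python
# Sketch of instability: small changes in the training data may yield
# a noticeably different tree. The synthetic dataset is a hypothetical example.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.default_rng(0)

for trial in range(2):
    keep = rng.choice(len(X), size=180, replace=False)  # drop 10% of the rows
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(X[keep], y[keep])
    print(f"--- Trial {trial} ---")
    print(export_text(tree))  # the printed split rules may differ between trials
```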
And here is a little summary:
| Advantages | Disadvantages |
|---|---|
| Interpretable | Overfitting |
| Fast training | Unstable |
| Fast predictions | Not suitable for large datasets |
| No feature scaling required | |
| Provides feature importances | |
| Usually robust to outliers | |