Classification with Python

Decision Tree Summary

Let's now look at some of the Decision Tree's notable properties.

  1. Interpretability.
    Unlike most Machine Learning algorithms, Decision Trees are easy to visualize and interpret;
  2. Little to no data preparation required.
    Decision Trees require little to no data preparation: no feature scaling or normalization is needed, missing values can be handled, and outliers have little effect on the result;
  3. Provides feature importances.
    While training, a Decision Tree calculates feature importances that represent how impactful each feature was to form the Tree. You can get feature importances using the .feature_importances_ attribute;
  4. Computational complexity.
    Suppose m is the number of features and n is the number of training instances. Training a Decision Tree takes roughly O(n·m·log(n)), so training is quite fast unless the dataset is very large. Predicting with a reasonably balanced tree takes O(log(n)), so predictions are fast;
  5. Not suitable for large datasets.
    Although Decision Trees may work great for small sets, they usually don't work well for large datasets. Using Random Forest is preferable for large datasets;
  6. Decision Trees are unstable.
    Small changes in hyperparameters or data may cause a very different tree. Although it is a disadvantage for a single Tree, it will benefit us in a Random Forest, as you will see in the next section.
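
The interpretability and feature importance points above can be sketched with a few lines of scikit-learn. This is a minimal illustration (the dataset choice, `max_depth=2`, and `random_state=42` are arbitrary assumptions, not part of the lesson): the fitted tree can be printed as plain if/else rules, and `.feature_importances_` gives one score per feature.

```python
# Minimal sketch: train a small tree, print its rules, inspect importances.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# The fitted tree can be rendered as human-readable if/else rules
print(export_text(clf, feature_names=list(iris.feature_names)))

# One importance score per feature; the scores sum to 1
for name, imp in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Trees can also be drawn graphically with `sklearn.tree.plot_tree`, which is often clearer for presentations than the text dump.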

And here is a little summary:

| Advantages | Disadvantages |
| --- | --- |
| Interpretable | Overfitting |
| Fast training | Unstable |
| Fast predictions | Not suitable for large datasets |
| No feature scaling required | |
| Provides feature importances | |
| Usually robust to outliers | |
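
The instability mentioned in the list can be observed directly. In this sketch (the subset size and seeds are arbitrary assumptions), two trees are trained on slightly different subsets of the same data; their structure, such as the number of nodes, can come out noticeably different.

```python
# Minimal sketch: retraining on slightly different data can change the tree.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Train two trees, each with 10 randomly chosen rows left out
trees = []
for _ in range(2):
    idx = rng.choice(len(X), size=len(X) - 10, replace=False)
    t = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    trees.append(t)

# Compare a simple structural statistic of the two fitted trees
print("node counts:", [t.tree_.node_count for t in trees])
```

A Random Forest exploits exactly this sensitivity: it averages many such unstable trees, each trained on a different sample of the data.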

Section 3. Chapter 5