Outlier and Novelty Detection in Practice

Isolation Forest: Tree-Based Anomaly Detection

Isolation Forest is a powerful tree-based method for anomaly detection that relies on the principle of isolating data points through random partitioning.

The core intuition is straightforward: anomalies are data points that are few and different, making them easier to separate from the rest of the data.

Instead of modeling the distribution of normal data, Isolation Forest constructs an ensemble of random trees. Each tree recursively splits the data by randomly selecting a feature, then randomly choosing a split value between the minimum and maximum values of that feature. This process continues until each data point is isolated in its own partition.
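The splitting rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation; `grow_tree` is an invented name for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_tree(X, depth=0, max_depth=8):
    """Recursively build one isolation tree as nested dicts.

    Each internal node stores a randomly chosen feature and a split
    value drawn uniformly between that feature's min and max.
    """
    # Stop when a point is isolated (or the depth limit is reached)
    if len(X) <= 1 or depth >= max_depth:
        return {"size": len(X)}
    feat = rng.integers(X.shape[1])           # random feature
    lo, hi = X[:, feat].min(), X[:, feat].max()
    if lo == hi:                              # feature is constant here
        return {"size": len(X)}
    split = rng.uniform(lo, hi)               # random split value
    mask = X[:, feat] < split
    return {
        "feat": feat,
        "split": split,
        "left": grow_tree(X[mask], depth + 1, max_depth),
        "right": grow_tree(X[~mask], depth + 1, max_depth),
    }

X = np.vstack([rng.normal(size=(50, 2)), [[6.0, 6.0]]])
tree = grow_tree(X)
print("root splits on feature", tree["feat"], "at", round(tree["split"], 2))
```

An Isolation Forest simply grows many such trees, each on a random subsample of the data.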

Through this random partitioning, data points that are anomalies tend to be isolated much sooner than normal points, meaning they require fewer splits to be separated from the rest. This is because anomalies are more likely to have attribute values that are very different from those of the majority of the data. In contrast, normal points are typically located in dense regions and require more splits to be isolated.
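The claim that anomalies need fewer splits can be checked directly: isolate a single point with purely random axis-aligned cuts and count how many it takes. `splits_to_isolate` is an illustrative helper for this sketch, not a library function:

```python
import numpy as np

rng = np.random.default_rng(1)

def splits_to_isolate(X, idx, max_depth=50):
    """Count random splits until point `idx` is alone in its partition."""
    data, target = X, X[idx]
    depth = 0
    while len(data) > 1 and depth < max_depth:
        feat = rng.integers(data.shape[1])
        lo, hi = data[:, feat].min(), data[:, feat].max()
        if lo == hi:
            break
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target point
        if target[feat] < split:
            data = data[data[:, feat] < split]
        else:
            data = data[data[:, feat] >= split]
        depth += 1
    return depth

X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])  # last row is an outlier
normal_depths = [splits_to_isolate(X, 0) for _ in range(50)]
outlier_depths = [splits_to_isolate(X, 200) for _ in range(50)]
print("avg splits, normal point :", np.mean(normal_depths))
print("avg splits, outlier point:", np.mean(outlier_depths))
```

Averaged over many random runs, the extreme point at (8, 8) is cut off in far fewer splits than a point inside the dense cluster.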

Note

Anomalies are easier to isolate because they are rare and have attribute values that are significantly different from the majority of the data. In Isolation Forest, the path length—the number of splits required to isolate a point—serves as the basis for the anomaly score. Shorter average path lengths across the random trees indicate higher likelihood of being an anomaly, while longer path lengths suggest the point is more like the rest of the data. This means that anomaly scores in Isolation Forest reflect how quickly a point is separated from the rest.
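In practice you rarely build the trees by hand: scikit-learn provides an `IsolationForest` estimator whose `score_samples` method returns higher values for points that look normal and lower values for anomalies. A brief sketch, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X_normal = rng.normal(size=(100, 2))
X_outlier = np.array([[6.0, 6.0], [-6.0, -6.0]])
X = np.vstack([X_normal, X_outlier])

# 100 random trees; contamination sets the expected fraction of outliers
forest = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
forest.fit(X)

scores = forest.score_samples(X)   # higher = more normal, lower = more anomalous
labels = forest.predict(X)         # +1 = inlier, -1 = outlier

print("scores of the two planted outliers:", scores[-2:])
print("mean score of the normal points   :", scores[:-2].mean())
```

The two planted outliers receive clearly lower scores than the normal cluster, reflecting their short average path lengths across the trees.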

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate a simple 2D dataset
np.random.seed(42)
X_normal = np.random.randn(50, 2)
X_outlier = np.array([[6, 6], [-6, -6]])
X = np.vstack([X_normal, X_outlier])

# Function to recursively partition and plot
def plot_partition(ax, X, depth=0, max_depth=3, bounds=None):
    if depth == max_depth or len(X) <= 1:
        return
    if bounds is None:
        x_min, y_min = X.min(axis=0)
        x_max, y_max = X.max(axis=0)
        bounds = [x_min, x_max, y_min, y_max]
    # Randomly choose split feature and value
    feat = np.random.choice([0, 1])
    split = np.random.uniform(X[:, feat].min(), X[:, feat].max())
    if feat == 0:
        ax.plot([split, split], [bounds[2], bounds[3]], 'r--', alpha=0.6)
        left = X[X[:, 0] < split]
        right = X[X[:, 0] >= split]
        plot_partition(ax, left, depth + 1, max_depth, [bounds[0], split, bounds[2], bounds[3]])
        plot_partition(ax, right, depth + 1, max_depth, [split, bounds[1], bounds[2], bounds[3]])
    else:
        ax.plot([bounds[0], bounds[1]], [split, split], 'b--', alpha=0.6)
        below = X[X[:, 1] < split]
        above = X[X[:, 1] >= split]
        plot_partition(ax, below, depth + 1, max_depth, [bounds[0], bounds[1], bounds[2], split])
        plot_partition(ax, above, depth + 1, max_depth, [bounds[0], bounds[1], split, bounds[3]])

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(X_normal[:, 0], X_normal[:, 1], label="Normal", c="tab:blue")
ax.scatter(X_outlier[:, 0], X_outlier[:, 1], label="Outlier", c="tab:red")
plot_partition(ax, X, max_depth=3)
ax.legend()
ax.set_title("Isolation Forest: Random Partitioning in 2D")
plt.show()
```

Which of the following statements best describes why Isolation Forest is effective for anomaly detection?


