Learn Comparing LOF and Isolation Forest | Density-Based Methods
Outlier and Novelty Detection in Practice

Comparing LOF and Isolation Forest

Local Outlier Factor (LOF) and Isolation Forest are two widely used algorithms for outlier detection, each with distinct strengths and assumptions.

  • LOF measures the local density of each point compared to its neighbors. Outliers are points with much lower local density. LOF is effective when data contains clusters of varying density, as it highlights points that are unusual in their immediate neighborhood.

  • Isolation Forest isolates data points by recursively splitting the data at random. Outliers are easier to isolate, so they require fewer splits on average. Because the method does not compute distances or densities directly, it is efficient for high-dimensional data and large datasets.
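
To make the two scoring ideas above concrete, here is a minimal sketch using scikit-learn. The toy data (a 5x5 grid plus one distant point) and the parameter values are assumptions chosen for illustration, not part of the lesson. It reads the raw anomaly scores from each model: `negative_outlier_factor_` for LOF and `score_samples` for Isolation Forest. In both cases, lower values mean more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Toy data (an assumption for illustration): a 5x5 unit grid plus one distant point.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X_toy = np.vstack([grid, [[20.0, 20.0]]])  # row 25 is the distant point

# LOF: negative_outlier_factor_ holds the negated LOF of each training point.
# Values far below -1 mean the point is much less dense than its neighbors.
lof = LocalOutlierFactor(n_neighbors=5)
lof.fit(X_toy)
lof_scores = lof.negative_outlier_factor_

# Isolation Forest: score_samples is lower for points that the random trees
# isolate after fewer splits (shorter average path length).
iso = IsolationForest(n_estimators=100, random_state=0)
iso.fit(X_toy)
iso_scores = iso.score_samples(X_toy)

# Index of the most anomalous point according to each model.
lof_most_anomalous = int(np.argmin(lof_scores))
iso_most_anomalous = int(np.argmin(iso_scores))

print("LOF most anomalous row:", lof_most_anomalous)
print("Isolation Forest most anomalous row:", iso_most_anomalous)
```

On data this clean, both models should single out the distant point. The methods tend to diverge only when anomalies are subtle or local.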

Summary of use cases:

  • Use LOF when you expect local density variations and need to find outliers relative to their surroundings;
  • Choose Isolation Forest for large or high-dimensional datasets, or when you need a scalable method less affected by the curse of dimensionality.
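
The "outliers relative to their surroundings" case can be sketched as follows. The dataset is a hypothetical construction: a tightly packed grid, a widely spaced grid, and one extra point that sits close to the tight grid but several grid spacings away from it. The helper `make_grid` and all parameter values are assumptions for illustration. The sketch checks that LOF ranks the extra point as most anomalous. It only reports where Isolation Forest ranks it, since that depends on the geometry and the random seed.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def make_grid(center, spacing, n=7):
    """Hypothetical helper: an n x n grid of points around `center`."""
    offsets = (np.arange(n) - n // 2) * spacing
    return np.array([[center[0] + dx, center[1] + dy]
                     for dx in offsets for dy in offsets])


dense = make_grid((0.0, 0.0), spacing=0.1)     # 49 tightly packed points
sparse = make_grid((16.0, 16.0), spacing=2.0)  # 49 widely spaced points
local_point = np.array([[1.0, 0.0]])           # near the dense grid, yet isolated from it

X_local = np.vstack([dense, sparse, local_point])
local_idx = len(X_local) - 1

# LOF compares each point's density with that of its own neighbors,
# so the gap of 0.7 next to a grid with spacing 0.1 stands out.
lof = LocalOutlierFactor(n_neighbors=10)
lof.fit(X_local)
lof_top = int(np.argmin(lof.negative_outlier_factor_))

# Isolation Forest scores points globally; report the rank of the same point
# (1 = most anomalous) without assuming what it will be.
iso = IsolationForest(random_state=0).fit(X_local)
iso_scores = iso.score_samples(X_local)
iso_rank = int((iso_scores < iso_scores[local_idx]).sum()) + 1

print("LOF top-ranked row:", lof_top, "(local point is row", local_idx, ")")
print("Isolation Forest rank of the local point:", iso_rank, "of", len(X_local))
```

Note that the sparse grid has the same shape as the dense one, only scaled up, so LOF treats its points as normal even though they are far apart in absolute terms.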
Note

In practice, LOF tends to outperform Isolation Forest when the dataset contains clusters of varying densities, because it can detect points that are anomalous only within their local context. Isolation Forest is typically the more efficient choice for large or high-dimensional datasets, and it works well when outliers are globally distinct rather than locally rare. For time-series or streaming data, Isolation Forest's speed and scalability often make it the better choice, while LOF may struggle because it depends on nearest-neighbor searches over local neighborhoods, which become more expensive as the data grows.
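
For the streaming case, one practical difference in scikit-learn is how each model handles points that arrive after fitting. A fitted `IsolationForest` can score new points with `predict` directly, while `LocalOutlierFactor` exposes `predict` only when it is constructed with `novelty=True`. The sketch below assumes a synthetic "historical" batch and two hypothetical new points. It is an illustration of the API, not a streaming pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic historical batch (an assumption for illustration).
rng = np.random.RandomState(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Two hypothetical newly arrived points: one central, one far away.
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])

# Isolation Forest: fit once, then score new points as they arrive.
iso = IsolationForest(random_state=0).fit(X_train)
iso_new_labels = iso.predict(X_new)        # 1 = inlier, -1 = outlier
iso_new_scores = iso.score_samples(X_new)  # lower = more anomalous

# LOF: novelty=True is required to call predict on unseen data.
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
lof_new_labels = lof_novelty.predict(X_new)
lof_new_scores = lof_novelty.score_samples(X_new)

print("Isolation Forest labels for new points:", iso_new_labels)
print("LOF (novelty mode) labels for new points:", lof_new_labels)
```

Each LOF prediction still requires a neighbor search against the stored training set, which is one reason it scales less comfortably than Isolation Forest as the reference data grows.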

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Generate synthetic data with two clusters and some outliers
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5]],
                  cluster_std=[0.8, 1.0], random_state=42)
rng = np.random.RandomState(42)
X_outliers = rng.uniform(low=-6, high=10, size=(20, 2))
X_all = np.vstack([X, X_outliers])

# Fit Isolation Forest
iso_forest = IsolationForest(contamination=0.06, random_state=42)
y_pred_iso = iso_forest.fit_predict(X_all)

# Fit LOF
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.06)
y_pred_lof = lof.fit_predict(X_all)

# Visualize results
fig, axs = plt.subplots(1, 2, figsize=(12, 5))

axs[0].scatter(X_all[:, 0], X_all[:, 1], c=(y_pred_iso == -1), cmap='coolwarm', s=20)
axs[0].set_title("Isolation Forest Outlier Detection")
axs[0].set_xlabel("Feature 1")
axs[0].set_ylabel("Feature 2")

axs[1].scatter(X_all[:, 0], X_all[:, 1], c=(y_pred_lof == -1), cmap='coolwarm', s=20)
axs[1].set_title("LOF Outlier Detection")
axs[1].set_xlabel("Feature 1")
axs[1].set_ylabel("Feature 2")

plt.tight_layout()
plt.show()
```
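
Beyond the scatter plots, it can help to compare the two sets of predictions numerically. The following sketch rebuilds the same synthetic data and counts how many points each model flags, how many both flag, and how many of the 20 injected uniform points each one catches. The `summary` dictionary and its keys are names introduced here for illustration. Some injected points may land inside a cluster, so they are only an approximate ground truth.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Same synthetic data as in the lesson's example.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5]],
                  cluster_std=[0.8, 1.0], random_state=42)
rng = np.random.RandomState(42)
X_outliers = rng.uniform(low=-6, high=10, size=(20, 2))
X_all = np.vstack([X, X_outliers])

y_pred_iso = IsolationForest(contamination=0.06, random_state=42).fit_predict(X_all)
y_pred_lof = LocalOutlierFactor(n_neighbors=20, contamination=0.06).fit_predict(X_all)

# Boolean masks: True where a model labels the point as an outlier (-1).
flagged_iso = y_pred_iso == -1
flagged_lof = y_pred_lof == -1

# The injected uniform points are the last rows of X_all.
injected = np.zeros(len(X_all), dtype=bool)
injected[-len(X_outliers):] = True

summary = {
    "flagged_by_iso": int(flagged_iso.sum()),
    "flagged_by_lof": int(flagged_lof.sum()),
    "flagged_by_both": int((flagged_iso & flagged_lof).sum()),
    "injected_caught_by_iso": int((flagged_iso & injected).sum()),
    "injected_caught_by_lof": int((flagged_lof & injected).sum()),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because both models use the same `contamination` value, they flag a similar number of points. The interesting part is where the flagged sets differ.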


Section 4. Chapter 3

