Comparing LOF and Isolation Forest
Local Outlier Factor (LOF) and Isolation Forest are two widely used algorithms for outlier detection, each with distinct strengths and assumptions.
- LOF measures the local density of each point relative to its neighbors. Outliers are points whose local density is much lower than that of their neighbors. LOF is effective when data contains clusters of varying density, as it highlights points that are unusual in their immediate neighborhood.
- Isolation Forest isolates data points using random splits. Outliers are easier to isolate, so they require fewer splits on average. Because the method relies on neither distances nor densities, it is efficient for high-dimensional data and large datasets.
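To make LOF's density reasoning concrete, here is a minimal sketch on an invented toy dataset: a tight cluster plus one distant point. The raw LOF scores are exposed by scikit-learn as `negative_outlier_factor_` (negated, so more negative means more anomalous); negating them back gives the usual "higher = more outlying" reading.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy data: a tight cluster near the origin plus one distant point
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)), [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)              # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_   # LOF values; higher = more anomalous
print(labels[-1], scores[-1])            # the distant point at index 50
```

An LOF score near 1 means the point is about as dense as its neighbors; the isolated point at `[8, 8]` receives a far larger score and is labeled `-1`.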
Summary of use cases:
- Use LOF when you expect local density variations and need to find outliers relative to their surroundings;
- Choose Isolation Forest for large or high-dimensional datasets, or when you need a scalable method less affected by the curse of dimensionality.
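The high-dimensional use case can be sketched as follows, on made-up 50-dimensional data: Gaussian inliers plus a handful of uniformly scattered anomalies. `score_samples` returns an anomaly score where lower values are more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy high-dimensional data: 1000 Gaussian inliers, 10 scattered anomalies
rng = np.random.RandomState(0)
X_in = rng.normal(size=(1000, 50))
X_out = rng.uniform(low=-6, high=6, size=(10, 50))
X_all = np.vstack([X_in, X_out])

iso = IsolationForest(random_state=0).fit(X_all)
scores = iso.score_samples(X_all)  # lower score = easier to isolate = more anomalous
print(scores[-10:].mean(), scores[:-10].mean())
```

Even in 50 dimensions, the scattered points isolate in fewer random splits, so their scores are clearly lower than those of the inliers; a density-based method would need meaningful distances in this space to achieve the same separation.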
In practice, LOF tends to outperform Isolation Forest when the dataset contains clusters of varying densities, as LOF can detect outliers that are only anomalous within their local context. However, Isolation Forest is more robust and efficient for high-dimensional data or when outliers are globally distinct rather than locally rare. For time-series or streaming data, Isolation Forest's speed and scalability make it a better choice, while LOF may struggle due to its reliance on local neighborhoods.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Generate synthetic data with two clusters and some outliers
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5]],
                  cluster_std=[0.8, 1.0], random_state=42)
rng = np.random.RandomState(42)
X_outliers = rng.uniform(low=-6, high=10, size=(20, 2))
X_all = np.vstack([X, X_outliers])

# Fit Isolation Forest
iso_forest = IsolationForest(contamination=0.06, random_state=42)
y_pred_iso = iso_forest.fit_predict(X_all)

# Fit LOF
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.06)
y_pred_lof = lof.fit_predict(X_all)

# Visualize results
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
axs[0].scatter(X_all[:, 0], X_all[:, 1], c=(y_pred_iso == -1), cmap='coolwarm', s=20)
axs[0].set_title("Isolation Forest Outlier Detection")
axs[0].set_xlabel("Feature 1")
axs[0].set_ylabel("Feature 2")
axs[1].scatter(X_all[:, 0], X_all[:, 1], c=(y_pred_lof == -1), cmap='coolwarm', s=20)
axs[1].set_title("LOF Outlier Detection")
axs[1].set_xlabel("Feature 1")
axs[1].set_ylabel("Feature 2")
plt.tight_layout()
plt.show()
```
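Beyond the side-by-side plots, you can quantify how often the two detectors agree. The sketch below regenerates the same synthetic setup and compares the label vectors directly; points flagged by LOF but not by Isolation Forest are typically the "locally rare" cases discussed above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Same synthetic setup: two blobs plus uniformly scattered noise points
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5]],
                  cluster_std=[0.8, 1.0], random_state=42)
rng = np.random.RandomState(42)
X_all = np.vstack([X, rng.uniform(low=-6, high=10, size=(20, 2))])

y_iso = IsolationForest(contamination=0.06, random_state=42).fit_predict(X_all)
y_lof = LocalOutlierFactor(n_neighbors=20, contamination=0.06).fit_predict(X_all)

# Fraction of points on which the two detectors assign the same label
agreement = np.mean(y_iso == y_lof)
print(f"Label agreement: {agreement:.2%}")

# Points flagged by LOF but not by Isolation Forest (locally rare only)
lof_only = np.where((y_lof == -1) & (y_iso == 1))[0]
print(f"Flagged by LOF only: {len(lof_only)} points")
```

Because both detectors were given the same `contamination=0.06`, they flag roughly the same number of points, so agreement stays high by construction; the interesting signal is *which* points the two methods disagree on.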