Learn Dense vs Sparse Regions: Visualization | Density-Based Methods
Outlier and Novelty Detection in Practice

Dense vs Sparse Regions: Visualization

The Local Outlier Factor (LOF) algorithm identifies outliers based on the concept of local density. LOF does not simply flag points that are far from the center of the data. Instead, it:

  • Compares the density around each point to the density of its neighbors;
  • Flags a point as an outlier when its local density is much lower than that of its neighbors, as happens for isolated points in sparse regions;
  • Treats points in dense regions—even those far from the global center—as normal if their neighbors are similarly dense.

This local approach enables LOF to adapt to datasets with clusters of varying densities. It is especially useful for real-world data, where uniform density cannot be assumed.
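The density comparison described above can be written out directly. The following is a minimal sketch in plain NumPy, assuming Euclidean distance and no tied distances; the helper name `lof_scores` and the toy data are illustrative choices, not part of the lesson or of scikit-learn. A score near 1 means a point is about as dense as its neighbors, while a score well above 1 means it sits in a much sparser spot than they do.

```python
import numpy as np


def lof_scores(X, k):
    """Plain-NumPy LOF (hypothetical helper; assumes Euclidean distance, no ties)."""
    # Pairwise distances; a point is never its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Indices and distances of the k nearest neighbors, nearest first.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    nbr_dist = np.take_along_axis(dist, nbrs, axis=1)

    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = nbr_dist[:, -1]

    # Reachability distance from p to neighbor o: max(k_dist(o), d(p, o)).
    reach = np.maximum(nbr_dist, k_dist[nbrs])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)

    # LOF: mean density of the neighbors divided by the point's own density.
    return lrd[nbrs].mean(axis=1) / lrd


# Toy data (an assumption for illustration): one tight cluster plus one far point.
rng = np.random.default_rng(0)
cluster = 0.3 * rng.standard_normal((30, 2))
isolated = np.array([[10.0, 10.0]])
X_demo = np.vstack([cluster, isolated])

scores = lof_scores(X_demo, k=5)
print("median LOF inside the cluster:", np.median(scores[:-1]))
print("LOF of the isolated point:", scores[-1])
```

The isolated point gets by far the largest score because its neighbors all live in the tight cluster, where the local density is much higher than its own.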

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

# Create a dataset with a dense cluster and a sparse cluster
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])

# Fit LOF
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)

# negative_outlier_factor_ stores -LOF, so negate it: higher means more anomalous
lof_factors = -lof.negative_outlier_factor_

# Normalize LOF scores to [0, 1] for color mapping
norm_scores = (lof_factors - lof_factors.min()) / (lof_factors.max() - lof_factors.min())

plt.figure(figsize=(8, 6))
scatter = plt.scatter(
    X[:, 0], X[:, 1],
    c=norm_scores,
    cmap="coolwarm",  # red = high LOF (more anomalous), blue = low LOF
    s=60,
    edgecolor="k"
)
plt.colorbar(scatter, label="Normalized LOF Score")
plt.title("Dense vs Sparse Regions: LOF Outlier Detection")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()
```
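The plot shows scores as colors, but it does not show which points the model actually labels as outliers. The sketch below rebuilds the same dataset and reads the labels from `fit_predict`. It assumes a recent scikit-learn release and passes `contamination="auto"` explicitly, in which case the decision threshold is stored in `offset_`. The slicing by index relies on the first 100 rows being the dense cluster.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Same synthetic data as in the lesson: 100 dense points, then 20 sparse points.
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])

lof = LocalOutlierFactor(n_neighbors=20, contamination="auto")
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier

# A point is labeled an outlier when its negated LOF falls below offset_.
is_outlier = lof.negative_outlier_factor_ < lof.offset_

print("threshold on negative_outlier_factor_:", lof.offset_)
print("outliers in dense cluster: ", int(np.sum(labels[:100] == -1)))
print("outliers in sparse cluster:", int(np.sum(labels[100:] == -1)))
```

Because `negative_outlier_factor_` is the negated LOF, a threshold of -1.5 on it corresponds to flagging points whose LOF exceeds 1.5.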
Note

LOF adapts to local structure by comparing the density around each point to that of its neighbors. In the visualization, points in the sparse cluster receive higher LOF scores and are therefore more likely to be flagged as outliers, even though they lie together. This happens because n_neighbors=20 matches the size of the sparse cluster: a point does not count as its own neighbor, so each sparse point's neighborhood has to reach into the dense cluster, whose local density is much higher. LOF does not penalize points simply for being far from the global center; it focuses on how isolated a point is relative to its immediate surroundings.
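To check this reading numerically rather than by eye, one option is to compare the mean LOF of each cluster for more than one neighborhood size. This is a sketch under the same synthetic data as the lesson; the second value `n_neighbors=10` is an assumption added for comparison and does not come from the original example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Same synthetic data as in the lesson: 100 dense points, then 20 sparse points.
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])
is_sparse = np.arange(len(X)) >= 100

mean_lof = {}
for k in (10, 20):
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    scores_k = -lof.negative_outlier_factor_  # undo the sign flip
    mean_lof[k] = (scores_k[~is_sparse].mean(), scores_k[is_sparse].mean())
    print(f"n_neighbors={k}: dense mean LOF={mean_lof[k][0]:.2f}, "
          f"sparse mean LOF={mean_lof[k][1]:.2f}")
```

With a neighborhood smaller than the sparse cluster, its points are compared only with each other, so their scores would be expected to move closer to 1. The choice of `n_neighbors` therefore decides whether a small, loose group is treated as a cluster of its own or as a set of outliers.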

Which statement best describes how LOF identifies outliers in a dataset with both dense and sparse clusters?

Section 4. Chapter 2
