Dense vs Sparse Regions: Visualization
The Local Outlier Factor (LOF) algorithm identifies outliers based on the concept of local density. LOF does not simply flag points that are far from the center of the data. Instead, it:
- Compares the density around each point to the density of its neighbors;
- Flags outliers in sparse regions, where the local density is much lower than that of the surrounding points;
- Treats points in dense regions, even those far from the global center, as normal if their neighbors are similarly dense.
This local approach enables LOF to adapt to datasets with clusters of varying densities. That makes it especially useful for real-world data, where uniform density cannot be assumed.
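The density comparison described above can be sketched directly in NumPy. This is a minimal illustration that follows the standard LOF definitions (k-distance, reachability distance, local reachability density), not scikit-learn's actual implementation. The function name local_outlier_factor, the toy data, and k=5 are assumptions made for the example, and the sketch assumes the data contains no duplicate points.

```python
import numpy as np

def local_outlier_factor(X, k):
    """Hypothetical helper: LOF of every row of X from its k nearest neighbors."""
    # Pairwise Euclidean distances; a point is never its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Indices and distances of the k nearest neighbors, nearest first.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    nbr_dist = np.take_along_axis(dist, nbrs, axis=1)

    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = nbr_dist[:, -1]

    # Reachability distance from p to neighbor o: max(k_dist(o), d(p, o)).
    reach = np.maximum(nbr_dist, k_dist[nbrs])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)

    # LOF: average density of the neighbors divided by the point's own density.
    return lrd[nbrs].mean(axis=1) / lrd

# Toy data (assumed for illustration): one tight cluster plus one isolated point.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=0.3, size=(30, 2))
isolated = np.array([[5.0, 5.0]])
X_toy = np.vstack([cluster, isolated])

scores = local_outlier_factor(X_toy, k=5)
print("median LOF inside the cluster:", np.median(scores[:-1]))
print("LOF of the isolated point:", scores[-1])
```

A score near 1 means a point is about as dense as its neighbors; a score well above 1 means its neighbors are much denser than it is, which is the situation LOF treats as outlying.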
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.neighbors import LocalOutlierFactor

    # Create a dataset with a dense cluster and a sparse cluster
    np.random.seed(42)
    dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
    sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
    X = np.vstack([dense_cluster, sparse_cluster])

    # Fit LOF
    lof = LocalOutlierFactor(n_neighbors=20)
    lof.fit(X)

    # negative_outlier_factor_ stores -LOF, so negate it: higher means more outlying
    lof_scores = -lof.negative_outlier_factor_

    # Normalize LOF scores to [0, 1] for color mapping
    norm_scores = (lof_scores - lof_scores.min()) / (lof_scores.max() - lof_scores.min())

    plt.figure(figsize=(8, 6))
    scatter = plt.scatter(
        X[:, 0], X[:, 1],
        c=norm_scores,
        cmap="coolwarm",  # red = high LOF (more outlying), blue = low LOF
        s=60,
        edgecolor="k",
    )
    plt.colorbar(scatter, label="Normalized LOF Score")
    plt.title("Dense vs Sparse Regions: LOF Outlier Detection")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.grid(True)
    plt.show()
LOF adapts to local structure by comparing the density around each point to that of its neighbors. In the visualization, points in the sparse cluster have higher LOF scores, indicating they are more likely to be outliers, even though they sit together. Part of the reason is the choice of n_neighbors=20: the sparse cluster contains only 20 points, so each sparse point's 20 nearest neighbors must include at least one point from the dense cluster, whose local density is much higher. LOF does not penalize points simply for being far from the global center; it focuses on how isolated a point is relative to its immediate surroundings.
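One way to probe this explanation is to inspect each point's neighborhood rather than only the plot. The sketch below rebuilds the same synthetic data, then reports the mean LOF per cluster and how many of each sparse point's neighbors come from the dense cluster. The second setting, n_neighbors=5, is an assumption added purely for contrast and is not part of the original example; the names is_sparse and summary are likewise illustrative.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Same synthetic data as the example above: 100 dense points, then 20 sparse points.
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])
is_sparse = np.arange(len(X)) >= 100

summary = {}
for k in (5, 20):  # 5 is an illustrative alternative, not a recommendation
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    scores = -lof.negative_outlier_factor_  # positive LOF; about 1 means "typical"

    # Neighbors of each training point, excluding the point itself.
    nbr_idx = lof.kneighbors(n_neighbors=k, return_distance=False)

    # For each sparse point, count neighbors that belong to the dense cluster.
    dense_nbrs = (nbr_idx[is_sparse] < 100).sum(axis=1)

    summary[k] = {
        "mean_lof_dense": float(scores[~is_sparse].mean()),
        "mean_lof_sparse": float(scores[is_sparse].mean()),
        "min_dense_neighbors_of_sparse": int(dense_nbrs.min()),
    }
    print(f"n_neighbors={k}: {summary[k]}")
```

Comparing the two rows shows how much of the sparse cluster's elevated score depends on its neighborhoods crossing into the dense cluster, which is worth checking before reading the scores as a property of the cluster itself.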