Dense vs Sparse Regions: Visualization
The Local Outlier Factor (LOF) algorithm identifies outliers based on the concept of local density. LOF does not simply flag points that are far from the center of the data. Instead, it:
- Compares the density around each point to the density of its neighbors;
- Flags a point as an outlier when its local density is much lower than that of its neighbors, as happens in sparse regions;
- Treats points in dense regions—even those far from the global center—as normal if their neighbors are similarly dense.
This local approach enables LOF to adapt to datasets with clusters of varying densities. It is especially useful for real-world data, where uniform density cannot be assumed.
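The density comparison described above can be made concrete with a short sketch that computes LOF step by step from its standard definition (k-distance, reachability distance, local reachability density) and checks the result against scikit-learn. The helper name `manual_lof`, the variable `X_demo`, and the demo data are assumptions for illustration and are not part of the lesson; the small constant in the denominator is only there to guard against division by zero.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def manual_lof(X, k):
    """Compute LOF scores step by step (illustrative helper, not a library function)."""
    # k nearest neighbors of every training point, excluding the point itself
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = dist[:, -1]
    # Reachability distance from p to neighbor o: max(k-distance(o), d(p, o))
    reach = np.maximum(dist, k_dist[idx])
    # Local reachability density: inverse of the mean reachability distance
    lrd = 1.0 / (reach.mean(axis=1) + 1e-10)
    # LOF: average density of the neighbors divided by the point's own density
    return lrd[idx].mean(axis=1) / lrd


# Hypothetical demo data: one dense and one sparse Gaussian cluster
rng = np.random.default_rng(0)
X_demo = np.vstack([
    0.3 * rng.standard_normal((100, 2)) + [2, 2],  # dense cluster
    1.0 * rng.standard_normal((20, 2)) + [7, 7],   # sparse cluster
])

scores = manual_lof(X_demo, k=20)
# scikit-learn stores the negated LOF, so flip the sign before comparing
reference = -LocalOutlierFactor(n_neighbors=20).fit(X_demo).negative_outlier_factor_
print("Matches scikit-learn:", np.allclose(scores, reference))
```

A score near 1 means a point is about as dense as its neighbors; values well above 1 mean its neighbors are denser than it is.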
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

# Create a dataset with a dense cluster and a sparse cluster
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])

# Fit LOF; scikit-learn stores the scores negated, so flip the sign
# to get conventional LOF values (higher = more outlying)
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
lof_scores = -lof.negative_outlier_factor_

# Normalize LOF scores to [0, 1] for color mapping
norm_scores = (lof_scores - lof_scores.min()) / (lof_scores.max() - lof_scores.min())

plt.figure(figsize=(8, 6))
scatter = plt.scatter(
    X[:, 0], X[:, 1],
    c=norm_scores,
    cmap="coolwarm",  # red = high LOF (more outlying), blue = low LOF
    s=60,
    edgecolor="k"
)
plt.colorbar(scatter, label="Normalized LOF Score")
plt.title("Dense vs Sparse Regions: LOF Outlier Detection")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()
```
LOF adapts to local structure by comparing the density around each point to that of its neighbors. In the visualization, points in the sparse cluster have higher LOF scores, marking them as more likely outliers even though they sit close to one another. With n_neighbors=20 and only 20 points in the sparse cluster, each of those points must draw at least one neighbor from the dense cluster, so its local density is compared against a much denser region and comes out lower. LOF does not penalize points simply for being far from the global center; it measures how isolated a point is relative to its immediate surroundings.
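To check this reading numerically rather than by eye, the following sketch rebuilds the same dataset and reports the mean LOF score per cluster, along with how many sparse-cluster points have at least one dense-cluster point among their neighbors. The comparison value n_neighbors=5 and the names `is_sparse` and `results` are assumptions added for illustration; they do not appear in the lesson.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

# Same dataset as in the visualization
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])

# Boolean mask: True for rows that belong to the sparse cluster
is_sparse = np.arange(len(X)) >= len(dense_cluster)

results = {}
for k in (5, 20):  # 5 is a hypothetical comparison value, 20 is the lesson's setting
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    scores = -lof.negative_outlier_factor_  # higher = more outlying

    # Neighbor indices of every point, excluding the point itself
    idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(return_distance=False)
    # For each point: does its neighborhood contain any dense-cluster point?
    has_dense_neighbor = (~is_sparse[idx]).any(axis=1)

    results[k] = {
        "dense_mean": scores[~is_sparse].mean(),
        "sparse_mean": scores[is_sparse].mean(),
        "sparse_reaching_dense": int(has_dense_neighbor[is_sparse].sum()),
    }
    print(f"n_neighbors={k}: {results[k]}")
```

Comparing the two settings gives a rough sense of how much the sparse cluster's scores depend on the neighborhood size relative to the cluster size.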