Dense vs Sparse Regions: Visualization

The Local Outlier Factor (LOF) algorithm identifies outliers based on the concept of local density. LOF does not simply flag points that are far from the center of the data. Instead, it:

Compares the density around each point to the density around that point's nearest neighbors;
Flags points in sparse regions as outliers when their local density is much lower than that of their neighbors;
Treats points in dense regions as normal, even when they lie far from the global center, provided their neighbors are similarly dense.

This local approach enables LOF to adapt to datasets with clusters of varying densities. It is especially useful for real-world data, where density is rarely uniform across the feature space.
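To make the density comparison concrete, here is a minimal brute-force sketch of the LOF computation in NumPy. The helper name `lof_scores`, the demo data, and the choice `k=10` are assumptions made for illustration; scikit-learn's own implementation differs in details such as neighbor search and numerical safeguards.

```python
import numpy as np

def lof_scores(X, k):
    """Brute-force LOF for a small dataset (illustrative helper, not scikit-learn's API)."""
    # Pairwise Euclidean distances; a point is never its own neighbor
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Indices and distances of each point's k nearest neighbors
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    k_dist = nn_dist[:, -1]  # distance to the k-th nearest neighbor

    # Reachability distance from p to neighbor o: max(k_dist(o), d(p, o))
    reach = np.maximum(k_dist[nn_idx], nn_dist)

    # Local reachability density: inverse of the mean reachability distance
    lrd = 1.0 / reach.mean(axis=1)

    # LOF: mean density of the neighbors divided by the point's own density
    return lrd[nn_idx].mean(axis=1) / lrd

# Assumed demo data: one tight cluster plus a single isolated point
rng = np.random.default_rng(0)
cluster = 0.3 * rng.standard_normal((50, 2))
isolated = np.array([[5.0, 5.0]])
X_demo = np.vstack([cluster, isolated])

scores = lof_scores(X_demo, k=10)
print("median LOF inside the cluster:", np.median(scores[:-1]))
print("LOF of the isolated point:", scores[-1])
```

A score close to 1 means a point is about as dense as its neighbors, while a score well above 1 means its neighborhood is much sparser than theirs.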


import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

# Create a dataset with a dense cluster and a sparse cluster
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])

# Fit LOF; negative_outlier_factor_ stores -LOF, so negate it
# to get scores where higher means more outlying
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
lof_scores = -lof.negative_outlier_factor_

# Min-max normalize LOF scores to [0, 1] for color mapping
norm_scores = (lof_scores - lof_scores.min()) / (lof_scores.max() - lof_scores.min())

plt.figure(figsize=(8, 6))
scatter = plt.scatter(
    X[:, 0], X[:, 1],
    c=norm_scores,
    cmap="coolwarm",  # red = high LOF score (likely outlier), blue = low
    s=60,
    edgecolor="k"
)
plt.colorbar(scatter, label="Normalized LOF Score")
plt.title("Dense vs Sparse Regions: LOF Outlier Detection")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()

Note

LOF adapts to local structure by comparing the density around each point to that of its neighbors. In the visualization, points in the sparse cluster have higher LOF scores (toward the red end of the color scale), indicating they are more likely to be outliers even though they are grouped together, because their local density is much lower than in the dense cluster. LOF does not penalize points simply for being far from the global center; it measures how isolated a point is relative to its immediate surroundings.
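The contrast between a global and a local view can be checked numerically. The sketch below compares distance from the data centroid with the LOF score on an assumed toy dataset; the cluster sizes, positions, and the lone point at (3, 3) are illustrative choices, not part of the example above.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
big_cluster = 0.3 * rng.standard_normal((200, 2))                        # dense, around (0, 0)
far_cluster = 0.3 * rng.standard_normal((40, 2)) + np.array([6.0, 6.0])  # dense, but far from the center
lone_point = np.array([[3.0, 3.0]])                                      # isolated, between the clusters
X_demo = np.vstack([big_cluster, far_cluster, lone_point])

# Global view: distance of every point from the overall centroid
centroid_dist = np.linalg.norm(X_demo - X_demo.mean(axis=0), axis=1)

# Local view: LOF score (negated negative_outlier_factor_, higher = more outlying)
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X_demo)
lof_score = -lof.negative_outlier_factor_

far = slice(200, 240)  # rows belonging to the far cluster
print("centroid distance, far cluster (min):", centroid_dist[far].min())
print("centroid distance, lone point:      ", centroid_dist[-1])
print("LOF score, far cluster (median):    ", np.median(lof_score[far]))
print("LOF score, lone point:              ", lof_score[-1])
```

In this setup the far cluster sits farther from the centroid than the lone point, so a purely distance-based rule would rank its members as more anomalous. LOF instead scores the lone point well above the far cluster, because only the lone point is sparse relative to its own neighbors.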
