Dense vs Sparse Regions: Visualization

The Local Outlier Factor (LOF) algorithm identifies outliers based on the concept of local density. LOF does not simply flag points that are far from the center of the data. Instead, it:

Compares the density around each point to the density of its neighbors;
Flags outliers in sparse regions, where the local density is much lower than that of the surrounding points;
Treats points in dense regions—even those far from the global center—as normal if their neighbors are similarly dense.

This local approach enables LOF to adapt to datasets with clusters of varying densities. It is especially useful for real-world data, where a single global density or distance threshold rarely fits every cluster.
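To make the density comparison concrete, the sketch below recomputes LOF by hand and checks the result against scikit-learn's LocalOutlierFactor. The helper name manual_lof and the toy data are illustrative assumptions, not part of the lesson. The sketch also assumes there are no exact duplicate points, which would produce zero distances and need a small guard term in the division.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def manual_lof(X, k):
    """Hand-rolled LOF scores (higher = more outlying); illustrative helper."""
    # Distances and indices of each point's k nearest neighbors.
    # Calling kneighbors() without arguments excludes the point itself.
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()

    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = dist[:, -1]

    # Reachability distance from p to neighbor o: max(k-distance(o), d(p, o))
    reach = np.maximum(dist, k_dist[idx])

    # Local reachability density: inverse of the mean reachability distance
    lrd = 1.0 / reach.mean(axis=1)

    # LOF: average density of the neighbors divided by the point's own density
    return lrd[idx].mean(axis=1) / lrd


# Toy data (assumed for illustration): one dense and one sparse cluster
rng = np.random.default_rng(0)
X_demo = np.vstack([
    0.3 * rng.standard_normal((100, 2)) + [2, 2],  # dense cluster
    1.0 * rng.standard_normal((20, 2)) + [7, 7],   # sparse cluster
])

manual_scores = manual_lof(X_demo, k=20)
sklearn_scores = -LocalOutlierFactor(n_neighbors=20).fit(X_demo).negative_outlier_factor_
print("manual LOF matches scikit-learn:", np.allclose(manual_scores, sklearn_scores))
```

A score close to 1 means a point is about as dense as its neighbors; scores well above 1 mean the point sits in a noticeably sparser spot than the points around it.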


import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

# Create a dataset with a dense cluster and a sparse cluster
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])

# Fit LOF (fit_predict would return 1 / -1 labels, not scores)
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
# negative_outlier_factor_ stores the negated LOF, so flip the sign:
# higher lof_scores means more outlying
lof_scores = -lof.negative_outlier_factor_

# Min-max normalize LOF scores to [0, 1] for color mapping
norm_scores = (lof_scores - lof_scores.min()) / (lof_scores.max() - lof_scores.min())

plt.figure(figsize=(8, 6))
scatter = plt.scatter(
    X[:, 0], X[:, 1],
    c=norm_scores,
    cmap="coolwarm",  # red = high LOF score (more outlying)
    s=60,
    edgecolor="k"
)
plt.colorbar(scatter, label="Normalized LOF Score (higher = more outlying)")
plt.title("Dense vs Sparse Regions: LOF Outlier Detection")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()
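The colors in the plot can also be checked numerically. The following sketch rebuilds the same dataset and compares the mean LOF score of each cluster for two neighborhood sizes. The helper mean_lof_per_cluster and the extra setting n_neighbors=10 are assumptions added for illustration. Keep in mind that the sparse cluster has only 20 points, so with n_neighbors=20 each sparse point has just 19 cluster mates and its neighborhood must reach at least one point outside the cluster.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Same data as the plot: 100 dense points followed by 20 sparse points
np.random.seed(42)
dense_cluster = 0.3 * np.random.randn(100, 2) + np.array([2, 2])
sparse_cluster = 1.0 * np.random.randn(20, 2) + np.array([7, 7])
X = np.vstack([dense_cluster, sparse_cluster])


def mean_lof_per_cluster(X, n_neighbors, n_dense=100):
    """Return (dense_mean, sparse_mean) of LOF scores; illustrative helper."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors).fit(X)
    scores = -lof.negative_outlier_factor_  # higher = more outlying
    return scores[:n_dense].mean(), scores[n_dense:].mean()


# Compare a neighborhood that fits inside the sparse cluster (10)
# with one that has to reach beyond it (20)
for k in (10, 20):
    dense_mean, sparse_mean = mean_lof_per_cluster(X, k)
    print(f"n_neighbors={k}: dense mean={dense_mean:.2f}, sparse mean={sparse_mean:.2f}")
```

Running it for several values of n_neighbors is a quick way to see how strongly the scores of a small cluster depend on the chosen neighborhood size.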

Note

LOF adapts to local structure by comparing the density around each point to that of its neighbors. In the visualization, points in the sparse cluster have higher LOF scores, indicating they are more likely to be outliers even though they sit together. The neighborhood size matters here: the sparse cluster has only 20 points and n_neighbors=20, so each sparse point's neighborhood must include at least one point from the much denser cluster, and its local density is low by comparison. LOF does not penalize points simply for being far from the global center; it focuses on how isolated a point is relative to its immediate surroundings.
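The difference between a global and a local view can be sketched with a single extra point. In the example below, the probe point, the two equally sized clusters, and the random seed are all hypothetical choices made for illustration. The probe lies closer to the overall mean than most of the data, so a distance-to-center rule would not single it out, yet it sits well outside the dense cluster that supplies its nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = 0.3 * rng.standard_normal((100, 2)) + [2, 2]
sparse = 1.0 * rng.standard_normal((100, 2)) + [7, 7]
probe = np.array([[2.0, 4.5]])  # hypothetical point just outside the dense cluster
X_probe = np.vstack([dense, sparse, probe])
probe_idx = len(X_probe) - 1

# Global view: Euclidean distance to the overall mean of the data
center_dist = np.linalg.norm(X_probe - X_probe.mean(axis=0), axis=1)

# Local view: LOF score (higher = more outlying) and predicted label
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X_probe)  # -1 = outlier, 1 = inlier
lof_scores = -lof.negative_outlier_factor_

print("probe distance to center:", round(center_dist[probe_idx], 2))
print("median distance to center:", round(np.median(center_dist), 2))
print("probe LOF score:", round(lof_scores[probe_idx], 2))
print("probe label:", labels[probe_idx])
```

The label uses scikit-learn's default contamination="auto" setting, so a different contamination value could change which points are flagged even though the scores stay the same.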

Section 4. Chapter 2
