Local Outlier Factor: Density Deviation
The Local Outlier Factor (LOF) is a density-based method for detecting outliers by comparing the local density of a data point to the densities of its neighbors. Unlike global approaches, LOF focuses on how isolated a point is with respect to its local neighborhood.
LOF builds on two key concepts:
- Reachability distance;
- Local density deviation.
To understand LOF, start with a set of data points in a feature space and choose a neighborhood size k. For each point, identify its k nearest neighbors; the distance to the kth of them is called the point's k-distance.
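As a minimal sketch of this first step, the brute-force NumPy code below finds each point's k nearest neighbors and its k-distance. It assumes Euclidean distance and reuses the five points from the plotting example in this section. The helper name `k_nearest` is hypothetical; for larger data, a spatial index such as scikit-learn's `NearestNeighbors` scales much better than a full distance matrix.

```python
import numpy as np

def k_nearest(X, k):
    """Hypothetical helper: brute-force k nearest neighbors of every row of X."""
    # Pairwise Euclidean distances, shape (n, n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # A point is never its own neighbor
    np.fill_diagonal(D, np.inf)
    # Column j holds the (j+1)-th nearest neighbor; exact ties are broken by index order
    idx = np.argsort(D, axis=1, kind="stable")[:, :k]
    dist = np.take_along_axis(D, idx, axis=1)
    return idx, dist

X = np.array([[1.0, 1.0], [1.4, 1.3], [0.6, 1.0], [1.0, 0.5], [4.0, 4.0]])
k = 2
idx, dist = k_nearest(X, k)
k_distance = dist[:, k - 1]  # distance to the k-th nearest neighbor

print(idx)
print(np.round(k_distance, 3))
```

The k-distances come out to roughly 0.5, 0.854, 0.64, 0.64 and 4.243, so the outlier P4 already has a k-distance several times larger than any cluster point.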
The reachability distance of a point A with respect to its neighbor B is defined as the maximum of:
- The distance between A and B;
- The distance from B to its kth nearest neighbor.
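The definition translates almost line for line into code. The sketch below assumes Euclidean distance and no duplicate points; `reachability_distance` is a hypothetical helper, not a library function. Note that the measure is not symmetric, because the lower bound comes from the k-distance of the second point.

```python
import numpy as np

def reachability_distance(X, a, b, k):
    """Hypothetical helper: reach-dist_k(a, b) = max(d(a, b), k-distance(b))."""
    d_ab = np.linalg.norm(X[a] - X[b])
    # Sorted distances from b to every point; index 0 is b's distance to itself (0)
    d_b = np.sort(np.linalg.norm(X - X[b], axis=1))
    k_distance_b = d_b[k]
    return max(d_ab, k_distance_b)

X = np.array([[1.0, 1.0], [1.4, 1.3], [0.6, 1.0], [1.0, 0.5], [4.0, 4.0]])
k = 2

print(reachability_distance(X, 2, 0, k))  # d = 0.4 is raised to k-distance(P0), about 0.5
print(reachability_distance(X, 0, 2, k))  # d = 0.4 is raised to k-distance(P2), about 0.64
print(reachability_distance(X, 4, 1, k))  # outlier: the true distance, about 3.75, dominates
```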
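Because the reachability distance is never smaller than B's k-distance, points inside a dense cluster end up with similar reachability distances (small fluctuations are smoothed out), while points in sparse regions keep their larger true distances and stand out.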
After computing reachability distances, calculate the local reachability density for each point. This is the inverse of the average reachability distance from the point to its neighbors.
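A vectorized sketch of the local reachability density follows, again brute-force NumPy with Euclidean distance. It assumes there are no duplicate points: duplicates can make every reachability distance zero and the density infinite, which is why scikit-learn's implementation adds a tiny constant to the denominator. The function name `local_reachability_density` is illustrative only.

```python
import numpy as np

def local_reachability_density(X, k):
    """Illustrative helper: lrd_k for every row of X (assumes no duplicate points)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                         # exclude self-matches
    nbrs = np.argsort(D, axis=1, kind="stable")[:, :k]  # k nearest neighbors per point
    nbr_dist = np.take_along_axis(D, nbrs, axis=1)      # distances to those neighbors
    k_dist = nbr_dist[:, -1]                            # k-distance per point
    # reach-dist(A, B) = max(d(A, B), k-distance(B)) for each neighbor B of A
    reach = np.maximum(nbr_dist, k_dist[nbrs])
    # Inverse of the mean reachability distance to the k neighbors
    return 1.0 / reach.mean(axis=1)

X = np.array([[1.0, 1.0], [1.4, 1.3], [0.6, 1.0], [1.0, 0.5], [4.0, 4.0]])
lrd = local_reachability_density(X, k=2)
print(np.round(lrd, 3))
```

On these points the outlier P4 gets a density of about 0.25, while the four cluster points land roughly between 1.3 and 1.8.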
The LOF score for a point is the ratio of the average local reachability density of its neighbors to its own local reachability density:
- A score close to 1 means the point has similar density to its neighbors;
- A score much greater than 1 means the point is in a sparser region, making it a potential outlier.
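Putting the steps together, here is a compact from-scratch sketch of the LOF score on the section's five example points. It repeats the brute-force neighbor search so that it runs on its own; `lof_scores` is a hypothetical name, and scikit-learn's `LocalOutlierFactor` is the usual choice in practice. Because P0 is almost exactly equidistant from P1 and P3, floating-point tie-breaking decides which one counts as its second neighbor, so the cluster scores can shift slightly; the outlier's score stays far above 1 either way.

```python
import numpy as np

def lof_scores(X, k):
    """Hypothetical helper: LOF_k for every row of X (Euclidean, no duplicate points)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                         # exclude self-matches
    nbrs = np.argsort(D, axis=1, kind="stable")[:, :k]  # k nearest neighbors
    nbr_dist = np.take_along_axis(D, nbrs, axis=1)
    k_dist = nbr_dist[:, -1]                            # k-distance per point
    reach = np.maximum(nbr_dist, k_dist[nbrs])          # reachability distances
    lrd = 1.0 / reach.mean(axis=1)                      # local reachability densities
    # Mean density of the neighbors divided by the point's own density
    return lrd[nbrs].mean(axis=1) / lrd

X = np.array([[1.0, 1.0], [1.4, 1.3], [0.6, 1.0], [1.0, 0.5], [4.0, 4.0]])
scores = lof_scores(X, k=2)
for i, s in enumerate(scores):
    print(f"P{i}: LOF = {s:.2f}")
```

The four cluster points score close to 1, while P4 scores above 5.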
The intuition behind LOF is that outliers are not just points that are far from others, but points that have substantially lower local density compared to their neighbors. By comparing each point's local density to those around it, LOF can flag subtle anomalies that global methods might miss—such as points on the edge of a cluster or in a sparse pocket of the data space.
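To illustrate the local-versus-global point, the sketch below builds a made-up dataset: a tight 3 by 3 grid with spacing 1, a loose 3 by 3 grid with spacing 20, and one extra point 10 units from the tight grid. Under a global rule such as "flag points whose nearest neighbor is far away", the extra point (nearest neighbor at distance 10) looks less unusual than every loose-grid point (nearest neighbor at distance 20). LOF, computed here with scikit-learn's `LocalOutlierFactor` and `n_neighbors=3` (the data and this parameter are assumptions chosen for the example), still singles it out because its neighbors are several times denser than it is.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Made-up data: a tight grid (spacing 1), a loose grid (spacing 20), and one extra point
tight = np.array([[x, y] for x in (0, 1, 2) for y in (0, 1, 2)], dtype=float)
loose = np.array([[x, y] for x in (100, 120, 140) for y in (100, 120, 140)], dtype=float)
extra = np.array([[12.0, 1.0]])  # 10 units from (2, 1), the nearest tight-grid point
X = np.vstack([tight, loose, extra])

# Global view: distance from each point to its nearest neighbor
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(D, np.inf)
nn_dist = D.min(axis=1)

# Local view: LOF with k = 3 neighbors
lof = LocalOutlierFactor(n_neighbors=3)
labels = lof.fit_predict(X)              # -1 marks predicted outliers
scores = -lof.negative_outlier_factor_   # LOF scores (scikit-learn stores their negatives)

print("nearest-neighbor distance, extra point:", nn_dist[-1])
print("smallest nearest-neighbor distance, loose grid:", nn_dist[9:18].min())
print("LOF, extra point:", round(scores[-1], 2))
print("largest LOF among grid points:", round(scores[:-1].max(), 2))
print("points labeled as outliers:", np.where(labels == -1)[0])
```

A global nearest-neighbor threshold would have to flag all nine loose-grid points before it caught the extra point, whereas LOF scores the extra point above 5 and keeps every grid point below 1.5. With the default `contamination="auto"`, scikit-learn labels points whose LOF exceeds 1.5 as outliers, so only the extra point (index 18) gets the label -1.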
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

# P0-P3 form a small cluster (pairwise distances roughly 0.4-0.9)
# P4 is the outlier (more than 3.5 away from every other point)
# P0-P3 are spread out slightly to leave room for the text labels
X = np.array([
    [1.0, 1.0],  # P0
    [1.4, 1.3],  # P1
    [0.6, 1.0],  # P2
    [1.0, 0.5],  # P3
    [4.0, 4.0]   # P4 (outlier)
])

k = 2
# Query k+1 neighbors because each point's closest match is itself (distance 0)
nbrs = NearestNeighbors(n_neighbors=k+1).fit(X)
distances, indices = nbrs.kneighbors(X)

reachability_distances = []
for i, neighbors in enumerate(indices):
    point_reach_dists = []
    for neighbor_idx in neighbors[1:]:  # skip the point itself
        d = np.linalg.norm(X[i] - X[neighbor_idx])
        # k-distance of the neighbor: distance to its own k-th nearest neighbor
        k_dist_neighbor = distances[neighbor_idx][k]
        reach_dist = max(d, k_dist_neighbor)
        point_reach_dists.append(reach_dist)
    reachability_distances.append(point_reach_dists)

plt.figure(figsize=(10, 8))  # large figure leaves room for the labels
plt.scatter(X[:, 0], X[:, 1], c='blue', s=100, label='Points', zorder=3)

for i, (x, y) in enumerate(X):
    # Label points with a small offset
    plt.text(x, y + 0.15, f"P{i}", fontsize=12, fontweight='bold', ha='center', zorder=4)

    for j, neighbor_idx in enumerate(indices[i][1:]):
        nx, ny = X[neighbor_idx]

        # Draw lines: red dashed for outlier connections, gray dotted within the cluster
        if i == 4 or neighbor_idx == 4:
            color = 'red'
            style = '--'
            width = 1.0
        else:
            color = 'gray'
            style = ':'
            width = 0.8

        plt.plot([x, nx], [y, ny], color=color, linestyle=style, linewidth=width, zorder=1)

        # Midpoint of the line, where the distance label goes
        mid_x, mid_y = (x + nx) / 2, (y + ny) / 2

        # Look up the precomputed reachability distance
        val = reachability_distances[i][j]

        # Push the label slightly up or down, depending on the neighbor index,
        # so labels on overlapping lines do not collide
        text_offset_y = 0.05 if j == 0 else -0.05

        plt.text(mid_x, mid_y + text_offset_y,
                 f"{val:.2f}",
                 color='black', fontsize=9,
                 bbox=dict(facecolor='white', alpha=0.85, edgecolor='#cccccc', boxstyle='round,pad=0.2'),
                 ha='center', va='center', zorder=2)

plt.legend(loc='upper left')
plt.title(f"Reachability Distances Visualization (k={k})")
plt.xlabel("X1")
plt.ylabel("X2")
plt.grid(True, linestyle='--', alpha=0.5)

# Set limits to focus on the data but leave breathing room
plt.xlim(0, 4.5)
plt.ylim(0, 4.5)
plt.show()
```