Learn Local Outlier Factor: Density Deviation | Density-Based Methods
Outlier and Novelty Detection in Practice

Local Outlier Factor: Density Deviation

The Local Outlier Factor (LOF) is a density-based method for detecting outliers by comparing the local density of a data point to the densities of its neighbors. Unlike global approaches, LOF focuses on how isolated a point is with respect to its local neighborhood.

LOF builds on two key concepts:

  • Reachability distance;
  • Local density deviation.

To understand LOF, start with a set of data points in a feature space. For each point, identify its k nearest neighbors. The distance from a point to the k-th of these neighbors is called its k-distance.
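
Here is a minimal sketch of this step. The hypothetical helper `k_neighbors` finds each point's k nearest neighbors by brute force and also returns its k-distance, which is the distance to its k-th nearest neighbor. It assumes Euclidean distance, a small dataset, and no ties at the k-th neighbor. The original LOF definition includes every point tied at the k-distance, and this sketch ignores that case.

```python
import numpy as np

def k_neighbors(X, k):
    """Hypothetical helper: k nearest neighbors and k-distance of every point.

    Brute force, Euclidean distance; assumes no ties at the k-th neighbor.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise distance matrix, shape (n, n)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # A point is not its own neighbor
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest points, nearest first
    nbrs = np.argsort(dist, axis=1)[:, :k]
    # k-distance: distance to the k-th nearest neighbor
    k_dist = dist[np.arange(len(X)), nbrs[:, -1]]
    return nbrs, k_dist

# Four points on a line: 0, 1, 3, 7
nbrs, k_dist = k_neighbors([[0.0], [1.0], [3.0], [7.0]], k=2)
print(nbrs.tolist())    # [[1, 2], [0, 2], [1, 0], [2, 1]]
print(k_dist.tolist())  # [3.0, 2.0, 3.0, 6.0]
```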

The reachability distance of a point A with respect to its neighbor B is defined as the maximum of:

  • The distance between A and B;
  • The k-distance of B, that is, the distance from B to its k-th nearest neighbor.

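This definition smooths out very small distances. Every point A that lies within the k-distance of B gets the same reachability distance to B, namely B's k-distance. As a result, points inside dense clusters have similar reachability distances, while points in sparse regions stand out. Because this floor depends only on B, the reachability distance is not symmetric.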
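
The sketch below shows the definition and its smoothing effect with concrete numbers. It uses four hypothetical points on a line (0, 1, 3 and 7), sets k = 2, and builds the full matrix of reachability distances with NumPy. It is a brute-force illustration and is not meant to be efficient.

```python
import numpy as np

# Assumed toy data: four points on a line, k = 2
X = np.array([[0.0], [1.0], [3.0], [7.0]])
k = 2

dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(dist, np.inf)                 # exclude self-matches
nbrs = np.argsort(dist, axis=1)[:, :k]         # k nearest neighbors
k_dist = dist[np.arange(len(X)), nbrs[:, -1]]  # k-distance of each point

# reach[a, b] = max(k-distance of b, d(a, b)); not symmetric
reach = np.maximum(k_dist[None, :], dist)

print(reach[0, 1])  # 2.0: d = 1 is raised to the k-distance of the point at 1
print(reach[1, 0])  # 3.0: d = 1 is raised to the k-distance of the point at 0
print(reach[3, 2])  # 4.0: d = 4 already exceeds the k-distance (3) of the point at 3
```

The first two values show the smoothing at work. The raw distance of 1 between the points at 0 and 1 is raised to the target point's k-distance, so the result depends on direction (2 versus 3). The far point at 7 keeps its true distance of 4 to the point at 3, because that distance already exceeds the k-distance of the point at 3.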

After computing reachability distances, calculate the local reachability density for each point. This is the inverse of the average reachability distance from the point to its neighbors.
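
The local reachability density can also be written as a formula. In the notation below, N_k(A) is the set of A's k nearest neighbors and d(A, B) is the distance between two points. Both symbols are introduced here for convenience. The second line works through a small example in which a point has reachability distances of 2 and 3 to its two neighbors (k = 2).

```latex
\mathrm{lrd}_k(A) = \left( \frac{1}{|N_k(A)|} \sum_{B \in N_k(A)} \mathrm{reach\text{-}dist}_k(A, B) \right)^{-1},
\qquad
\mathrm{reach\text{-}dist}_k(A, B) = \max\bigl\{ k\text{-distance}(B),\ d(A, B) \bigr\}

\mathrm{lrd}_2(A) = \left( \frac{2 + 3}{2} \right)^{-1} = \frac{1}{2.5} = 0.4
```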

The LOF score for a point is the ratio of the average local reachability density of its neighbors to its own local reachability density:

  • A score close to 1 means the point has similar density to its neighbors;
  • A score much greater than 1 means the point is in a sparser region, making it a potential outlier.
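
The next sketch combines the steps into a from-scratch NumPy computation of the LOF score. The function name `lof_scores` is hypothetical. It uses brute-force Euclidean distances and assumes no ties at the k-th neighbor, so treat it as an illustration of the definitions above rather than a production implementation.

```python
import numpy as np

def lof_scores(X, k):
    """Hypothetical LOF sketch: brute force, Euclidean, assumes no ties at the k-th neighbor."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]        # k nearest neighbors of each point
    k_dist = dist[np.arange(n), nbrs[:, -1]]      # k-distance of each point
    # Reachability distance from each point to each of its k neighbors, shape (n, k)
    reach = np.maximum(k_dist[nbrs], dist[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)                # local reachability density
    # LOF: average lrd of the neighbors divided by the point's own lrd
    return lrd[nbrs].mean(axis=1) / lrd

scores = lof_scores([[0.0], [1.0], [3.0], [7.0]], k=2)
print(scores.round(3).tolist())  # [0.917, 1.2, 0.917, 1.833]
```

The isolated point at 7 scores about 1.83, which is clearly above the rest. The points at 0 and 3 score slightly below 1. A score below 1 means a point is somewhat denser than its neighbors.
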
Note

The intuition behind LOF is that outliers are not just points that are far from others. They are points whose local density is substantially lower than that of their neighbors. Because LOF compares each point's local density only with the densities around it, it can flag subtle anomalies that global methods might miss, such as points on the edge of a dense cluster or in a sparse pocket of the data space.
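
The next sketch illustrates this contrast on hypothetical 1D data. The data has a dense cluster (0.0 to 0.3), one point just beside it at 1.0, and a sparse cluster (10 to 16). As a stand-in for a global method, the sketch ranks points by their raw k-distance. It then compares that ranking with LOF scores from the same brute-force approach (Euclidean distance, k = 2). The helper `knn_and_lof` is hypothetical.

```python
import numpy as np

def knn_and_lof(X, k):
    """Hypothetical helper: (k-distance, LOF score) per point, brute-force Euclidean."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                # exclude self-matches
    nbrs = np.argsort(dist, axis=1)[:, :k]        # k nearest neighbors
    k_dist = dist[np.arange(n), nbrs[:, -1]]      # k-distance of each point
    reach = np.maximum(k_dist[nbrs], dist[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)                # local reachability density
    return k_dist, lrd[nbrs].mean(axis=1) / lrd   # LOF = neighbor lrd / own lrd

# Dense cluster, one point just beside it, then a sparse cluster
X = np.array([0.0, 0.1, 0.2, 0.3, 1.0, 10.0, 12.0, 14.0, 16.0])[:, None]
k_dist, lof = knn_and_lof(X, k=2)

print(k_dist.round(2).tolist())  # [0.2, 0.1, 0.1, 0.2, 0.8, 4.0, 2.0, 2.0, 4.0]
print(lof.round(2).tolist())     # [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 1.0]
```

The global k-distance ranking places the point at 1.0 (k-distance 0.8) behind all four sparse-cluster points (k-distances 2 and 4). LOF scores it at about 5 and scores every other point at about 1.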

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

# Generate a small set of 2D points
X = np.array([
    [1, 1],
    [1.2, 1.1],
    [0.8, 1.0],
    [1.1, 0.9],
    [5, 5]
])

k = 2  # Number of neighbors

# Fit NearestNeighbors model; k+1 because each point is returned as its own nearest neighbor
nbrs = NearestNeighbors(n_neighbors=k+1).fit(X)
distances, indices = nbrs.kneighbors(X)

# Compute reachability distances
reachability_distances = []
for i, neighbors in enumerate(indices):
    point_reach_dists = []
    for neighbor_idx in neighbors[1:]:  # Exclude itself
        d = np.linalg.norm(X[i] - X[neighbor_idx])
        # k-distance of the neighbor (column 0 is the neighbor itself)
        k_dist_neighbor = distances[neighbor_idx][k]
        reach_dist = max(d, k_dist_neighbor)
        point_reach_dists.append(reach_dist)
    reachability_distances.append(point_reach_dists)

# Visualize points and reachability distances
plt.figure(figsize=(6, 6))
plt.scatter(X[:, 0], X[:, 1], c='blue', label='Points')
for i, (x, y) in enumerate(X):
    plt.text(x + 0.05, y + 0.05, f"P{i}", fontsize=9)
    for j, neighbor_idx in enumerate(indices[i][1:]):
        nx, ny = X[neighbor_idx]
        plt.plot([x, nx], [y, ny], 'k--', linewidth=0.8)
        mid_x, mid_y = (x + nx) / 2, (y + ny) / 2
        plt.text(mid_x, mid_y, f"{reachability_distances[i][j]:.2f}", color='red', fontsize=8)
plt.legend()
plt.title("Reachability Distances (k=2)")
plt.xlabel("X1")
plt.ylabel("X2")
plt.grid(True)
plt.show()
```
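
In practice, you would usually rely on scikit-learn's `LocalOutlierFactor` rather than computing scores by hand. The minimal sketch below runs it on the same five points, with `n_neighbors=2` to match k above. `fit_predict` returns 1 for inliers and -1 for outliers. After fitting, the `negative_outlier_factor_` attribute holds the negated LOF scores of the training points.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Same five points as in the example above
X = np.array([[1, 1], [1.2, 1.1], [0.8, 1.0], [1.1, 0.9], [5, 5]])

lof = LocalOutlierFactor(n_neighbors=2)  # k = 2, default contamination="auto"
labels = lof.fit_predict(X)              # -1 marks outliers, 1 marks inliers
scores = -lof.negative_outlier_factor_   # LOF scores (the attribute stores their negation)

print(labels.tolist())  # [1, 1, 1, 1, -1]
print(scores.round(2))
```

With the default `contamination="auto"`, scikit-learn labels a training point as an outlier when its LOF score exceeds 1.5. Here only P4 at (5, 5) is flagged, because its score is far above those of the four clustered points.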

Which statement best describes how the LOF score is interpreted?



Section 4. Chapter 1
