One-Class SVM: Boundary-Based Detection
Outlier and Novelty Detection in Practice

When you want to identify whether a new observation is unlike anything you have seen before, a powerful approach is to use the One-Class Support Vector Machine (One-Class SVM). This method is particularly well-suited for novelty detection, where you have access only to "normal" data and wish to detect anything that does not conform to this pattern. The algorithm works by learning a decision function that captures the region in feature space where most of the training data lies, effectively building a boundary around the normal observations.
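
As a concrete starting point, here is a minimal sketch of this workflow with scikit-learn's OneClassSVM; the data and the gamma and nu values are illustrative choices, not prescriptions. The model is fit on normal observations only, and predict then labels new points as inliers (+1) or novelties (-1).

import numpy as np
from sklearn.svm import OneClassSVM

# Fit only on "normal" observations (a tight 2D Gaussian blob)
rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(200, 2)

clf = OneClassSVM(kernel='rbf', gamma=0.5, nu=0.05)
clf.fit(X_train)

# predict returns +1 for points inside the learned boundary, -1 for novelties
X_new = np.array([[0.1, -0.2],   # resembles the training data
                  [3.0, 3.0]])   # unlike anything seen before
print(clf.predict(X_new))        # typically [ 1 -1 ]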

The core idea behind One-Class SVM is to find the smallest region in feature space that encloses most of the data points, separating them from the origin. This is achieved by solving an optimization problem that maximizes the margin between the data and the origin, resulting in a flexible boundary. What makes One-Class SVM especially versatile is its use of the kernel trick: instead of drawing a linear boundary in the original feature space, the algorithm can transform data into a higher-dimensional space using a kernel function. This transformation allows for highly non-linear and complex boundaries, making it possible to capture intricate patterns in the data.
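
For readers who want the underlying mathematics, the standard formulation (due to Schölkopf et al.) is sketched below, where \(\phi\) is the kernel feature map, \(n\) the number of training points, and \(\xi_i\) are slack variables that let a few points fall outside the boundary:

\[
\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{subject to}\quad \langle w,\phi(x_i)\rangle \ge \rho - \xi_i,\quad \xi_i \ge 0.
\]

New points are then scored with \(f(x) = \operatorname{sgn}\!\left(\langle w,\phi(x)\rangle - \rho\right)\): a value of \(+1\) means the point lies inside the learned region, while \(-1\) marks it as a novelty.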

The most common kernels are the Radial Basis Function (RBF), polynomial, and linear kernels. The choice of kernel and its parameters has a significant impact on how tightly or loosely the boundary fits around the data. For example, the RBF kernel creates smooth, rounded boundaries, while a linear kernel results in a straight-line separation.

Note

The ν (nu) parameter in One-Class SVM controls the upper bound on the fraction of training points treated as outliers and the lower bound on the fraction of support vectors. A higher nu allows more data to be considered outliers, resulting in a tighter boundary. The γ (gamma) parameter, relevant for kernels like RBF, determines the influence of single training examples: a small gamma means far-reaching influence (a smoother boundary), while a large gamma makes the boundary more sensitive to individual points (a more complex, wiggly boundary).
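
You can check the role of ν empirically with a small sketch like the one below (the data and gamma value are illustrative): it counts the fraction of training points each model flags as outliers, which should stay at or just below each ν value.

import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative data: one tight Gaussian blob of "normal" points
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(200, 2)

# nu upper-bounds the fraction of training points treated as outliers
for nu in [0.01, 0.05, 0.2, 0.5]:
    clf = OneClassSVM(kernel='rbf', gamma=0.5, nu=nu).fit(X)
    outlier_frac = np.mean(clf.predict(X) == -1)
    print(f"nu={nu:<4} -> fraction flagged as outliers: {outlier_frac:.2f}")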

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM

# Generate synthetic 2D normal data
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X = np.r_[X + 2, X - 2]

# Create a grid for plotting decision boundaries
xx, yy = np.meshgrid(np.linspace(-4, 4, 500), np.linspace(-4, 4, 500))
grid = np.c_[xx.ravel(), yy.ravel()]

params = [
    ("RBF, γ=0.1, ν=0.05", 'rbf', 0.1, 0.05),
    ("RBF, γ=1, ν=0.05", 'rbf', 1, 0.05),
    ("Linear, ν=0.05", 'linear', 0.1, 0.05),
    ("Polynomial, γ=0.1, ν=0.05", 'poly', 0.1, 0.05),
]

plt.figure(figsize=(12, 10))
for i, (title, kernel, gamma, nu) in enumerate(params):
    clf = OneClassSVM(kernel=kernel, gamma=gamma, nu=nu)
    clf.fit(X)
    Z = clf.decision_function(grid)
    Z = Z.reshape(xx.shape)

    plt.subplot(2, 2, i + 1)
    plt.title(title)
    # Draw decision boundary and margin
    plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
    plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
    plt.scatter(X[:, 0], X[:, 1], c='white', s=20, edgecolors='k')
    plt.xlim(-4, 4)
    plt.ylim(-4, 4)
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
plt.show()
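
In the resulting figure, the dark red contour in each panel is the learned boundary (where decision_function equals zero), and the shaded region marks negative scores, i.e. points that would be flagged as novelties. Comparing the two RBF panels shows how a larger γ wraps the boundary more tightly around the two clusters, while the linear kernel can only produce a straight-line separation.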

