One-Class SVM: Boundary-Based Detection
When you want to identify whether a new observation is unlike anything you have seen before, a powerful approach is to use the One-Class Support Vector Machine (One-Class SVM). This method is particularly well-suited for novelty detection, where you have access only to "normal" data and wish to detect anything that does not conform to this pattern. The algorithm works by learning a decision function that captures the region in feature space where most of the training data lies, effectively building a boundary around the normal observations.
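Before looking at how the boundary itself is constructed, here is a minimal sketch of that fit-then-predict workflow using scikit-learn's OneClassSVM. The toy clusters, parameter values, and test points are illustrative choices, not prescribed values.

import numpy as np
from sklearn.svm import OneClassSVM

# Toy "normal" data: two tight clusters (values chosen only for illustration)
rng = np.random.RandomState(42)
X_train = np.r_[0.3 * rng.randn(100, 2) + 2, 0.3 * rng.randn(100, 2) - 2]

# Learn the boundary from normal observations only
clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05)
clf.fit(X_train)

# Score new observations: +1 = consistent with the training data, -1 = novelty
X_new = np.array([[2.1, 1.9],     # close to one training cluster
                  [0.0, 0.0],     # between the clusters
                  [4.0, -4.0]])   # far from everything seen so far
print(clf.predict(X_new))             # array of +1 / -1 labels
print(clf.decision_function(X_new))   # signed score; negative means outside the boundary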
The core idea behind One-Class SVM is to learn a compact region in feature space that encloses most of the data points. In the standard formulation, the data are mapped into a feature space and separated from the origin by a hyperplane with the largest possible margin; points that fall on the origin's side of this hyperplane are treated as outliers. What makes One-Class SVM especially versatile is its use of the kernel trick: instead of drawing a linear boundary in the original input space, the algorithm implicitly transforms the data into a higher-dimensional space using a kernel function. This transformation allows for highly non-linear and complex boundaries in the original space, making it possible to capture intricate patterns in the data.
The most common kernels used are the Radial Basis Function (RBF), polynomial, and linear kernels. The choice of kernel, and its parameters, has a significant impact on how tightly or loosely the boundary fits around the data. For example, the RBF kernel can create smooth, rounded boundaries, while a linear kernel results in a straight-line separation.
The ν (nu) parameter in One-Class SVM sets an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors. A higher nu allows more points to fall outside the region, resulting in a tighter boundary around the bulk of the data. The γ (gamma) parameter, relevant for kernels like RBF, determines the influence of individual training examples: a small gamma means far-reaching influence (smoother boundary), while a large gamma makes the boundary more sensitive to individual points (more complex, wiggly boundary).
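As a quick sketch of what ν means in practice, the snippet below fits One-Class SVM on random data and reports the fraction of training points flagged as outliers (at most roughly ν) and the fraction retained as support vectors (at least roughly ν). The dataset and γ value here are arbitrary illustrations, not recommended settings.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(1000, 2)  # arbitrary "normal" training data

for nu in (0.05, 0.2):
    clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=nu).fit(X)
    outlier_frac = np.mean(clf.predict(X) == -1)   # training points outside the boundary
    sv_frac = len(clf.support_vectors_) / len(X)   # training points kept as support vectors
    print(f"nu={nu}: outliers ~ {outlier_frac:.3f}, support vectors ~ {sv_frac:.3f}")

The larger example below then visualizes how the choice of kernel, γ, and ν shapes the boundary on a two-cluster toy dataset.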
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM

# Generate synthetic 2D normal data
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X = np.r_[X + 2, X - 2]

# Create a grid for plotting decision boundaries
xx, yy = np.meshgrid(np.linspace(-4, 4, 500), np.linspace(-4, 4, 500))
grid = np.c_[xx.ravel(), yy.ravel()]

params = [
    ("RBF, γ=0.1, ν=0.05", 'rbf', 0.1, 0.05),
    ("RBF, γ=1, ν=0.05", 'rbf', 1, 0.05),
    ("Linear, ν=0.05", 'linear', 0.1, 0.05),
    ("Polynomial, γ=0.1, ν=0.05", 'poly', 0.1, 0.05),
]

plt.figure(figsize=(12, 10))
for i, (title, kernel, gamma, nu) in enumerate(params):
    clf = OneClassSVM(kernel=kernel, gamma=gamma, nu=nu)
    clf.fit(X)
    Z = clf.decision_function(grid)
    Z = Z.reshape(xx.shape)

    plt.subplot(2, 2, i + 1)
    plt.title(title)
    # Draw decision boundary and margin
    plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
    plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
    plt.scatter(X[:, 0], X[:, 1], c='white', s=20, edgecolors='k')
    plt.xlim(-4, 4)
    plt.ylim(-4, 4)
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
plt.show()