One-Class SVM: Boundary-Based Detection
When you want to identify whether a new observation is unlike anything you have seen before, a powerful approach is to use the One-Class Support Vector Machine (One-Class SVM). This method is particularly well-suited for novelty detection, where you have access only to "normal" data and wish to detect anything that does not conform to this pattern. The algorithm works by learning a decision function that captures the region in feature space where most of the training data lies, effectively building a boundary around the normal observations.
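Before looking at how the boundary itself is constructed, here is a minimal sketch of that fit-then-predict workflow using scikit-learn's OneClassSVM. The toy clusters, parameter values, and test points are illustrative choices, not prescribed values.

import numpy as np
from sklearn.svm import OneClassSVM

# Toy "normal" data: two tight clusters (values chosen only for illustration)
rng = np.random.RandomState(42)
X_train = np.r_[0.3 * rng.randn(100, 2) + 2, 0.3 * rng.randn(100, 2) - 2]

# Learn the boundary from normal observations only
clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05)
clf.fit(X_train)

# Score new observations: +1 = consistent with the training data, -1 = novelty
X_new = np.array([[2.1, 1.9],     # close to one training cluster
                  [0.0, 0.0],     # between the clusters
                  [4.0, -4.0]])   # far from everything seen so far
print(clf.predict(X_new))             # array of +1 / -1 labels
print(clf.decision_function(X_new))   # signed score; negative means outside the boundary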
The core idea behind One-Class SVM is to learn a compact region in feature space that encloses most of the data points. In the standard formulation, the data are mapped into a feature space and separated from the origin by a hyperplane with the largest possible margin; points that fall on the origin's side of this hyperplane are treated as outliers. What makes One-Class SVM especially versatile is its use of the kernel trick: instead of drawing a linear boundary in the original input space, the algorithm implicitly transforms the data into a higher-dimensional space using a kernel function. This transformation allows for highly non-linear and complex boundaries in the original space, making it possible to capture intricate patterns in the data.
The most common kernels used are the Radial Basis Function (RBF), polynomial, and linear kernels. The choice of kernel, and its parameters, has a significant impact on how tightly or loosely the boundary fits around the data. For example, the RBF kernel can create smooth, rounded boundaries, while a linear kernel results in a straight-line separation.
The ν (nu) parameter in One-Class SVM sets an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors. A higher nu allows more points to fall outside the region, resulting in a tighter boundary around the bulk of the data. The γ (gamma) parameter, relevant for kernels like RBF, determines the influence of individual training examples: a small gamma means far-reaching influence (smoother boundary), while a large gamma makes the boundary more sensitive to individual points (more complex, wiggly boundary).
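As a quick sketch of what ν means in practice, the snippet below fits One-Class SVM on random data and reports the fraction of training points flagged as outliers (at most roughly ν) and the fraction retained as support vectors (at least roughly ν). The dataset and γ value here are arbitrary illustrations, not recommended settings.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(1000, 2)  # arbitrary "normal" training data

for nu in (0.05, 0.2):
    clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=nu).fit(X)
    outlier_frac = np.mean(clf.predict(X) == -1)   # training points outside the boundary
    sv_frac = len(clf.support_vectors_) / len(X)   # training points kept as support vectors
    print(f"nu={nu}: outliers ~ {outlier_frac:.3f}, support vectors ~ {sv_frac:.3f}")

The larger example below then visualizes how the choice of kernel, γ, and ν shapes the boundary on a two-cluster toy dataset.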
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM

# Generate synthetic 2D normal data
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X = np.r_[X + 2, X - 2]

# Create a grid for plotting decision boundaries
xx, yy = np.meshgrid(np.linspace(-4, 4, 500), np.linspace(-4, 4, 500))
grid = np.c_[xx.ravel(), yy.ravel()]

params = [
    ("RBF, γ=0.1, ν=0.05", 'rbf', 0.1, 0.05),
    ("RBF, γ=1, ν=0.05", 'rbf', 1, 0.05),
    ("Linear, ν=0.05", 'linear', 0.1, 0.05),
    ("Polynomial, γ=0.1, ν=0.05", 'poly', 0.1, 0.05),
]

plt.figure(figsize=(12, 10))
for i, (title, kernel, gamma, nu) in enumerate(params):
    clf = OneClassSVM(kernel=kernel, gamma=gamma, nu=nu)
    clf.fit(X)
    Z = clf.decision_function(grid)
    Z = Z.reshape(xx.shape)

    plt.subplot(2, 2, i + 1)
    plt.title(title)
    # Draw decision boundary and margin
    plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
    plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
    plt.scatter(X[:, 0], X[:, 1], c='white', s=20, edgecolors='k')
    plt.xlim(-4, 4)
    plt.ylim(-4, 4)
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
plt.show()