Intuition: Separating Normal Data in Feature Space
To understand how One-Class SVM separates normal from novel data, imagine mapping your input data into a high-dimensional feature space using a kernel function, such as the radial basis function (RBF). In this transformed space, the algorithm constructs a flexible boundary that tightly encloses the majority of normal data points and treats anything outside it as a potential anomaly. The boundary is not a simple geometric shape like a circle or rectangle; it adapts to the underlying distribution and spread of your data. The SVM learns it by maximizing the margin between the bulk of the data and the origin in feature space, so points that end up inside the boundary are considered "normal", while points that fall outside are flagged as "novel".
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM

# Generate normal data
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]

# Generate some novel (outlier) points
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

# Fit the One-Class SVM
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_train)

# Create a grid for plotting
xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.figure(figsize=(8, 8))

# Plot decision boundary and margin
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')

# Plot training points
plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=20, edgecolor='k', label="Normal")

# Plot outliers
plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='gold', s=20, edgecolor='k', label="Novel")

# Highlight support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=100, facecolors='none', edgecolors='navy', label="Support Vectors")

plt.legend()
plt.title("One-Class SVM with RBF Kernel: Decision Boundary and Support Vectors")
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
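Once fitted, the model can also score points directly, without any plotting. The short follow-up below is a sketch that reuses clf, X_train, and X_outliers from the example above: predict returns +1 for points inside the learned boundary and -1 for points outside it, while decision_function returns a signed score that is positive inside the boundary and negative outside.

# Score the training data and the novel points with the fitted model
# predict: +1 = inside the boundary ("normal"), -1 = outside ("novel")
train_pred = clf.predict(X_train)
outlier_pred = clf.predict(X_outliers)

# decision_function: positive inside the boundary, negative outside
outlier_scores = clf.decision_function(X_outliers)

print("Training points flagged as novel:", (train_pred == -1).sum())
print("Outlier points flagged as novel:", (outlier_pred == -1).sum())
print("First few outlier scores:", np.round(outlier_scores[:5], 3))

Because nu was set to 0.1, roughly 10% of the training points may end up flagged as novel, while most of the uniformly scattered outliers should fall outside the boundary.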
The boundary produced by One-Class SVM is highly flexible due to the RBF kernel. It can adapt to different data shapes, wrapping tightly around clusters or stretching to enclose elongated distributions. This adaptability means the SVM is sensitive to the true structure of your data, but also to the hyperparameters gamma and nu. gamma controls the width of the RBF kernel: larger values let the boundary hug individual points more tightly, while smaller values produce a smoother, broader envelope. nu is an upper bound on the fraction of training points allowed to fall outside the boundary and a lower bound on the fraction of support vectors, so it roughly sets how much of the training data may be treated as anomalous.
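To build intuition for these two parameters, the sketch below fits several models on the same X_train from the example above and reports the fraction of training points each one leaves outside its boundary. The specific gamma and nu values are arbitrary illustrative choices, not recommended settings.

# Compare how gamma and nu change the learned boundary:
# the fraction of training points flagged as outliers should stay near nu,
# while larger gamma values tend to use more support vectors
for gamma in (0.1, 0.5, 2.0):
    for nu in (0.05, 0.1, 0.3):
        model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu)
        model.fit(X_train)
        outside = (model.predict(X_train) == -1).mean()
        print(f"gamma={gamma:<4} nu={nu:<5} "
              f"fraction outside boundary={outside:.2f} "
              f"support vectors={len(model.support_vectors_)}")

In practice these parameters are usually tuned against held-out normal data or domain knowledge about how many anomalies are tolerable, since there are no labeled novelties to validate against.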