Outlier and Novelty Detection in Practice

Intuition: Separating Normal Data in Feature Space

To understand how One-Class SVM separates normal from novel data, imagine mapping your input data into a high-dimensional feature space using a kernel function, such as the radial basis function (RBF). In this transformed space, the algorithm constructs a flexible boundary that tightly encloses the majority of normal data points, treating anything outside as a potential anomaly. This boundary is not a simple geometric shape like a circle or rectangle; it adapts to the underlying distribution and spread of your data. The SVM learns this boundary by maximizing the margin between the bulk of the data and the origin in feature space, effectively distinguishing what is considered "normal" from what lies outside as "novel".
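To make this concrete: after fitting, the model's decision function is simply a weighted sum of kernel similarities to the support vectors, shifted by a learned offset, and its sign separates normal from novel. The minimal sketch below (variable names like X_small are our own, chosen for illustration) reconstructs scikit-learn's decision_function by hand from the fitted dual coefficients:

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X_small = 0.3 * rng.randn(50, 2) + 2  # one "normal" cluster

clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_small)

# f(x) = sum_i alpha_i * K(sv_i, x) - rho; sign(f) gives normal (+) vs novel (-)
X_new = rng.uniform(low=-4, high=4, size=(5, 2))
K = rbf_kernel(clf.support_vectors_, X_new, gamma=0.5)
manual = (clf.dual_coef_ @ K).ravel() + clf.intercept_

print(np.allclose(manual, clf.decision_function(X_new)))  # True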

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM

# Generate normal data
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]

# Generate some novel (outlier) points
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

# Fit the One-Class SVM
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_train)

# Create a grid for plotting
xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.figure(figsize=(8, 8))

# Plot decision boundary and margin
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')

# Plot training points
plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=20, edgecolor='k', label="Normal")

# Plot outliers
plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='gold', s=20, edgecolor='k', label="Novel")

# Highlight support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=100, facecolors='none', edgecolors='navy', label="Support Vectors")

plt.legend()
plt.title("One-Class SVM with RBF Kernel: Decision Boundary and Support Vectors")
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
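As a quick numerical check on the fitted boundary, predict labels points inside it as +1 and points outside as -1. The short follow-up sketch below reuses clf, X_train, and X_outliers from the code above; the fraction of training points flagged as novel should stay close to (at most) nu:

# predict() returns +1 inside the learned boundary and -1 outside
y_train = clf.predict(X_train)
y_novel = clf.predict(X_outliers)

# nu upper-bounds the fraction of training points left outside the
# boundary, so this first fraction should come out near 0.1 or below
print("Training points flagged as novel:", np.mean(y_train == -1))
print("Outliers flagged as novel:", np.mean(y_novel == -1))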
Note

The boundary produced by One-Class SVM is highly flexible due to the RBF kernel. It can adapt to different data shapes, wrapping tightly around clusters or stretching to enclose elongated distributions. This adaptability means the SVM is sensitive to the true structure of your data, but also to parameter choices: gamma controls the width of the RBF kernel (higher values yield a tighter, more wiggly boundary), while nu sets an upper bound on the fraction of training points allowed to fall outside the boundary.
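One way to see this sensitivity is to refit the model over a small grid of gamma and nu values and watch how the number of support vectors and the fraction of flagged training points change. The snippet below is a minimal sketch along those lines, using the same synthetic data as the example above:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]

# Higher gamma -> tighter, more wiggly boundary; higher nu -> more training
# points treated as outliers (and at least that fraction of support vectors)
for gamma in (0.1, 0.5, 5.0):
    for nu in (0.05, 0.2):
        clf = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_train)
        flagged = np.mean(clf.predict(X_train) == -1)
        print(f"gamma={gamma}, nu={nu} -> "
              f"{len(clf.support_vectors_)} support vectors, "
              f"{flagged:.2f} flagged as novel")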


Which of the following statements best describes the decision boundary created by a One-Class SVM with an RBF kernel?

Select the correct answer
