Outlier and Novelty Detection

Comparing Outlier Detection Algorithms


When you face real-world data, choosing the right outlier detection algorithm is crucial. Each method – Isolation Forest, One-Class SVM, Local Outlier Factor (LOF), and Robust Covariance – embodies different logic, assumptions, and strengths.

Note

When choosing an outlier detection algorithm, interpretability, scalability, and robustness often trade off against each other. Isolation Forest is highly scalable and robust to high-dimensional data but less interpretable, as the logic behind each individual outlier score is not transparent. One-Class SVM offers flexibility through kernels but can be computationally demanding and less interpretable for complex kernels. LOF excels in finding local anomalies but may struggle with very large datasets due to its reliance on distance calculations. Robust Covariance is interpretable and effective for data following a Gaussian distribution but is sensitive to high dimensionality and non-Gaussian data.
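To see the scalability trade-off in practice, here is a minimal sketch that times how long each estimator takes to fit the same synthetic dataset. The dataset size and hyperparameters are illustrative assumptions, not values prescribed by this lesson, and absolute timings will vary by machine:

import time
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Illustrative size; a real benchmark would sweep n_samples and n_features
X, _ = make_blobs(n_samples=5000, centers=1, cluster_std=1.0, random_state=42)

estimators = {
    "Isolation Forest": IsolationForest(contamination=0.06, random_state=42),
    "One-Class SVM": OneClassSVM(nu=0.06, kernel="rbf", gamma=0.1),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.06),
    "Robust Covariance": EllipticEnvelope(contamination=0.06, random_state=42),
}

for name, est in estimators.items():
    start = time.perf_counter()
    est.fit(X)  # LOF computes neighbor distances here; the SVM solves a QP
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.3f}s")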

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate synthetic data with outliers
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=1.0, random_state=42)
rng = np.random.RandomState(42)
outliers = rng.uniform(low=-6, high=6, size=(20, 2))
X = np.vstack([X, outliers])

# Fit models
iso = IsolationForest(contamination=0.06, random_state=42)
svm = OneClassSVM(nu=0.06, kernel="rbf", gamma=0.1)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.06)
cov = EllipticEnvelope(contamination=0.06, random_state=42)

# Predicted labels: -1 marks outliers, 1 marks inliers
y_pred_iso = iso.fit_predict(X)
y_pred_svm = svm.fit(X).predict(X)
y_pred_lof = lof.fit_predict(X)
y_pred_cov = cov.fit(X).predict(X)

algorithms = [
    ("Isolation Forest", y_pred_iso),
    ("One-Class SVM", y_pred_svm),
    ("LOF", y_pred_lof),
    ("Robust Covariance", y_pred_cov)
]

# Plot each algorithm's labeling side by side
plt.figure(figsize=(12, 8))
for i, (name, y_pred) in enumerate(algorithms, 1):
    plt.subplot(2, 2, i)
    plt.scatter(X[:, 0], X[:, 1], c=(y_pred == -1), cmap="coolwarm",
                edgecolor="k", s=30)
    plt.title(name)
    plt.xticks([])
    plt.yticks([])
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
plt.tight_layout()
plt.show()
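The plots above use only binary labels, where -1 marks outliers. Each model also exposes a continuous score indicating how anomalous a point is. As a possible extension, continuing from the fitted estimators in the snippet above (so X, iso, svm, lof, and cov are assumed to exist), you could rank points by score. Note that LOF with the default novelty=False exposes training scores through the negative_outlier_factor_ attribute rather than through decision_function:

# Continuous scores: lower means more anomalous (scikit-learn convention)
scores = {
    "Isolation Forest": iso.decision_function(X),
    "One-Class SVM": svm.decision_function(X),
    # LOF scores on training data come from an attribute, not a method,
    # unless the estimator was built with novelty=True
    "LOF": lof.negative_outlier_factor_,
    "Robust Covariance": cov.decision_function(X),
}

for name, s in scores.items():
    # Indices of the five most anomalous points under each model
    print(name, np.argsort(s)[:5])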

Which of the following statements accurately describe the trade-offs between Isolation Forest, One-Class SVM, Local Outlier Factor (LOF), and Robust Covariance?

Select all correct answers
