Outlier and Novelty Detection

Comparing Outlier Detection Algorithms


When you face real-world data, choosing the right outlier detection algorithm is crucial. Each method – Isolation Forest, One-Class SVM, Local Outlier Factor (LOF), and Robust Covariance – embodies different logic, assumptions, and strengths.

Note

When choosing an outlier detection algorithm, interpretability, scalability, and robustness often trade off against each other. Isolation Forest is highly scalable and robust to high-dimensional data but less interpretable, as the logic behind each individual outlier score is not transparent. One-Class SVM offers flexibility through kernels but can be computationally demanding and less interpretable for complex kernels. LOF excels in finding local anomalies but may struggle with very large datasets due to its reliance on distance calculations. Robust Covariance is interpretable and effective for data following a Gaussian distribution but is sensitive to high dimensionality and non-Gaussian data.
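To see the scalability trade-off in practice, here is a minimal sketch that times how long each estimator takes to fit the same synthetic dataset. The dataset size and hyperparameters are illustrative assumptions, not values prescribed by this lesson, and absolute timings will vary by machine:

import time
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Illustrative size; a real benchmark would sweep n_samples and n_features
X, _ = make_blobs(n_samples=5000, centers=1, cluster_std=1.0, random_state=42)

estimators = {
    "Isolation Forest": IsolationForest(contamination=0.06, random_state=42),
    "One-Class SVM": OneClassSVM(nu=0.06, kernel="rbf", gamma=0.1),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.06),
    "Robust Covariance": EllipticEnvelope(contamination=0.06, random_state=42),
}

for name, est in estimators.items():
    start = time.perf_counter()
    est.fit(X)  # LOF computes neighbor distances here; the SVM solves a QP
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.3f}s")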

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate synthetic data with outliers
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=1.0, random_state=42)
rng = np.random.RandomState(42)
outliers = rng.uniform(low=-6, high=6, size=(20, 2))
X = np.vstack([X, outliers])

# Fit models
iso = IsolationForest(contamination=0.06, random_state=42)
svm = OneClassSVM(nu=0.06, kernel="rbf", gamma=0.1)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.06)
cov = EllipticEnvelope(contamination=0.06, random_state=42)

# Predicted labels: -1 marks outliers, 1 marks inliers
y_pred_iso = iso.fit_predict(X)
y_pred_svm = svm.fit(X).predict(X)
y_pred_lof = lof.fit_predict(X)
y_pred_cov = cov.fit(X).predict(X)

algorithms = [
    ("Isolation Forest", y_pred_iso),
    ("One-Class SVM", y_pred_svm),
    ("LOF", y_pred_lof),
    ("Robust Covariance", y_pred_cov)
]

# Plot each algorithm's labeling side by side
plt.figure(figsize=(12, 8))
for i, (name, y_pred) in enumerate(algorithms, 1):
    plt.subplot(2, 2, i)
    plt.scatter(X[:, 0], X[:, 1], c=(y_pred == -1), cmap="coolwarm",
                edgecolor="k", s=30)
    plt.title(name)
    plt.xticks([])
    plt.yticks([])
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
plt.tight_layout()
plt.show()
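The plots above use only binary labels, where -1 marks outliers. Each model also exposes a continuous score indicating how anomalous a point is. As a possible extension, continuing from the fitted estimators in the snippet above (so X, iso, svm, lof, and cov are assumed to exist), you could rank points by score. Note that LOF with the default novelty=False exposes training scores through the negative_outlier_factor_ attribute rather than through decision_function:

# Continuous scores: lower means more anomalous (scikit-learn convention)
scores = {
    "Isolation Forest": iso.decision_function(X),
    "One-Class SVM": svm.decision_function(X),
    # LOF scores on training data come from an attribute, not a method,
    # unless the estimator was built with novelty=True
    "LOF": lof.negative_outlier_factor_,
    "Robust Covariance": cov.decision_function(X),
}

for name, s in scores.items():
    # Indices of the five most anomalous points under each model
    print(name, np.argsort(s)[:5])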

Which of the following statements accurately describe the trade-offs between Isolation Forest, One-Class SVM, Local Outlier Factor (LOF), and Robust Covariance?

Select all correct answers
