Intuition for Covariance-Based Detection | Statistical and Distance-Based Methods
Outlier and Novelty Detection in Practice

Intuition for Covariance-Based Detection

Understanding how covariance matrices shape the detection of outliers is crucial for interpreting many statistical anomaly detection methods. In two-dimensional data, the covariance matrix not only determines the spread of the data but also the orientation of the regions considered "normal." You can think of the covariance matrix as defining an ellipse around the mean of your data: the size and tilt of this ellipse reflect both the variances of each feature and how those features move together. When the covariance between two features is high, the ellipse stretches diagonally, showing that changes in one feature are associated with changes in the other. If the covariance is zero, the ellipse aligns with the axes, and each feature varies independently. Outliers are then identified as points that fall far outside this ellipse, indicating they do not follow the same pattern as most of the data.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms

def plot_cov_ellipse(cov, mean, ax, n_std=2.0, **kwargs):
    # Pearson correlation determines the base shape of the unit ellipse
    pearson = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    ell_radius_x = np.sqrt(1 + pearson)
    ell_radius_y = np.sqrt(1 - pearson)
    ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                      facecolor='none', **kwargs)
    # Scale by each feature's standard deviation and center on the mean
    scale_x = np.sqrt(cov[0, 0]) * n_std
    scale_y = np.sqrt(cov[1, 1]) * n_std
    transf = (transforms.Affine2D()
              .rotate_deg(45 if pearson != 0 else 0)
              .scale(scale_x, scale_y)
              .translate(mean[0], mean[1]))
    ellipse.set_transform(transf + ax.transData)
    return ax.add_patch(ellipse)

np.random.seed(0)
mean = [0, 0]
covariances = [
    np.array([[3, 0], [0, 1]]),       # Axis-aligned, more spread in x
    np.array([[1, 0.8], [0.8, 1]]),   # Tilted, strong positive correlation
    np.array([[1, -0.8], [-0.8, 1]])  # Tilted, strong negative correlation
]

fig, axs = plt.subplots(1, 3, figsize=(15, 5))
titles = ["Axis-aligned", "Positive correlation", "Negative correlation"]
for ax, cov, title in zip(axs, covariances, titles):
    data = np.random.multivariate_normal(mean, cov, 500)
    ax.scatter(data[:, 0], data[:, 1], alpha=0.3)
    plot_cov_ellipse(cov, mean, ax, n_std=2, edgecolor='red')
    ax.set_title(title)
    ax.set_xlim(-6, 6)
    ax.set_ylim(-6, 6)
    ax.set_aspect('equal')
plt.tight_layout()
plt.show()
```

Outlier detection based on covariance involves measuring how far a data point is from the center, accounting for the direction and spread defined by the covariance matrix. Points that lie outside the ellipse are considered outliers because they are farther from the mean than expected, given the variance and correlation structure of the data. The more elongated or tilted the ellipse, the more the algorithm "expects" data to vary in that direction, making it less likely to flag points in the long direction as outliers.
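The distance measure described above can be made concrete with the Mahalanobis distance, which rescales the usual Euclidean distance by the inverse covariance matrix. The sketch below is a minimal illustration, not part of the original lesson code: it assumes a 2-D Gaussian sample with strong positive correlation, and the threshold of 3 is an arbitrary illustrative cutoff.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D data: strong positive covariance between the features
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
data = rng.multivariate_normal([0, 0], cov, size=500)

# Estimate the center and (inverse) covariance from the sample
mean = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))

def mahalanobis(x):
    # sqrt((x - mu)^T S^{-1} (x - mu)): distance in "covariance units"
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Two points equally far from the mean in Euclidean terms, but only one
# follows the correlation structure of the data
on_axis = np.array([2.0, 2.0])    # along the long axis of the ellipse
off_axis = np.array([2.0, -2.0])  # against the correlation

print(mahalanobis(on_axis))   # small: consistent with the expected spread
print(mahalanobis(off_axis))  # large: would be flagged as an outlier

# Flag every sample beyond a chosen cutoff (here, 3 "standard deviations")
diff = data - mean
d_mahal = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
outliers = d_mahal > 3.0
print(outliers.sum(), "points flagged")
```

Note how the point lying along the ellipse's long axis gets a much smaller distance than the point at the same Euclidean radius but perpendicular to the correlation, which is exactly the behavior the paragraph above describes.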


