Intuition for Covariance-Based Detection
Statistical and Distance-Based Methods | Outlier and Novelty Detection in Practice

Understanding how covariance matrices shape the detection of outliers is crucial for interpreting many statistical anomaly detection methods. In two-dimensional data, the covariance matrix not only determines the spread of the data but also the orientation of the regions considered "normal." You can think of the covariance matrix as defining an ellipse around the mean of your data: the size and tilt of this ellipse reflect both the variances of each feature and how those features move together. When the covariance between two features is high, the ellipse stretches diagonally, showing that changes in one feature are associated with changes in the other. If the covariance is zero, the ellipse aligns with the axes, and each feature varies independently. Outliers are then identified as points that fall far outside this ellipse, indicating they do not follow the same pattern as most of the data.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms


def plot_cov_ellipse(cov, mean, ax, n_std=2.0, **kwargs):
    # Radii of the unit ellipse follow from the Pearson correlation
    pearson = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    ell_radius_x = np.sqrt(1 + pearson)
    ell_radius_y = np.sqrt(1 - pearson)
    ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                      facecolor='none', **kwargs)
    # Scale by the standard deviations and shift to the data mean
    scale_x = np.sqrt(cov[0, 0]) * n_std
    scale_y = np.sqrt(cov[1, 1]) * n_std
    transf = (transforms.Affine2D()
              .rotate_deg(45 if pearson != 0 else 0)
              .scale(scale_x, scale_y)
              .translate(mean[0], mean[1]))
    ellipse.set_transform(transf + ax.transData)
    return ax.add_patch(ellipse)


np.random.seed(0)
mean = [0, 0]
covariances = [
    np.array([[3, 0], [0, 1]]),       # Axis-aligned, more spread in x
    np.array([[1, 0.8], [0.8, 1]]),   # Tilted, strong positive correlation
    np.array([[1, -0.8], [-0.8, 1]])  # Tilted, strong negative correlation
]

fig, axs = plt.subplots(1, 3, figsize=(15, 5))
titles = ["Axis-aligned", "Positive correlation", "Negative correlation"]

for ax, cov, title in zip(axs, covariances, titles):
    data = np.random.multivariate_normal(mean, cov, 500)
    ax.scatter(data[:, 0], data[:, 1], alpha=0.3)
    plot_cov_ellipse(cov, mean, ax, n_std=2, edgecolor='red')
    ax.set_title(title)
    ax.set_xlim(-6, 6)
    ax.set_ylim(-6, 6)
    ax.set_aspect('equal')

plt.tight_layout()
plt.show()
```

Outlier detection based on covariance measures how far a data point lies from the center, accounting for the direction and spread encoded in the covariance matrix. Points outside the ellipse are flagged as outliers because they are farther from the mean than expected, given the variance and correlation structure of the data. The more elongated or tilted the ellipse, the more the method "expects" data to vary along that axis, so a point far out along the ellipse's long axis is less likely to be flagged than a point the same Euclidean distance away along the short axis.
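The distance described above is the Mahalanobis distance. As a rough sketch (not part of the original lesson), the following computes it with NumPy and flags points beyond a threshold; the variable names, the injected test point, and the cutoff of 3 are illustrative assumptions, not a prescribed recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])  # strong positive correlation
data = rng.multivariate_normal(np.zeros(2), cov, 500)

# Add one point that is near the mean in Euclidean terms
# but violates the correlation pattern (x up, y down)
data = np.vstack([data, [2.0, -2.0]])

center = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - center

# Mahalanobis distance: sqrt((x - mu)^T Sigma^{-1} (x - mu))
d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

threshold = 3.0  # rule-of-thumb cutoff, chosen here for illustration
outliers = np.where(d > threshold)[0]
```

The appended point `[2, -2]` sits well inside the axis-aligned range of the data, yet its Mahalanobis distance is large because it runs against the positive correlation, which is exactly the behavior the ellipse picture predicts.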
