Intuition for Covariance-Based Detection | Statistical and Distance-Based Methods
Outlier and Novelty Detection in Practice

Intuition for Covariance-Based Detection

Understanding how covariance matrices shape the detection of outliers is crucial for interpreting many statistical anomaly detection methods. In two-dimensional data, the covariance matrix not only determines the spread of the data but also the orientation of the regions considered "normal." You can think of the covariance matrix as defining an ellipse around the mean of your data: the size and tilt of this ellipse reflect both the variances of each feature and how those features move together. When the covariance between two features is high, the ellipse stretches diagonally, showing that changes in one feature are associated with changes in the other. If the covariance is zero, the ellipse aligns with the axes, and each feature varies independently. Outliers are then identified as points that fall far outside this ellipse, indicating they do not follow the same pattern as most of the data.
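The link between the covariance matrix and the ellipse can be made concrete with a small numerical sketch: the eigenvectors of the covariance matrix give the directions of the ellipse's axes, and the eigenvalues give the variance along each axis. The matrix below is a hypothetical example chosen to match the positively correlated case discussed above.

```python
import numpy as np

# Hypothetical 2-D covariance matrix with strong positive correlation
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

# Eigenvectors of the covariance matrix point along the ellipse axes;
# eigenvalues are the variances (squared half-axis lengths) along them.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

print(eigvals)        # [0.2 1.8] -> major axis has variance 1.8
print(eigvecs[:, 1])  # major-axis direction: +/- [1, 1] / sqrt(2), the diagonal
```

Because the off-diagonal entry is positive, the largest eigenvalue's eigenvector points along the diagonal, which is exactly the diagonal stretch of the ellipse described above.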

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms

def plot_cov_ellipse(cov, mean, ax, n_std=2.0, **kwargs):
    # The Pearson correlation determines the shape of the unit ellipse
    pearson = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    ell_radius_x = np.sqrt(1 + pearson)
    ell_radius_y = np.sqrt(1 - pearson)
    ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                      facecolor='none', **kwargs)
    # Rotate the unit ellipse, scale it by n_std standard deviations
    # along each axis, then translate it to the data mean
    scale_x = np.sqrt(cov[0, 0]) * n_std
    scale_y = np.sqrt(cov[1, 1]) * n_std
    transf = (transforms.Affine2D()
              .rotate_deg(45 if pearson != 0 else 0)
              .scale(scale_x, scale_y)
              .translate(mean[0], mean[1]))
    ellipse.set_transform(transf + ax.transData)
    return ax.add_patch(ellipse)

np.random.seed(0)
mean = [0, 0]
covariances = [
    np.array([[3, 0], [0, 1]]),       # Axis-aligned, more spread in x
    np.array([[1, 0.8], [0.8, 1]]),   # Tilted, strong positive correlation
    np.array([[1, -0.8], [-0.8, 1]])  # Tilted, strong negative correlation
]

fig, axs = plt.subplots(1, 3, figsize=(15, 5))
titles = ["Axis-aligned", "Positive correlation", "Negative correlation"]
for ax, cov, title in zip(axs, covariances, titles):
    data = np.random.multivariate_normal(mean, cov, 500)
    ax.scatter(data[:, 0], data[:, 1], alpha=0.3)
    plot_cov_ellipse(cov, mean, ax, n_std=2, edgecolor='red')
    ax.set_title(title)
    ax.set_xlim(-6, 6)
    ax.set_ylim(-6, 6)
    ax.set_aspect('equal')
plt.tight_layout()
plt.show()
```

Outlier detection based on covariance involves measuring how far a data point is from the center, accounting for the direction and spread defined by the covariance matrix. Points that lie outside the ellipse are considered outliers because they are farther from the mean than expected, given the variance and correlation structure of the data. The more elongated or tilted the ellipse, the more the algorithm "expects" data to vary in that direction, making it less likely to flag points in the long direction as outliers.
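The distance described above is the Mahalanobis distance. The following is a minimal sketch of this idea in NumPy, assuming a 2-D Gaussian with a known covariance matrix: the squared Mahalanobis distance of a Gaussian point follows a chi-squared distribution with 2 degrees of freedom, so points beyond a high quantile of that distribution are flagged as outliers. The covariance matrix and the 97.5% cutoff are illustrative choices, not prescribed values.

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.zeros(2)
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])  # hypothetical positively correlated data
data = rng.multivariate_normal(mean, cov, 500)

# Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu)
cov_inv = np.linalg.inv(cov)
diff = data - mean
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# For 2 degrees of freedom the chi-squared CDF is 1 - exp(-x/2),
# so the 97.5% quantile is -2 * ln(0.025) (about 7.38)
threshold = -2 * np.log(0.025)
outliers = data[d2 > threshold]
print(f"Flagged {len(outliers)} of {len(data)} points as outliers")
```

Because the covariance structure is divided out, a point far along the ellipse's long axis can score a smaller distance than a point much closer to the mean but off the correlation direction, which is exactly the behavior the paragraph above describes.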



Section 2. Chapter 3

