Variance, Covariance, and the Covariance Matrix
Variance measures how much a variable deviates from its mean.
The formula for the variance of a variable x is:

$$\mathrm{Var}(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

This is the population form, which divides by $n$; the sample form divides by $n-1$ instead.

Covariance measures how two variables change together.
The formula for the covariance of variables x and y is:

$$\mathrm{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

The covariance matrix generalizes covariance to multiple variables. For a dataset X with d features and n samples, the covariance matrix $\Sigma$ is a $d \times d$ matrix where each entry $\Sigma_{ij}$ is the covariance between feature i and feature j, computed with denominator $n-1$ so that it is an unbiased estimator.
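As a small illustration of the two scalar formulas, the sketch below evaluates them term by term for two short arrays. The names `x`, `y`, `var_x`, and `cov_xy` are illustrative choices rather than part of the lesson, and the values are a made-up three-sample data set.

```python
import numpy as np

# Two variables observed on the same 3 samples (illustrative values)
x = np.array([2.5, 0.5, 2.2])
y = np.array([2.4, 0.7, 2.9])
n = x.size

# Deviations of each observation from its variable's mean
dx = x - x.mean()
dy = y - y.mean()

# Variance with denominator n (population form)
var_x = np.sum(dx ** 2) / n

# Covariance with denominator n - 1 (sample form)
cov_xy = np.sum(dx * dy) / (n - 1)

print("Var(x):", var_x)
print("Cov(x, y):", cov_xy)
```

A positive covariance here indicates that x and y tend to be above or below their means together.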
```python
import numpy as np

# Example data: 3 samples, 2 features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])

# Center the data (subtract the mean of each feature)
X_centered = X - np.mean(X, axis=0)

# Compute the covariance matrix manually, dividing by n - 1
# so that the result matches the unbiased definition above
cov_matrix = (X_centered.T @ X_centered) / (X_centered.shape[0] - 1)
print("Covariance matrix:\n", cov_matrix)
```
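In the code above, you manually center the data and compute the covariance matrix using matrix multiplication. This matrix captures how each pair of features varies together: the diagonal entries are the variances of the individual features, and the off-diagonal entries are the covariances between pairs of features.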
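One way to sanity-check a hand-computed covariance matrix is to compare it with NumPy's `np.cov`. The sketch below assumes the same layout as the lesson (rows are samples, columns are features), so it passes `rowvar=False`. It also computes the matrix with both denominators to show how the choice changes the result. The names `cov_unbiased` and `cov_biased` are illustrative.

```python
import numpy as np

# Same layout as the lesson: 3 samples (rows), 2 features (columns)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
n = X.shape[0]

# Center each feature, then form the d x d matrix of summed products
X_centered = X - X.mean(axis=0)
scatter = X_centered.T @ X_centered

cov_unbiased = scatter / (n - 1)  # sample covariance, denominator n - 1
cov_biased = scatter / n          # population covariance, denominator n

# np.cov treats rows as variables by default, so rowvar=False is needed here.
# Its default normalization is n - 1; bias=True switches to n.
assert np.allclose(cov_unbiased, np.cov(X, rowvar=False))
assert np.allclose(cov_biased, np.cov(X, rowvar=False, bias=True))

print("Denominator n - 1:\n", cov_unbiased)
print("Denominator n:\n", cov_biased)
```

With only three samples the two versions differ noticeably; the gap shrinks as n grows, because the ratio (n - 1) / n approaches 1.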