Learn Variance, Covariance, and the Covariance Matrix | Mathematical Foundations of PCA
Dimensionality Reduction with PCA

Variance, Covariance, and the Covariance Matrix

Definition

Variance measures how far the values of a variable spread out from their mean, expressed as the average squared deviation.

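The formula for the variance of a variable $x$ with $n$ observations is:

$$\mathrm{Var}(x) = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$$

This is the population form, which divides by $n$. The sample (unbiased) form divides by $n-1$ instead, which is the convention used in the covariance formula below.

Definition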

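Covariance measures how two variables change together: it is positive when they tend to increase together and negative when one tends to decrease as the other increases.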

The formula for the covariance of variables $x$ and $y$ is:

$$\mathrm{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$$
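
As a quick illustration of the two formulas, here is a minimal sketch that computes them by hand and cross-checks the results against NumPy. The arrays `x` and `y` reuse the two columns of this chapter's example dataset; the variable names are chosen here for illustration only.

```python
import numpy as np

# Two variables with three observations each
# (the two columns of this chapter's example dataset)
x = np.array([2.5, 0.5, 2.2])
y = np.array([2.4, 0.7, 2.9])
n = x.size

# Deviations of each observation from its variable's mean
dx = x - x.mean()
dy = y - y.mean()

# Variance with the 1/n (population) denominator
var_x_population = np.sum(dx ** 2) / n

# Variance and covariance with the 1/(n-1) (sample) denominator
var_x_sample = np.sum(dx ** 2) / (n - 1)
cov_xy = np.sum(dx * dy) / (n - 1)

# Cross-check against NumPy's built-ins
assert np.isclose(var_x_population, np.var(x))       # np.var uses ddof=0 by default
assert np.isclose(var_x_sample, np.var(x, ddof=1))   # ddof=1 gives the 1/(n-1) form
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])        # np.cov uses 1/(n-1) by default

print("Var(x) with 1/n:    ", var_x_population)
print("Var(x) with 1/(n-1):", var_x_sample)
print("Cov(x, y):          ", cov_xy)
```

Note that the covariance of a variable with itself, computed with the same denominator, is its variance.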

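The covariance matrix generalizes covariance to multiple variables. For a dataset $X$ with $d$ features and $n$ samples, the covariance matrix $\Sigma$ is a $d \times d$ matrix where each entry $\Sigma_{ij}$ is the covariance between feature $i$ and feature $j$, computed with denominator $n-1$ so that it is an unbiased estimator. Its diagonal entries $\Sigma_{ii}$ are the variances of the individual features, and the matrix is symmetric because $\mathrm{Cov}(x, y) = \mathrm{Cov}(y, x)$.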
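
The same definition can be written compactly in matrix form. The notation below is an assumption introduced here for illustration: $X$ is taken to be $n \times d$ with one sample per row, $\bar{x}$ is the $d$-vector of feature means, $\mathbf{1}$ is a column of $n$ ones, and $X_c$ is the centered data matrix.

```latex
X_c = X - \mathbf{1}\,\bar{x}^{\top},
\qquad
\Sigma = \frac{1}{n-1}\, X_c^{\top} X_c,
\qquad
\Sigma_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)
```

Setting $i = j$ in the last expression gives the sample variance of feature $i$.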

import numpy as np

# Example data: 3 samples (rows), 2 features (columns)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])

# Center the data (subtract each feature's mean)
X_centered = X - np.mean(X, axis=0)

# Compute the covariance matrix manually, dividing by n - 1 (unbiased estimator)
n_samples = X_centered.shape[0]
cov_matrix = (X_centered.T @ X_centered) / (n_samples - 1)

print("Covariance matrix:\n", cov_matrix)

In the code above, you manually center the data and compute the covariance matrix using matrix multiplication, dividing by n - 1 to match the definition of the covariance matrix given earlier. Each entry of the resulting 2 x 2 matrix captures how a pair of features varies together, and the diagonal holds each feature's variance.
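
One way to sanity-check a manual computation like this is to compare it with `np.cov`. The sketch below assumes the same example data; `np.cov` divides by n - 1 by default, and `rowvar=False` tells it that columns, not rows, are the variables. The names `manual_cov`, `numpy_cov`, and `biased_cov` are illustrative.

```python
import numpy as np

# Same example data: 3 samples (rows), 2 features (columns)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9]])
n_samples = X.shape[0]

# Manual covariance matrix with the n - 1 denominator
X_centered = X - X.mean(axis=0)
manual_cov = (X_centered.T @ X_centered) / (n_samples - 1)

# NumPy's estimate; rowvar=False means each column is a variable
numpy_cov = np.cov(X, rowvar=False)

# bias=True switches np.cov to the 1/n denominator
biased_cov = np.cov(X, rowvar=False, bias=True)

print("Matches np.cov (n - 1):", np.allclose(manual_cov, numpy_cov))
print("Matches np.cov (1/n) after rescaling:",
      np.allclose(manual_cov * (n_samples - 1) / n_samples, biased_cov))
```

The two denominators differ only by the factor (n - 1) / n, so the choice matters most for small samples such as this one.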

Section 2. Chapter 1