Dimensionality Reduction with PCA

Reducing Dimensions by Maximizing Variance

PCA ranks principal components by the variance they capture, measured by their eigenvalues. Keeping the top k components preserves the most variance, as each component captures less than the previous one and is orthogonal to earlier components. This reduces dimensions while retaining the most informative directions in your data.

The explained variance ratio for each principal component is:

\text{Explained Variance Ratio} = \frac{\lambda_i}{\sum_j \lambda_j}

where \lambda_i is the i-th largest eigenvalue. This ratio shows how much of the total variance in your data is captured by each principal component. The sum of all explained variance ratios is always 1, since the eigenvalues together account for the total variance in the dataset.

import numpy as np

# Same small dataset as in the previous code
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])

# Center the data and compute the covariance matrix
X_centered = X - np.mean(X, axis=0)
cov_matrix = (X_centered.T @ X_centered) / X_centered.shape[0]

# Eigendecomposition; np.linalg.eig does not guarantee any ordering,
# so sort eigenvalues (and their eigenvectors) from largest to smallest
values, vectors = np.linalg.eig(cov_matrix)
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

explained_variance_ratio = values / np.sum(values)
print("Explained variance ratio:", explained_variance_ratio)

Selecting the top principal components so that their explained variance ratios add up to a chosen threshold, such as 95%, lets you reduce the number of dimensions while keeping most of the data's information. You keep only the directions in which the data spreads the most, which are the most informative for analysis or modeling. By focusing on these components, you simplify your dataset without losing the patterns that matter most. This balance between dimensionality and information is a key advantage of PCA.
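One way to apply such a threshold is to sort the ratios in descending order, accumulate them with np.cumsum, and keep the smallest k whose running total reaches the target. The sketch below uses illustrative eigenvalues (not the ones computed above) to show the idea:

```python
import numpy as np

# Illustrative eigenvalues, already sorted from largest to smallest
values = np.array([4.0, 2.5, 1.0, 0.3, 0.2])

# Explained variance ratio per component
ratios = values / values.sum()

# Running total of variance captured by the first 1, 2, 3, ... components
cumulative = np.cumsum(ratios)

# Smallest k whose components together capture at least 95% of the variance
k = int(np.argmax(cumulative >= 0.95)) + 1
print("Cumulative explained variance:", cumulative)
print("Components to keep:", k)
```

Here the first three components capture only 93.75% of the variance, so the threshold forces k = 4; with a real high-dimensional dataset, a 95% threshold typically discards many more components.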

Section 2. Chapter 4
