Reducing Dimensions by Maximizing Variance
PCA ranks principal components by the variance they capture, measured by their eigenvalues. Keeping the top k components preserves the most variance, since each component captures at most as much variance as the previous one and is orthogonal to all earlier components. This reduces dimensionality while retaining the most informative directions in your data.
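The ranking-and-truncation step above can be sketched as follows. This is an illustrative example on randomly generated data (the dataset and the choice of k = 2 are assumptions, not from the original); it sorts the eigenvalues in descending order and projects onto the top-k eigenvectors:

```python
import numpy as np

# Hypothetical dataset: 100 samples, 3 features with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.2])

# Center the data and form the covariance matrix
X_centered = X - X.mean(axis=0)
cov = (X_centered.T @ X_centered) / X_centered.shape[0]

# eigh is suited to symmetric matrices; it returns eigenvalues in ascending order
values, vectors = np.linalg.eigh(cov)

# Rank components by eigenvalue, largest first
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

# Keep the top k components and project the data onto them
k = 2
X_reduced = X_centered @ vectors[:, :k]
print(X_reduced.shape)  # (100, 2)
```

Because the eigenvectors of a symmetric covariance matrix are orthonormal, the projection simply rotates the data and then drops the lowest-variance directions.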
The explained variance ratio for each principal component is:
$$\text{Explained Variance Ratio}_i = \frac{\lambda_i}{\sum_j \lambda_j}$$

where λ_i is the i-th largest eigenvalue. This ratio shows how much of the total variance in your data is captured by each principal component. The sum of all explained variance ratios is always 1, since all eigenvalues together account for the total variance in the dataset.
```python
import numpy as np

# Using eigenvalues from previous code
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])
X_centered = X - np.mean(X, axis=0)
cov_matrix = (X_centered.T @ X_centered) / X_centered.shape[0]
values, vectors = np.linalg.eig(cov_matrix)

explained_variance_ratio = values / np.sum(values)
print("Explained variance ratio:", explained_variance_ratio)
```
Selecting the top principal components so that their explained variance ratios add up to a chosen threshold, such as 95%, lets you reduce the number of dimensions while keeping most of the data's information. You retain only the directions in which the data's spread is greatest, which are the most informative for analysis or modeling. By focusing on these components, you simplify your dataset without losing the patterns that matter most. This trade-off between dimensionality and retained information is a key advantage of PCA.
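The threshold-based selection can be sketched with a cumulative sum over sorted explained variance ratios. The eigenvalues below are made up for illustration; in practice they would come from your covariance matrix, already sorted largest first:

```python
import numpy as np

# Hypothetical eigenvalues from a PCA, sorted in descending order
eigenvalues = np.array([4.2, 2.1, 0.9, 0.5, 0.2, 0.1])

ratios = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(ratios)

# Smallest k whose top components together explain at least 95% of the variance
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(k)               # 4
print(cumulative[:k])  # [0.525  0.7875 0.9    0.9625]
```

Here the first four components already account for over 95% of the total variance, so the last two dimensions can be dropped with little loss of information.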