Learn Eigenvalues and Eigenvectors | Basic Concepts of PCA
Principal Component Analysis

Eigenvalues and Eigenvectors

Let's move on to more advanced concepts: eigenvalues and eigenvectors. At this step, we calculate the eigenvalues and eigenvectors of the covariance matrix to obtain the principal components.

The first step is to calculate the eigenvalues of the covariance matrix. The eigenvectors are then calculated from those eigenvalues.
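
In symbols, the two steps can be sketched as follows. The notation is an assumption introduced here and is not part of the original lesson: $C$ is the covariance matrix, $\lambda$ an eigenvalue, $v$ an eigenvector, and $I$ the identity matrix.

```latex
\begin{aligned}
C v &= \lambda v, \quad v \neq 0 && \text{(definition of an eigenpair)} \\
\det(C - \lambda I) &= 0 && \text{(step 1: solve for the eigenvalues } \lambda\text{)} \\
(C - \lambda I)\, v &= 0 && \text{(step 2: solve for } v \text{ for each } \lambda\text{)}
\end{aligned}
```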

The resulting eigenvectors are the principal components: they solve the mathematical problem of finding the axis directions along which the variance of the data is largest. To make this easier to picture, think of the principal components as a new, more convenient way of presenting the data, a new angle from which differences in the data become more visible.
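
The variance claim can be checked numerically. The following is a minimal sketch on made-up two-variable data (the data, the seed, and the names `top`, `var_top`, and `var_other` are assumptions for illustration, not part of the lesson). It compares the variance of the data projected onto the top eigenvector with the variance along 180 other directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 points, 2 correlated variables
x = rng.normal(size=500)
X = np.column_stack([x, 0.5 * x + 0.3 * rng.normal(size=500)])

cov_mat = np.cov(X, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Eigenvector paired with the largest eigenvalue
top = eigen_vectors[:, np.argmax(eigen_values)]

# Variance of the data projected onto that eigenvector
var_top = np.var(X @ top, ddof=1)

# Variance of the data projected onto 180 unit directions in the plane
angles = np.linspace(0, np.pi, 180, endpoint=False)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
var_other = np.var(X @ directions.T, axis=0, ddof=1)

print("variance along top eigenvector:", var_top)
print("largest variance among sampled directions:", var_other.max())
```

In this sketch, no sampled direction gives a larger variance than the top eigenvector, and `var_top` matches the largest eigenvalue.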

The output contains as many components as there were variables in the original dataset. For example, a dataset with 20 variables yields 20 principal components at this stage.
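
A quick way to see this is to inspect the array shapes. This is a small sketch on random data with 20 variables; the data itself is a hypothetical stand-in for a real dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset: 100 rows, 20 variables
X = rng.normal(size=(100, 20))

cov_mat = np.cov(X, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

print(cov_mat.shape)        # (20, 20): one row and column per variable
print(eigen_values.shape)   # (20,): one eigenvalue per component
print(eigen_vectors.shape)  # (20, 20): each column is one eigenvector
```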

The key detail is that each eigenvector is paired with its own eigenvalue. The larger the eigenvalue, the more significant the corresponding principal component (eigenvector), because the eigenvalue equals the variance of the data along that component. Once the components are sorted by eigenvalue, the first one carries the most information, the second somewhat less, and so on.
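
One common way to quantify this significance is the share of total variance each eigenvalue accounts for. The sketch below uses made-up eigenvalues (an assumption for illustration), already sorted in descending order.

```python
import numpy as np

# Hypothetical eigenvalues, sorted in descending order
eigen_values = np.array([2.5, 1.0, 0.4, 0.1])

# Share of the total variance carried by each component
explained_ratio = eigen_values / eigen_values.sum()

# Running total: variance retained by the first k components
cumulative = np.cumsum(explained_ratio)

print(explained_ratio)  # 0.625, 0.25, 0.1, 0.025
print(cumulative)       # 0.625, 0.875, 0.975, 1.0
```

Here the first two components together account for 87.5% of the variance, which is the kind of figure used to decide how many components to keep.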

Why eigenvectors play such an important role in forming the principal components is a difficult question whose full answer lies in a long mathematical proof. For now, we just need to know that it works.
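
For the curious, the idea behind the first component can be outlined briefly. This is only a sketch for the first component, not the full proof, and the notation is an assumption introduced here: $C$ is the covariance matrix, $v$ a unit-length direction, and $\lambda$ a Lagrange multiplier.

```latex
\begin{aligned}
&\max_{v} \; v^\top C v \quad \text{subject to} \quad v^\top v = 1 \\
&\mathcal{L}(v, \lambda) = v^\top C v - \lambda \,(v^\top v - 1) \\
&\nabla_v \mathcal{L} = 2 C v - 2 \lambda v = 0 \;\Longrightarrow\; C v = \lambda v \\
&v^\top C v = \lambda \, v^\top v = \lambda
\end{aligned}
```

The variance along $v$ is $v^\top C v$, so the direction that maximizes it must be an eigenvector of $C$, and the variance it attains is its eigenvalue. The largest eigenvalue therefore marks the first principal component.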

Let's use numpy to calculate eigenvalues and eigenvectors:

python
# cov_mat is the covariance matrix from the previous step; np is numpy
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)
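
To see what the call returns, here is a minimal sketch on a small hand-written symmetric matrix (the matrix values are hypothetical). It checks that each column of `eigen_vectors` paired with the matching entry of `eigen_values` satisfies the eigenvector equation. It also shows `np.linalg.eigh`, a NumPy routine for symmetric matrices that may be worth considering for covariance matrices; the lesson itself uses `np.linalg.eig`.

```python
import numpy as np

# Hypothetical 2x2 covariance matrix (symmetric)
cov_mat = np.array([[2.0, 0.8],
                    [0.8, 1.0]])

eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Column i of eigen_vectors pairs with eigen_values[i]
for i in range(len(eigen_values)):
    v = eigen_vectors[:, i]
    # Multiplying by the matrix only rescales v by its eigenvalue
    print(np.allclose(cov_mat @ v, eigen_values[i] * v))

# Eigenvectors are returned with unit length
print(np.linalg.norm(eigen_vectors, axis=0))

# eigh assumes a symmetric matrix and returns eigenvalues in ascending order
vals_h, vecs_h = np.linalg.eigh(cov_mat)
print(vals_h)
```

Note that `np.linalg.eig` does not promise any particular order for the eigenvalues, which is why a sorting step is needed afterwards.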
Task

Sort the resulting principal components (eigenvectors) in descending order of their eigenvalues using the ind list (the indices of the sorted results), and print the output.

Solution

# Importing libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Reading and standardizing the data
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/7b22c447-77ad-48ae-a2d2-4e6714f7a4a6/train.csv')
X_scaled = StandardScaler().fit_transform(df)

# Calculating covariance matrix, eigenvalues, and eigenvectors
cov_mat = np.cov(X_scaled, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Sort obtained results in descending order
ind = np.arange(0, len(eigen_values), 1)
ind = ([x for _, x in sorted(zip(eigen_values, ind))])[::-1]
filter_eigen_values = eigen_values[ind]
filter_eigen_vectors = eigen_vectors[:, ind]

# Displaying the results
print("Sorted eigenvectors ", filter_eigen_vectors)
print("Sorted eigenvalues ", filter_eigen_values)
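
As an aside, the same descending order can be obtained with `np.argsort`. This is an alternative sketch, not the approach the task asks for, and the eigenvalues and the identity-matrix placeholder for the eigenvectors are hypothetical.

```python
import numpy as np

# Hypothetical unsorted eigenvalues
eigen_values = np.array([0.4, 2.5, 0.1, 1.0])
# Placeholder eigenvectors: column i pairs with eigen_values[i]
eigen_vectors = np.eye(4)

# Indices that sort the eigenvalues from largest to smallest
ind = np.argsort(eigen_values)[::-1]

sorted_values = eigen_values[ind]
# Reorder columns so each eigenvector stays with its eigenvalue
sorted_vectors = eigen_vectors[:, ind]

print(ind)            # indices 1, 3, 0, 2
print(sorted_values)  # 2.5, 1.0, 0.4, 0.1
```

The important part in either approach is indexing the eigenvectors by column (`[:, ind]`), since each column, not each row, is one eigenvector.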


Section 2. Chapter 3
# Importing libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Reading and standardizing the data
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/7b22c447-77ad-48ae-a2d2-4e6714f7a4a6/train.csv')
X_scaled = StandardScaler().fit_transform(df)

# Calculating covariance matrix, eigenvalues, and eigenvectors
cov_mat = np.cov(X_scaled, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Sort obtained results in descending order
ind = np.arange(0, len(eigen_values), 1)
ind = ([x for _, x in sorted(zip(eigen_values, ind))])[::-1]
filter_eigen_values = ___[___]
filter_eigen_vectors = ___[___]

# Displaying the results
print("Sorted eigenvectors ", filter_eigen_vectors)
print("Sorted eigenvalues ", filter_eigen_values)
