Principal Component Analysis

Eigenvalues and Eigenvectors

Let's move on to more complex concepts: eigenvalues and eigenvectors. At this step, we calculate the eigenvalues and eigenvectors of the covariance matrix in order to obtain the principal components.

The first step is to calculate the eigenvalues of the covariance matrix. The eigenvectors are then calculated from those eigenvalues.
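In symbols, the two steps can be sketched as follows. The notation is introduced here for illustration and is not taken from the lesson: $C$ is the covariance matrix, $\lambda$ an eigenvalue, $v$ an eigenvector, and $I$ the identity matrix.

```latex
\begin{aligned}
C\,v &= \lambda\,v, \qquad v \neq 0
  && \text{(definition of an eigenpair)} \\
\det(C - \lambda I) &= 0
  && \text{(solve for the eigenvalues } \lambda_1, \dots, \lambda_n\text{)} \\
(C - \lambda_i I)\,v_i &= 0
  && \text{(solve for the eigenvector } v_i \text{ of each } \lambda_i\text{)}
\end{aligned}
```

In practice a numerical library performs both steps in a single call, so these equations are rarely solved by hand.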

The resulting eigenvectors are the principal components: they solve the mathematical problem of finding the axis directions along which the variance of the data points is largest. To make it easier to understand, imagine that the principal components are a new, more convenient way of presenting the data, a new angle from which differences in the data become more visible to us.
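To make the "direction of maximum variance" idea concrete, here is a minimal sketch on made-up 2-D data. The dataset, the helper `variance_along`, and the choice of 180 test directions are all assumptions for illustration. It projects the data onto the eigenvector with the largest eigenvalue and onto many other directions, then compares the variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D data, centered at the origin
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)
X = np.column_stack([x, y])
X = X - X.mean(axis=0)

cov_mat = np.cov(X, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Eigenvector (column) that belongs to the largest eigenvalue
top_vector = eigen_vectors[:, np.argmax(eigen_values)]


def variance_along(direction):
    """Sample variance of the data projected onto a unit-length direction."""
    direction = direction / np.linalg.norm(direction)
    return np.var(X @ direction, ddof=1)


var_top = variance_along(top_vector)

# Variance along 180 evenly spaced directions in the plane
angles = np.linspace(0, np.pi, 180, endpoint=False)
var_others = [variance_along(np.array([np.cos(a), np.sin(a)])) for a in angles]

print("Variance along top eigenvector:", var_top)
print("Largest variance among other directions:", max(var_others))
```

The variance along the top eigenvector equals the largest eigenvalue, and no other tested direction exceeds it.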

As output, we get as many principal components as there were variables in the original dataset. For example, a dataset with 20 variables yields 20 principal components at this stage.
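A quick way to check this count is to look at array shapes. The sketch below uses randomly generated data with 20 variables as a stand-in for a real dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))  # hypothetical data: 200 rows, 20 variables

cov_mat = np.cov(X, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

print(cov_mat.shape)        # (20, 20)
print(eigen_values.shape)   # (20,)
print(eigen_vectors.shape)  # (20, 20)
```

Each of the 20 columns of `eigen_vectors` is one principal component with 20 coordinates, one per original variable.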

The main detail is that each eigenvector is paired with its own eigenvalue. The larger the eigenvalue, the more of the data's variance the corresponding principal component (eigenvector) captures. Once the components are sorted by eigenvalue, the first one stores the most important information, the second a little less, and so on.
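One common way to express this "significance" is the share of total variance each eigenvalue accounts for. The sketch below assumes a small made-up covariance matrix; the name `explained_ratio` is chosen here for illustration.

```python
import numpy as np

# Hypothetical 2x2 covariance matrix
cov_mat = np.array([[4.0, 2.0],
                    [2.0, 3.0]])
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Share of the total variance captured by each component
explained_ratio = eigen_values / eigen_values.sum()

for value, ratio in zip(eigen_values, explained_ratio):
    print(f"eigenvalue {value:.4f} -> {ratio:.1%} of total variance")
```

The eigenvalues add up to the trace of the covariance matrix, that is, the total variance of the variables, which is why the ratios sum to 1.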

Why eigenvectors play such an important role in forming the principal components is a difficult question whose answer lies in a long mathematical proof. For now, it is enough to know that it works.

Let's use NumPy to calculate the eigenvalues and eigenvectors:

python
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)
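A detail that is easy to miss: `np.linalg.eig` returns the eigenvectors as the columns of the second array, so `eigen_vectors[:, i]` belongs to `eigen_values[i]`. The sketch below checks this on a small made-up covariance matrix by verifying that multiplying the matrix by an eigenvector only rescales it by its eigenvalue.

```python
import numpy as np

# Hypothetical 2x2 covariance matrix
cov_mat = np.array([[2.0, 0.8],
                    [0.8, 1.0]])
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

for i in range(len(eigen_values)):
    v = eigen_vectors[:, i]   # i-th eigenvector is the i-th COLUMN
    lam = eigen_values[i]     # its paired eigenvalue
    satisfies_definition = np.allclose(cov_mat @ v, lam * v)
    has_unit_length = np.isclose(np.linalg.norm(v), 1.0)
    print(i, satisfies_definition, has_unit_length)
```

Taking rows instead of columns is a frequent source of silent errors, because the shapes match either way.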
Task

Sort the resulting principal components (eigenvectors) in descending order of their eigenvalues using the ind list (the indices of the sorted eigenvalues), and print the output.

Solution

# Importing libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Reading and standardizing the data
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/7b22c447-77ad-48ae-a2d2-4e6714f7a4a6/train.csv')
X_scaled = StandardScaler().fit_transform(df)

# Calculating covariance matrix, eigenvalues, and eigenvectors
cov_mat = np.cov(X_scaled, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Sort obtained results in descending order
ind = np.arange(0, len(eigen_values), 1)
ind = ([x for _, x in sorted(zip(eigen_values, ind))])[::-1]
filter_eigen_values = eigen_values[ind]
filter_eigen_vectors = eigen_vectors[:, ind]

# Displaying the results
print("Sorted eigenvectors ", filter_eigen_vectors)
print("Sorted eigenvalues ", filter_eigen_values)
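As a possible alternative to the sorting step in the solution, the indices can be obtained with `np.argsort`. The sketch below also swaps in `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix and returns real eigenvalues; this is a substitution made here, not the lesson's method. Random data stands in for the standardized dataset so the example runs without downloading the CSV.

```python
import numpy as np

rng = np.random.default_rng(1)
X_scaled = rng.normal(size=(100, 4))  # stand-in for the standardized data
cov_mat = np.cov(X_scaled, rowvar=False)

# eigh assumes a symmetric matrix and returns eigenvalues in ascending order
eigen_values, eigen_vectors = np.linalg.eigh(cov_mat)

# Indices that sort the eigenvalues from largest to smallest
ind = np.argsort(eigen_values)[::-1]
filter_eigen_values = eigen_values[ind]
filter_eigen_vectors = eigen_vectors[:, ind]  # reorder columns, not rows

print("Sorted eigenvalues ", filter_eigen_values)
```

Reordering the columns with the same `ind` keeps every eigenvector aligned with its eigenvalue.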
