Fit Data into the Model

Now that our data is ready, let's fit a PCA model to it.

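from sklearn.decomposition import PCA

# Keep only the first 2 principal components
pca_model = PCA(n_components=2)
# Fit the model on X and project X onto those components
X_reduced = pca_model.fit_transform(X)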
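As a quick sanity check, the sketch below reproduces this step end to end. It assumes that X is the standardized wine dataset from scikit-learn's load_wine; the lesson prepares X in an earlier chapter, so the loading and scaling lines here are an assumption and not necessarily the course's exact code.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: X is the wine dataset (13 features), standardized beforehand
X_raw = load_wine().data
X = StandardScaler().fit_transform(X_raw)

# Same two lines as in the lesson
pca_model = PCA(n_components=2)
X_reduced = pca_model.fit_transform(X)

# The number of rows is unchanged; only the number of columns shrinks
print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
```

Comparing the two shapes confirms that every sample is kept and only the number of columns changes.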

We have reduced the dimensionality of the dataset from 13 features to 2 principal components! Now we can visualize the resulting components using the seaborn and matplotlib libraries:

import matplotlib.pyplot as plt
import seaborn as sns

# Recent seaborn versions require x and y to be passed as keyword arguments
sns.scatterplot(x=X_reduced[:, 0], y=X_reduced[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()

A natural question is how to check how effective a particular PCA model is. The performance of PCA can be assessed in two ways. The first is how much information (variance) the resulting components retain. The number of components we decide to keep determines how much of the dataset's information remains. For example, let's print the cumulative explained variance ratio:

print("Cumulative Variances (Percentage):")
print(pca_model.explained_variance_ratio_.cumsum() * 100)

The result above is for a PCA model that keeps all 13 principal components of the wine dataset (i.e. as many as there were original variables); a model fitted with n_components=2 prints only the first two values. You can see that the first component captures 36% of the information, two components capture 55%, three components capture 66%, and so on.
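To reproduce a 13-component result like this one, the model can be fitted without limiting the number of components. The sketch below assumes the standardized wine data from scikit-learn's load_wine; the names full_pca, threshold and n_needed, and the 90% cut-off, are illustrative choices and not part of the lesson.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the data is the standardized wine dataset
X = StandardScaler().fit_transform(load_wine().data)

# n_components=None keeps every component (13 for this dataset)
full_pca = PCA(n_components=None).fit(X)
cumulative = full_pca.explained_variance_ratio_.cumsum() * 100

print("Cumulative Variances (Percentage):")
print(cumulative)

# Smallest number of components whose cumulative variance reaches the threshold
threshold = 90.0  # illustrative value
n_needed = int(np.argmax(cumulative >= threshold)) + 1
print(f"Components needed for {threshold:.0f}%: {n_needed}")
```

scikit-learn can also make this choice itself: passing a float between 0 and 1, such as PCA(n_components=0.90), keeps just enough components to explain that share of the variance.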

Plotting the cumulative explained variance against the number of components makes it easy to see how many components are required to capture a given share of the data's variability.
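A minimal sketch of how such a graph could be drawn with matplotlib is shown here. It assumes the standardized wine data from scikit-learn's load_wine and a model that keeps all components; the dashed 90% reference line is only an illustrative marker.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the data is the standardized wine dataset
X = StandardScaler().fit_transform(load_wine().data)

# Keep all components so the curve covers the full range
full_pca = PCA().fit(X)
cumulative = full_pca.explained_variance_ratio_.cumsum() * 100
components = np.arange(1, len(cumulative) + 1)

fig, ax = plt.subplots()
ax.plot(components, cumulative, marker="o")
ax.axhline(90, linestyle="--", color="gray")  # illustrative reference line
ax.set_xticks(components)
ax.set_xlabel("Number of components")
ax.set_ylabel("Cumulative explained variance (%)")
```

Call plt.show() afterwards to display the figure when running this as a script.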

The second way to evaluate a PCA model is to check the performance of the downstream machine learning models that we are going to fit (if we really need to) on the reduced dataset. We can look for the best trade-off among three quantities: for example, the time the machine learning model takes to run, the model's accuracy, and the number of principal components.
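This trade-off can be explored with a small experiment. The sketch below is one possible setup, not the course's own: it assumes the wine dataset from scikit-learn's load_wine, uses logistic regression as a stand-in downstream model, and tries a hand-picked list of component counts.

```python
import time

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

results = []
for n in (2, 5, 8, 13):  # illustrative component counts
    # Scale, reduce with PCA, then classify; all steps are refit in each fold
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n),
        LogisticRegression(max_iter=1000),
    )
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=5)  # accuracy by default
    elapsed = time.perf_counter() - start
    results.append((n, scores.mean(), elapsed))

for n, accuracy, seconds in results:
    print(f"{n:>2} components: accuracy={accuracy:.3f}, time={seconds:.3f}s")
```

Putting PCA inside the pipeline means it is fitted only on the training folds, so the accuracy estimate is not influenced by the held-out data. On a dataset this small the timings are noisy; the runtime difference matters more on larger datasets.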

Quiz

Why do you think only 3 components in the presented dataset can explain as much as 92% of the data?
