Course Content

Principal Component Analysis

Fit Data into the Model

Now that our data is ready, let's fit a PCA model to it.
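The fitting step can be sketched as follows. This is a minimal example, assuming the data is the 13-feature wine dataset bundled with scikit-learn and that "ready" means the features have been standardized. The variable names (X_scaled, X_pca) are illustrative, not taken from the course.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the wine dataset (assumption: this is the dataset used in the chapter)
X, y = load_wine(return_X_y=True)

# Standardize the features so that variables on large scales do not dominate
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA and project the data onto the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_scaled.shape, "->", X_pca.shape)
```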

We have reduced the dataset from 13 features to 2! The two resulting components can now be visualized with the seaborn and matplotlib libraries.
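One possible way to draw the two components is a scatter plot colored by wine class. This is a sketch under the same assumptions as before (scikit-learn's wine dataset, standardized features). The column names PC1, PC2 and wine_class and the output file name are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Put the two components and the class labels into one DataFrame for seaborn
df = pd.DataFrame(X_pca, columns=["PC1", "PC2"])
df["wine_class"] = wine.target_names[wine.target]

# Scatter plot of the projected samples, one color per wine class
fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(data=df, x="PC1", y="PC2", hue="wine_class", ax=ax)
ax.set_title("Wine dataset projected onto two principal components")
fig.savefig("wine_pca_2d.png", dpi=100)  # hypothetical output file name
```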


A natural question is how to check how effective a particular PCA model is. The performance of PCA can be measured in two ways. The first is how much information (variance) the resulting components retain: the number of components we decide to keep determines how much of the dataset's information remains. For example, we can look at the explained variance ratio of each component.
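The explained variance ratio can be read from the fitted model's explained_variance_ratio_ attribute. The sketch below assumes the standardized wine dataset from scikit-learn and keeps all 13 components; the names ratios and cumulative are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# With n_components left at its default, PCA keeps all 13 components
pca_full = PCA().fit(X_scaled)

# Share of the total variance captured by each component, and the running total
ratios = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.3f} (cumulative {c:.3f})")
```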


For a PCA model fitted to the wine dataset with 13 principal components (i.e. the same number as the original variables), the first component captures about 36% of the information, the first two components together capture 55%, the first three capture 66%, and so on.

A plot of the cumulative explained variance against the number of components makes it easy to see how many components are required to capture a given share of the variability in the data.
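Such a plot could be produced along these lines. This is a sketch assuming the standardized wine dataset; the 90% threshold line and the output file name are arbitrary choices for illustration, not values from the course.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
cumulative = np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches the threshold
threshold = 0.90  # illustrative threshold (assumption)
n_needed = int(np.argmax(cumulative >= threshold)) + 1

components = np.arange(1, len(cumulative) + 1)
fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(components, cumulative, marker="o")
ax.axhline(threshold, linestyle="--", color="gray")
ax.set_xlabel("Number of principal components")
ax.set_ylabel("Cumulative explained variance")
fig.savefig("wine_pca_cumulative.png", dpi=100)  # hypothetical file name

print(f"Components needed for {threshold:.0%} of the variance: {n_needed}")
```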


The second way to evaluate a PCA model is to check the performance of the other machine learning models that will be trained on the reduced dataset (if we really need any). We can then look for the best trade-off among three quantities: the time the machine learning model takes to run, the accuracy of the model, and the number of principal components.
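This trade-off can be explored with a small loop over candidate component counts. The sketch below is one possible setup, not the course's own: it assumes the wine dataset, a logistic regression classifier as the downstream model, 5-fold cross-validation, and an arbitrary set of component counts.

```python
import time
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

results = []
for n in (2, 5, 13):  # candidate numbers of components (illustrative choice)
    # Scaling and PCA sit inside the pipeline so they are refit on each fold
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n),
        LogisticRegression(max_iter=1000),
    )
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=5)  # mean accuracy per fold
    elapsed = time.perf_counter() - start
    results.append((n, scores.mean(), elapsed))
    print(f"{n:>2} components: accuracy={scores.mean():.3f}, time={elapsed:.3f}s")
```

On a dataset this small the timing differences are negligible; the same loop becomes more informative on larger datasets, where fewer components can noticeably shorten training time.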


Why do you think only 3 components in the presented dataset can explain as much as 92% of the data?





Section 3. Chapter 3