Examples of Real Problems | What is Principal Component Analysis
Principal Component Analysis

Examples of Real Problems

Let's look at a real-world application of PCA. First, import the libraries we will work with:

```python
# Linear algebra and data processing
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# PCA model
from sklearn.decomposition import PCA

# Data visualization
import seaborn as sns
import matplotlib.pyplot as plt
```

Next, we read the train.csv file from the web. It contains house-sale records: the characteristics of each house and its sale price.

```python
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/7b22c447-77ad-48ae-a2d2-4e6714f7a4a6/train_S1.csv')
```

Next, we preprocess the data. We drop most of the columns and keep only 10 variables, so that the results stay small enough to read, and then standardize each variable to zero mean and unit variance:

```python
# Columns to keep
columns_ndrop = ['YearBuilt', 'LotArea', 'MSSubClass', 'OverallQual', 'SalePrice',
                 'PoolArea', 'GarageArea', 'BedroomAbvGr', 'KitchenAbvGr', 'Fireplaces']
# Drop every other column (pandas >= 2.0 requires the axis to be given by keyword)
data = data.drop(columns=data.columns.difference(columns_ndrop))

# Standardize: each column gets mean 0 and standard deviation 1 (returns a NumPy array)
data_sc = StandardScaler().fit_transform(data)
```
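
StandardScaler standardizes each column as z = (x - mean) / std, where std is the population standard deviation (ddof=0). The short sketch below, on a small made-up array (illustrative values, not taken from train.csv), checks this against a manual computation. Standardizing matters here because PCA looks for directions of maximum variance, so without it a column measured in large units, such as LotArea, would dominate the components.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up example: 4 rows, 2 columns on very different scales
X = np.array([
    [1000.0, 2.0],
    [2000.0, 3.0],
    [3000.0, 4.0],
    [6000.0, 3.0],
])

X_scaled = StandardScaler().fit_transform(X)

# Manual z-score: subtract each column's mean, divide by its population std (ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True
# After scaling, every column has mean 0 and standard deviation 1
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```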

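Let's create a PCA model that keeps three components:

```python
# Keep 3 components; whiten=True rescales each component's scores to unit variance
pca = PCA(n_components=3, whiten=True)
pca = pca.fit(data_sc)
```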
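
To see what fitting with whiten=True does, the sketch below fits the same kind of model on synthetic data (random numbers standing in for data_sc, not the house data) and reproduces pca.transform by hand: center the data with pca.mean_, project it onto the rows of pca.components_, and divide each component by the square root of pca.explained_variance_. As a result, every whitened component has unit variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data_sc: 200 rows, 10 columns driven by 3 hidden factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))
X_sc = StandardScaler().fit_transform(X)

pca = PCA(n_components=3, whiten=True).fit(X_sc)
scores = pca.transform(X_sc)

# Same projection by hand: center, project onto the principal axes,
# then divide each component by its standard deviation (whitening)
manual = (X_sc - pca.mean_) @ pca.components_.T / np.sqrt(pca.explained_variance_)

print(np.allclose(scores, manual))  # True
print(scores.std(axis=0, ddof=1))   # each value close to 1
```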

To interpret the results, we build a heat map of the factor loadings. The next section explains why we need them.

```python
# Factor loadings: each principal axis scaled by the square root of its explained variance
factor_analysis = pca.components_.T * np.sqrt(pca.explained_variance_)  # shape (10, 3)
fig, ax = plt.subplots(figsize=(3, 20))

# data_sc is a NumPy array with no .columns attribute, so take the labels from the DataFrame
sns.heatmap(factor_analysis, xticklabels=["C1", "C2", "C3"],
            yticklabels=data.columns, annot=True,
            cmap="YlGnBu", ax=ax)
plt.show()
```
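
One way to read these numbers, which can be checked numerically: dividing a loading by its column's sample standard deviation (ddof=1) gives exactly the Pearson correlation between that original variable and the component's scores. After StandardScaler that standard deviation is sqrt(n/(n-1)), essentially 1 for large n, so the heat-map values are approximately these correlations. The sketch below verifies this on synthetic data (random numbers, not the house data); it is an illustration, not the derivation the next section will give.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 300 rows, 5 columns driven by 2 hidden factors plus noise
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(300, 5))
X_sc = StandardScaler().fit_transform(X)

pca = PCA(n_components=2, whiten=True).fit(X_sc)
scores = pca.transform(X_sc)

# Loadings as computed in the chapter: principal axes scaled by sqrt(eigenvalues)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # shape (5, 2)

# Pearson correlation between each original column j and each component score k
corr = np.array([[np.corrcoef(X_sc[:, j], scores[:, k])[0, 1]
                  for k in range(scores.shape[1])]
                 for j in range(X_sc.shape[1])])

# Dividing each row of loadings by its column's sample std turns them into exact correlations
print(np.allclose(loadings / X_sc.std(axis=0, ddof=1)[:, None], corr))  # True
```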

In just a couple of steps, we reduced the dataset from 10 features to 3 principal components! In the next chapter, we will interpret the results of PCA.
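
Going from 10 features to 3 does not by itself say how much information the 3 components retain. The fitted model's explained_variance_ratio_ attribute reports the fraction of total variance each component captures. The sketch below shows this on synthetic data (random numbers, not the house data); on the real dataset you would print pca.explained_variance_ratio_ for the model fitted above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 500 rows, 10 columns driven by 3 hidden factors plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) + 0.2 * rng.normal(size=(500, 10))
X_sc = StandardScaler().fit_transform(X)

pca = PCA(n_components=3, whiten=True).fit(X_sc)
X_reduced = pca.transform(X_sc)

print(X_reduced.shape)                      # (500, 3)
print(pca.explained_variance_ratio_)        # share of total variance per component
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept by the 3 components
```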

Task


Read the train.csv dataset from the web and create a PCA model for it with 4 principal components.
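
A possible solution sketch, assuming you keep the same ten columns and the same preprocessing as in this chapter (the task itself does not specify them). fit_pca is a hypothetical helper name; apart from wrapping the steps in a function, the only change from the chapter's code is n_components=4. The offline check at the end uses random numbers so it runs without network access.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same 10 columns as in the chapter (an assumption: the task does not name them)
columns = ['YearBuilt', 'LotArea', 'MSSubClass', 'OverallQual', 'SalePrice',
           'PoolArea', 'GarageArea', 'BedroomAbvGr', 'KitchenAbvGr', 'Fireplaces']


def fit_pca(df: pd.DataFrame, cols: list, n_components: int = 4) -> PCA:
    """Hypothetical helper: keep `cols`, standardize them, and fit PCA."""
    data_sc = StandardScaler().fit_transform(df[cols])
    return PCA(n_components=n_components, whiten=True).fit(data_sc)


# With network access, the real data would be used like this:
# df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/7b22c447-77ad-48ae-a2d2-4e6714f7a4a6/train_S1.csv')
# pca = fit_pca(df, columns)

# Offline check on random numbers with the same column names
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.normal(size=(100, len(columns))), columns=columns)
pca = fit_pca(df_demo, columns)
print(pca.components_.shape)  # (4, 10)
```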

Section 1. Chapter 4