Principal Component Analysis

Explore Dataset

We will now look more closely at building a PCA model, using a single dataset as the example: the wine dataset that ships with scikit-learn. It contains 13 wine characteristics (features) and 3 classes. It is especially convenient here because all of its features are numeric, so there are no categorical variables to encode.

Let's load the dataset:
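The original code for this step did not survive on this page, so what follows is only a minimal sketch of how the loading might look, assuming the dataset is obtained through scikit-learn's `load_wine` function. The variable names `wine`, `X`, and `y` are chosen for illustration.

```python
from sklearn.datasets import load_wine

# Load the wine dataset bundled with scikit-learn
wine = load_wine()

X = wine.data    # NumPy array of features, one row per wine sample
y = wine.target  # class label of each sample (0, 1 or 2)

print(X.shape)             # number of samples and number of features
print(wine.feature_names)  # names of the 13 characteristics
```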


Now let's explore the dataset to understand the data we are working with. We convert the NumPy array X to a pandas DataFrame and count the missing values in each column:
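A possible sketch of this step is shown here, assuming `X` comes from `load_wine` as described earlier. The names `X_df` and `missing_per_column` are illustrative, not taken from the lesson.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
X = wine.data

# Wrap the NumPy array in a DataFrame, using the feature names as column labels
X_df = pd.DataFrame(X, columns=wine.feature_names)

# Count missing (NaN) values in each column
missing_per_column = X_df.isna().sum()
print(missing_per_column)
print("Total missing values:", int(missing_per_column.sum()))
```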


To get a complete description of each column (mean, standard deviation, etc.), use the .describe() method:
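As a rough illustration, assuming the features are held in a DataFrame built from `load_wine` (called `X_df` here for convenience), the call might look like this:

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
X_df = pd.DataFrame(wine.data, columns=wine.feature_names)

# Summary statistics per column: count, mean, std, min, quartiles, max
summary = X_df.describe()
print(summary)
```

The differing means and standard deviations across columns in this summary are a quick way to see that the features live on different scales.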


Before fitting the PCA model, we need to preprocess the data. As the previous lessons showed, standardization is an important step, because PCA is sensitive to the scale of the features. We implement it using the StandardScaler class:
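The snippet below is a minimal sketch of the standardization step, assuming `X` is the feature array from `load_wine`. The names `scaler` and `X_scaled` are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# Rescale every feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Check the result: column means close to 0, standard deviations close to 1
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

Keeping the fitted `scaler` object around is useful if the same transformation has to be applied to new data later with `scaler.transform`.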

Task


Read the data from the train.csv file (available on the web). Remove the "Id" column from the dataset and standardize the remaining data.
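One way to approach the task is sketched here. The location and contents of train.csv are not given on this page, so the sketch uses a small, entirely hypothetical in-memory CSV (`csv_text`, with made-up columns `feature_a` and `feature_b`) in its place, and assumes that all columns other than "Id" are numeric.

```python
import io

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for train.csv; the real file and its columns are not shown here
csv_text = """Id,feature_a,feature_b
1,10.0,200.0
2,12.0,180.0
3,14.0,220.0
4,16.0,240.0
"""

# Read the data (pd.read_csv also accepts a file path or URL instead of a buffer)
df = pd.read_csv(io.StringIO(csv_text))

# Drop the identifier column, which carries no information for PCA
df = df.drop(columns=["Id"])

# Standardize the remaining (numeric) columns
X_scaled = StandardScaler().fit_transform(df)
print(X_scaled)
```

If the real file contains non-numeric columns, they would need to be encoded or excluded before scaling.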

Section 3. Chapter 2