Exploring the Dataset

We will now take a closer look at building a PCA model, using a single dataset as the example: the wine dataset bundled with scikit-learn. It contains 13 wine characteristics and 3 classes. It is especially convenient for us because it has no categorical variables.

Let's load the dataset:

# Importing library
from sklearn.datasets import load_wine

# Reading the dataset
data = load_wine()
X = data.data
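As a quick sanity check, the loaded object can be inspected before going further. The sketch below relies only on attributes of the object returned by load_wine (feature_names and target_names) and reuses the variable names from the snippet above:

```python
from sklearn.datasets import load_wine

data = load_wine()
X = data.data

# Number of samples (rows) and features (columns)
print(X.shape)

# Names of the features, in the same order as the columns of X
print(data.feature_names)

# Class labels, stored separately from the feature matrix
print(data.target_names)
```

The shape should report 13 columns, one per wine characteristic, and target_names should list the 3 classes.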

Now let's explore the dataset to understand what data we are working with. Let's convert the NumPy array X to a pandas DataFrame and count the missing values in each column:

# Importing library
import pandas as pd

# Counting the missing values in each column
df = pd.DataFrame(X, columns=data.feature_names)
df.isna().sum()
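The wine dataset has no missing values, so the check returns zero for every column. To see what a non-zero result looks like, here is a minimal sketch on a toy DataFrame. The values are made up for illustration and are not taken from the wine data. Note that isnull() is an alias of isna() in pandas, so one of them is enough.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value, for illustration only
toy = pd.DataFrame({
    "alcohol": [13.2, np.nan, 12.8],
    "ash": [2.1, 2.4, 2.3],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Total number of missing values in the whole frame
total_missing = int(missing_per_column.sum())
print(total_missing)
```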

To get a complete description of each column (mean, standard deviation, etc.), use the .describe() method:

df.describe()

Before passing the dataset to the PCA model, let's preprocess the data. As the previous lessons showed, standardization is an important step. We implement it with the StandardScaler class:

# Importing class
from sklearn.preprocessing import StandardScaler

# Standardization
X_scaled = StandardScaler().fit_transform(X)
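One way to confirm that the scaling worked is to check that every column of the result has a mean close to 0 and a standard deviation close to 1. The sketch below repeats the loading step so that it runs on its own; the names col_means and col_stds are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X = load_wine().data
X_scaled = StandardScaler().fit_transform(X)

# Per-column mean and standard deviation after scaling
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# Both checks compare against the target values up to floating-point error
print(np.allclose(col_means, 0.0))
print(np.allclose(col_stds, 1.0))
```

Both NumPy's std and StandardScaler use the population standard deviation by default, which is why the two agree here.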

Task

Read the data from the train.csv file (loaded from the web). Remove the "Id" column from the dataset and standardize it.
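A possible approach is sketched below. The location and contents of train.csv are not given here, so the example uses a small in-memory CSV with made-up columns (feature_a, feature_b) as a stand-in; only the "Id" column name comes from the task. It also assumes that all remaining columns are numeric.

```python
import io

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for train.csv; the real file is not shown here
csv_text = """Id,feature_a,feature_b
1,10.0,200.0
2,12.0,180.0
3,14.0,220.0
"""

# pd.read_csv also accepts a file path or a URL in place of this buffer
df = pd.read_csv(io.StringIO(csv_text))

# Drop the identifier column, which is a row label rather than a feature
df = df.drop(columns=["Id"])

# Standardize the remaining columns (assumed to be numeric)
X_scaled = StandardScaler().fit_transform(df)
print(X_scaled.shape)
```

With the real file, the StringIO buffer would be replaced by the path or URL of train.csv, and any non-numeric columns would need handling before scaling.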
