Challenge 3
Task
The final task is to implement feature engineering. You will work with the 'sales_data.csv' dataset: create new variables, then process the categorical and numeric data.
- Use feature engineering to create new columns from 'Date', such as the year, the month, and a weekend flag derived from the day of the week
- Encode the 'Region' and 'Product' categorical columns with the one-hot encoding method
- Scale the numeric data ('Sales')
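For readers new to one-hot encoding, the following minimal sketch shows what the technique produces on a single made-up 'Region' column. The three values are hypothetical and are not taken from sales_data.csv; `dtype=int` is used here only so the result prints as 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical categorical column (not the real dataset)
regions = pd.DataFrame({'Region': ['North', 'South', 'North']})

# One-hot encoding: one new 0/1 column per distinct category
encoded = pd.get_dummies(regions, columns=['Region'], dtype=int)
print(encoded)
```

Each row has exactly one 1 among the new 'Region_*' columns, marking which category that row belonged to.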
Solution
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import numpy as np
import pandas as pd
# Read the dataset
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/sales_data.csv')
# Convert date column to datetime format
df['Date'] = pd.to_datetime(df['Date'])
# Create new columns based on date column
# 'Year'
# 'Month'
# 'Weekend' - if the day is a weekend, then the value is 1, if not a weekend, then 0
# For 'Weekend' column use .map() function
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Weekend'] = df['Date'].dt.day_name()
df['Weekend'] = df['Weekend'].map({'Monday': 0, 'Tuesday': 0, 'Wednesday': 0, 'Thursday': 0,
                                   'Friday': 0, 'Saturday': 1, 'Sunday': 1})
# Drop original date column
df = df.drop('Date', axis=1)
# Create MinMaxScaler
scaler = MinMaxScaler()
# Make one-hot encoding with pd.get_dummies()
one_hot = pd.get_dummies(df[['Region', 'Product']])
# Join the encoded columns
df = df.join(one_hot)
# Drop initial columns
df = df.drop(['Region', 'Product'], axis=1)
# Fit and transform numerical data for the scaler
df[['Sales']] = scaler.fit_transform(df[['Sales']])
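To check the steps without downloading the dataset, here is a rough sketch that runs the same kind of pipeline on three made-up rows. The rows, the `toy` frame, and the `engineer_features` helper are assumptions introduced for illustration. The weekend flag is computed with `dt.dayofweek >= 5` as an alternative to the `.map()` approach the task asks for; in pandas, `dayofweek` counts Monday as 0 and Sunday as 6.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows standing in for sales_data.csv (values are made up)
toy = pd.DataFrame({
    'Date': ['2023-01-06', '2023-01-07', '2023-02-12'],  # Friday, Saturday, Sunday
    'Region': ['North', 'South', 'North'],
    'Product': ['A', 'B', 'A'],
    'Sales': [100.0, 300.0, 200.0],
})

def engineer_features(df):
    """Illustrative helper: date parts, weekend flag, one-hot encoding, min-max scaling."""
    df = df.copy()
    df['Date'] = pd.to_datetime(df['Date'])
    df['Year'] = df['Date'].dt.year
    df['Month'] = df['Date'].dt.month
    # Monday=0 ... Sunday=6, so values 5 and 6 mark the weekend
    df['Weekend'] = (df['Date'].dt.dayofweek >= 5).astype(int)
    df = df.drop('Date', axis=1)
    # One-hot encode the categorical columns, then drop the originals
    one_hot = pd.get_dummies(df[['Region', 'Product']])
    df = df.join(one_hot).drop(['Region', 'Product'], axis=1)
    # Rescale 'Sales' to the [0, 1] range
    df[['Sales']] = MinMaxScaler().fit_transform(df[['Sales']])
    return df

result = engineer_features(toy)
print(result)
```

On the toy rows, the smallest 'Sales' value maps to 0, the largest to 1, and the Saturday and Sunday rows get a 'Weekend' value of 1.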
# Starter template: replace each ___ with the missing attribute or function
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import numpy as np
import pandas as pd
# Read the dataset
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/sales_data.csv')
# Convert date column to datetime format
df['Date'] = pd.to_datetime(df['Date'])
# Create new columns based on date column
# 'Year'
# 'Month'
# 'Weekend' - if the day is a weekend, then the value is 1, if not a weekend, then 0
# For 'Weekend' column use .map() function
df['Year'] = df['Date'].dt.___
df['Month'] = df['Date'].dt.___
df['Weekend'] = df['Date'].dt.day_name()
df['Weekend'] = df['Weekend'].___({'Monday': 0, 'Tuesday': 0, 'Wednesday': 0, 'Thursday': 0,
                                   'Friday': 0, 'Saturday': 1, 'Sunday': 1})
# Drop original date column
df = df.drop('Date', axis=1)
# Create MinMaxScaler
scaler = MinMaxScaler()
# Make one-hot encoding with pd.get_dummies()
one_hot = pd.___(df[['Region', 'Product']])
# Join the encoded columns
df = df.join(one_hot)
# Drop initial columns
df = df.drop(['Region', 'Product'], axis=1)
# Fit and transform numerical data for the scaler
df[['Sales']] = scaler.___(df[['Sales']])