Challenge 3 | Moving on to Tasks
Data Preprocessing

Challenge 3

Task

The last task we have prepared for you is feature engineering. You will work with the 'sales_data.csv' dataset: create new variables from the existing columns, then process the categorical and numeric data.

  1. Use feature engineering to create new columns from the 'Date' column: the year, the month, and a 'Weekend' flag derived from the day of the week (1 for Saturday or Sunday, 0 otherwise)
  2. Encode the 'Region' and 'Product' categorical columns with one-hot encoding
  3. Scale the numeric 'Sales' column with MinMaxScaler
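To see what the three steps produce without downloading the dataset, here is a minimal sketch on a small hypothetical DataFrame. It assumes the CSV has 'Date', 'Region', 'Product', and 'Sales' columns (the names used in the task); the rows themselves are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy rows standing in for sales_data.csv (column names assumed from the task)
df = pd.DataFrame({
    'Date': ['2023-01-06', '2023-01-07', '2023-02-12', '2023-03-01'],
    'Region': ['North', 'South', 'North', 'East'],
    'Product': ['A', 'B', 'A', 'C'],
    'Sales': [100.0, 250.0, 175.0, 400.0],
})

# Step 1: date features (year, month, weekend flag), then drop the raw date
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Weekend'] = df['Date'].dt.day_name().map({
    'Monday': 0, 'Tuesday': 0, 'Wednesday': 0, 'Thursday': 0,
    'Friday': 0, 'Saturday': 1, 'Sunday': 1,
})
df = df.drop('Date', axis=1)

# Step 2: one indicator column per category, then drop the original text columns
one_hot = pd.get_dummies(df[['Region', 'Product']])
df = df.join(one_hot).drop(['Region', 'Product'], axis=1)

# Step 3: rescale Sales so the minimum becomes 0 and the maximum becomes 1
scaler = MinMaxScaler()
df[['Sales']] = scaler.fit_transform(df[['Sales']])

print(df.columns.tolist())
print(df)
```

With these rows, the processed frame keeps 'Sales', 'Year', 'Month', and 'Weekend', and adds one indicator column per category, such as 'Region_North' and 'Product_A'.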

Solution

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Read the dataset
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/sales_data.csv')

# Convert date column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Create new columns based on date column
# 'Year'
# 'Month'
# 'Weekend': 1 if the day falls on a Saturday or Sunday, otherwise 0
# Use the .map() method for the 'Weekend' column
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Weekend'] = df['Date'].dt.day_name()
df['Weekend'] = df['Weekend'].map({
    'Monday': 0, 'Tuesday': 0, 'Wednesday': 0, 'Thursday': 0,
    'Friday': 0, 'Saturday': 1, 'Sunday': 1,
})

# Drop original date column
df = df.drop('Date', axis=1)

# Create MinMaxScaler
scaler = MinMaxScaler()

# One-hot encode the categorical columns with pd.get_dummies()
one_hot = pd.get_dummies(df[['Region', 'Product']])
# Join the encoded columns
df = df.join(one_hot)
# Drop the original categorical columns
df = df.drop(['Region', 'Product'], axis=1)

# Fit the scaler to 'Sales' and replace the column with values scaled to [0, 1]
df[['Sales']] = scaler.fit_transform(df[['Sales']])
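MinMaxScaler rescales each column as x' = (x - min) / (max - min), so the smallest 'Sales' value maps to 0 and the largest to 1. The sketch below, on hypothetical values, checks that formula against fit_transform. It also shows that the .map() weekend flag matches a vectorized alternative, dt.dayofweek >= 5 (pandas numbers Monday as 0 and Sunday as 6); the alternative is offered only for comparison, since the task asks for .map().

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical Sales values for illustration
sales = pd.DataFrame({'Sales': [120.0, 80.0, 200.0, 140.0]})
scaled = MinMaxScaler().fit_transform(sales[['Sales']])

# The same rescaling written out by hand: (x - min) / (max - min)
s = sales['Sales']
manual = (s - s.min()) / (s.max() - s.min())
same_scale = np.allclose(scaled.ravel(), manual.to_numpy())

# One full week starting on Monday 2024-01-01
dates = pd.Series(pd.date_range('2024-01-01', periods=7, freq='D'))

# Weekend flag via .map() over day names (as in the solution)
weekend_map = dates.dt.day_name().map({
    'Monday': 0, 'Tuesday': 0, 'Wednesday': 0, 'Thursday': 0,
    'Friday': 0, 'Saturday': 1, 'Sunday': 1,
})
# Vectorized alternative: Saturday is 5 and Sunday is 6
weekend_vec = (dates.dt.dayofweek >= 5).astype('int64')
same_weekend = weekend_map.tolist() == weekend_vec.tolist()

print(same_scale, same_weekend)
```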

Section 6. Chapter 3
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Read the dataset
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/sales_data.csv')

# Convert date column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Create new columns based on date column
# 'Year'
# 'Month'
# 'Weekend': 1 if the day falls on a Saturday or Sunday, otherwise 0
# Use the .map() method for the 'Weekend' column
df['Year'] = df['Date'].dt.___
df['Month'] = df['Date'].dt.___
df['Weekend'] = df['Date'].dt.day_name()
df['Weekend'] = df['Weekend'].___({
    'Monday': 0, 'Tuesday': 0, 'Wednesday': 0, 'Thursday': 0,
    'Friday': 0, 'Saturday': 1, 'Sunday': 1,
})

# Drop original date column
df = df.drop('Date', axis=1)

# Create MinMaxScaler
scaler = MinMaxScaler()

# One-hot encode the categorical columns with pd.get_dummies()
one_hot = pd.___(df[['Region', 'Product']])
# Join the encoded columns
df = df.join(one_hot)
# Drop the original categorical columns
df = df.drop(['Region', 'Product'], axis=1)

# Fit the scaler to 'Sales' and replace the column with the scaled values
df[['Sales']] = scaler.___(df[['Sales']])
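Once the blanks are filled in, a quick self-check can confirm the result has the shape the task describes. The helper below, check_result, is hypothetical: its expectations are read off the task description, not taken from the course's grader. It is shown on tiny invented frames so it runs without the dataset.

```python
import pandas as pd


def check_result(df: pd.DataFrame) -> list[str]:
    """Return a list of problems in the processed DataFrame (empty if none).

    Hypothetical helper: the checks follow the task description, not an official grader.
    """
    problems = []
    # The raw columns should be gone after feature engineering and encoding
    for col in ('Date', 'Region', 'Product'):
        if col in df.columns:
            problems.append(f"original column '{col}' was not dropped")
    # The engineered date features should exist
    for col in ('Year', 'Month', 'Weekend'):
        if col not in df.columns:
            problems.append(f"missing engineered column '{col}'")
    if 'Weekend' in df.columns and not df['Weekend'].isin([0, 1]).all():
        problems.append("'Weekend' should contain only 0 and 1")
    # pd.get_dummies prefixes indicator columns with the source column name
    if not any(str(c).startswith('Region_') for c in df.columns):
        problems.append("no one-hot columns for 'Region'")
    if not any(str(c).startswith('Product_') for c in df.columns):
        problems.append("no one-hot columns for 'Product'")
    # MinMaxScaler output on the fitted data lies in [0, 1]
    if 'Sales' in df.columns and not df['Sales'].between(0, 1).all():
        problems.append("'Sales' is not scaled to [0, 1]")
    return problems


# Usage on tiny hypothetical results
good = pd.DataFrame({
    'Sales': [0.0, 1.0], 'Year': [2023, 2023], 'Month': [1, 2], 'Weekend': [0, 1],
    'Region_North': [True, False], 'Region_South': [False, True], 'Product_A': [True, True],
})
bad = good.assign(Sales=[10.0, 20.0])

print(check_result(good))
print(check_result(bad))
```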
