Learn Challenge 1 | Moving on to Tasks
Data Preprocessing

Challenge 1

Task


In this challenge, you will work with the 'adult-census.csv' dataset, which contains both categorical and numerical data. Your task is to prepare the data for processing.

  1. Read the dataset 'adult-census.csv'
  2. Explore the dataset. Check carefully which character marks missing data and replace it with np.nan
  3. Remove the rows with missing values
  4. Process the categorical columns 'workclass' and 'sex' using one-hot encoding
  5. Scale the numeric columns 'age' and 'hours-per-week'
  6. Print the processed data
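
Step 2 asks you to find the missing-data marker yourself. One possible way to do that is sketched below on a tiny hand-made table; the `toy` DataFrame and its values are hypothetical stand-ins for the real file, and the ' ?' marker (with a leading space) is only assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for 'adult-census.csv' (not the real file)
toy = pd.DataFrame({
    'age': [39, 50, 38, 53],
    'workclass': [' Private', ' ?', ' State-gov', ' Private'],
    'sex': [' Male', ' Female', ' Male', ' ?'],
})

# Listing the distinct values of a text column makes odd markers easy to spot
print(toy['workclass'].value_counts())

# Count exact matches of the suspected marker in every column
marker = ' ?'  # assumed marker; note the leading space
counts = toy.isin([marker]).sum()
print(counts)

# Replace the marker with np.nan, then drop the incomplete rows
cleaned = toy.replace(marker, np.nan).dropna()
print(cleaned)
```

If the count is zero for every column, the marker guess is probably wrong (for example, a missing or extra space), so it is worth printing the raw values with `repr()` before replacing anything.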

Solution

from sklearn.preprocessing import StandardScaler, MinMaxScaler
import numpy as np
import pandas as pd

# Read the dataset
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/adult-census.csv')

# Replace the missing-data marker with np.nan
df = df.replace(' ?', np.nan)

# Drop all rows with nan values
df = df.dropna()

# Create MinMaxScaler
scaler = MinMaxScaler()

# Make one-hot encoding with pd.get_dummies()
one_hot = pd.get_dummies(df[['workclass', 'sex']])
# Join the encoded columns
df = df.join(one_hot)
# Drop initial columns
df = df.drop(['workclass', 'sex'], axis=1)

# Fit the scaler on the numeric columns and transform them
df[['age', 'hours-per-week']] = scaler.fit_transform(df[['age', 'hours-per-week']])

# Print new data
print(df)
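
To see what the encoding and scaling steps produce, here is a small sketch on made-up data. The `toy` table is hypothetical, and the scaling is written out by hand with pandas as (x - min) / (max - min), which is the formula MinMaxScaler applies with its default 0 to 1 range; it is an illustration, not a replacement for the solution above.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({
    'age': [20, 30, 40],
    'hours-per-week': [10, 40, 25],
    'sex': [' Male', ' Female', ' Male'],
})

# One-hot encode 'sex': one new column per distinct value
one_hot = pd.get_dummies(toy[['sex']])
encoded = toy.join(one_hot).drop(['sex'], axis=1)

# Min-max scale the numeric columns by hand
num_cols = ['age', 'hours-per-week']
col_min = encoded[num_cols].min()
col_max = encoded[num_cols].max()
encoded[num_cols] = (encoded[num_cols] - col_min) / (col_max - col_min)

print(encoded)
```

After scaling, the smallest value in each numeric column becomes 0 and the largest becomes 1, while the one-hot columns keep the original category text in their names (including any leading space).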


Section 6. Chapter 1
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import numpy as np
import pandas as pd

# Read the dataset
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/adult-census.csv')

# Replace the missing-data marker with np.nan
df = df.___(___, np.nan)

# Drop all rows with nan values
df = df.___()

# Create MinMaxScaler
scaler = MinMaxScaler()

# Make one-hot encoding with pd.get_dummies()
one_hot = pd.___(df[['workclass', 'sex']])
# Join the encoded columns
df = df.___(one_hot)
# Drop initial columns
df = df.___(['workclass', 'sex'], axis=1)

# Fit the scaler on the numeric columns and transform them
df[['age', 'hours-per-week']] = scaler.___(df[['age', 'hours-per-week']])

# Print new data
print(df)
