Learn Data Standardization | Normalization & Standardization
Preprocessing Data
Data Standardization

Another way to scale data is to standardize it: transform the values so that their mean is 0 and their standard deviation (std) is 1. The formula is:

$$x' = \frac{x - \mu}{\sigma}$$

Here μ is the mean of x and σ is its standard deviation. Subtracting μ shifts the values so that the new mean is 0, and dividing by σ rescales them so that the new std is 1.
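
As a small worked illustration of the formula, the sketch below standardizes a made-up list of numbers using only Python's standard library. The values and variable names are hypothetical, and the use of the sample standard deviation (statistics.stdev) is an assumption chosen to match the pandas default.

```python
import statistics

# Hypothetical sample values, chosen only for illustration
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.mean(values)      # mean of x
sigma = statistics.stdev(values)  # sample standard deviation of x

# Apply x' = (x - mu) / sigma to every value
standardized = [(x - mu) / sigma for x in values]

# The standardized values should have mean close to 0 and std close to 1
print(round(statistics.mean(standardized), 2))
print(round(statistics.stdev(standardized), 2))
```

Because of floating-point rounding, the new mean is typically a tiny number near zero rather than exactly 0, which is why the check rounds the result.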

Task

Standardize the data in the SibSp column: calculate its mean and std, then apply the formula above. Print a sample of the modified data and check that the new mean and std are equal to 0 and 1, respectively.

Solution

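import pandas as pd

# Load the Titanic dataset and select the SibSp column
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/10db3746-c8ff-4c55-9ac3-4affa0b65c16/titanic.csv')
sibsp = data['SibSp']

# Compute the mean and the (sample) standard deviation of the column
mean, std = sibsp.mean(), sibsp.std()

# Standardize: subtract the mean, then divide by the std
sibsp = (sibsp - mean) / std
print(sibsp.sample(10))

# Check that the new mean and std are 0 and 1
new_mean, new_std = sibsp.mean(), sibsp.std()
print(round(new_mean, 2), round(new_std, 2))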
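
One detail worth noting about this check: the std() method of a pandas Series uses the sample standard deviation (ddof=1) by default, so the standardized column has std 1 only when it is measured with the same convention. A minimal sketch on a small made-up Series (the values are hypothetical, not taken from the Titanic data) shows the difference:

```python
import pandas as pd

# Hypothetical values, used only to illustrate the ddof convention
s = pd.Series([0, 1, 0, 3, 1, 0, 2, 1], dtype=float)

# Standardize with the pandas default (sample std, ddof=1)
z = (s - s.mean()) / s.std()

sample_std = z.std()            # same convention as above: close to 1
population_std = z.std(ddof=0)  # population convention: slightly below 1

print(round(sample_std, 4))
print(round(population_std, 4))
```

With n values, the population std of the standardized data is sqrt((n - 1) / n) rather than 1, so the gap matters for small samples and becomes negligible for large ones.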



