Mean-Centering and Z-Score Standardization | Foundations of Feature Scaling
Feature Scaling and Normalization Deep Dive

Mean-Centering and Z-Score Standardization

To effectively prepare your data for machine learning, you need to understand how to transform features so that they are on comparable scales. Two of the most fundamental techniques for this are mean-centering and z-score standardization. Both are rooted in statistical concepts and have precise mathematical formulas. Let’s break down how each is derived and why they matter.

Derivation of Mean-Centering and Z-Score Standardization Formulas

Suppose you have a set of data points for a single feature, represented as a vector:

x = [x_1, x_2, \ldots, x_n]

Step 1: Compute the Mean

The first step in both mean-centering and standardization is to calculate the mean of the feature:

\mu = \frac{1}{n} \sum_{i=1}^{n} x_i

This value, μ, represents the average value of your feature.
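As a quick sanity check, here is a minimal NumPy sketch of this step, using a small made-up feature vector (the same values as the synthetic example later in this chapter):

import numpy as np

# Hypothetical feature vector with n = 5 values
x = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Mean: (1/n) * sum of all values
mu = x.sum() / x.size   # equivalent to np.mean(x)
print(mu)               # 14.0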

Step 2: Mean-Centering

Mean-centering involves subtracting the mean from each data point. The formula for the mean-centered value x'_i is:

x'_i = x_i - \mu

This transformation shifts the entire distribution of the feature so that its mean becomes zero, but it does not change the spread or scale of the data.
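As a minimal sketch (reusing the same made-up vector), mean-centering in NumPy is a single subtraction, and you can verify that the result has mean zero while the spread stays the same:

import numpy as np

x = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
mu = np.mean(x)

# Subtract the mean from every element
x_centered = x - mu
print(x_centered)                      # [-4. -2.  0.  2.  4.]
print(np.mean(x_centered))             # 0.0
print(np.std(x), np.std(x_centered))   # same spread before and after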

Step 3: Standard Deviation

For standardization, you also need the standard deviation, which measures the spread of the data:

\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}
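This is the population standard deviation (dividing by n rather than n - 1), which is also what NumPy's np.std computes by default (ddof=0). A short sketch with the same made-up data:

import numpy as np

x = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
mu = np.mean(x)

# Population standard deviation: square root of the mean squared deviation
sigma = np.sqrt(np.mean((x - mu) ** 2))
print(sigma)        # 2.8284...
print(np.std(x))    # same value; np.std uses ddof=0 by default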

Step 4: Z-Score Standardization

Z-score standardization, also known as standard scaling, transforms each value by subtracting the mean and then dividing by the standard deviation:

z_i = \frac{x_i - \mu}{\sigma}

After this transformation, the feature will have a mean of zero and a standard deviation of one.
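A minimal sketch that verifies this claim on the same made-up vector:

import numpy as np

x = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
mu = np.mean(x)
sigma = np.std(x)   # population standard deviation (ddof=0)

# Z-score standardization
z = (x - mu) / sigma
print(z)                      # [-1.414... -0.707...  0.  0.707...  1.414...]
print(np.mean(z), np.std(z))  # ~0.0 and 1.0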

Each step in these derivations ensures that your data is centered (mean zero) and, if standardized, also scaled (unit variance), which is crucial for many machine learning algorithms that are sensitive to feature scale.

Note

Mean-centering shifts your data so that it revolves around zero, removing any inherent bias in the feature’s location. This is especially helpful when you want to emphasize differences between data points rather than their absolute values. Z-score standardization goes further by also scaling the data so that its spread (variance) is uniform. Use mean-centering when you only need to remove the mean, such as before applying Principal Component Analysis (PCA). Use z-score standardization when your model is sensitive to the scale of the data, like in algorithms that rely on distances or gradients.

import numpy as np

# Create a synthetic dataset (single feature)
X = np.array([10, 12, 14, 16, 18], dtype=float)

# Mean-centering
mean = np.mean(X)
X_centered = X - mean

# Standard deviation
std = np.std(X)

# Z-score standardization
X_standardized = (X - mean) / std

print("Original data:", X)
print("Mean-centered data:", X_centered)
print("Standardized data:", X_standardized)
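In practice you would rarely hand-roll these transformations for every feature. As a sketch (assuming scikit-learn is installed), sklearn.preprocessing.StandardScaler performs z-score standardization column by column, and passing with_std=False restricts it to mean-centering only:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales
X = np.array([[10.0, 1000.0],
              [12.0, 1500.0],
              [14.0, 2000.0],
              [16.0, 2500.0],
              [18.0, 3000.0]])

# Z-score standardization: subtract each column's mean, divide by its std
X_standardized = StandardScaler().fit_transform(X)

# Mean-centering only: shift each column to mean zero, keep its original spread
X_centered = StandardScaler(with_std=False).fit_transform(X)

print(X_standardized.mean(axis=0))   # ~[0. 0.]
print(X_standardized.std(axis=0))    # ~[1. 1.]
print(X_centered.mean(axis=0))       # ~[0. 0.]

Fitting the scaler on the training data and reusing it on new data via transform keeps both splits on the same scale.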

Which statements about mean-centering and z-score standardization are correct?

Select the correct answer
