Feature Scaling and Normalization Deep Dive

L1, L2, and Max Normalization: Formulas and Intuition

When working with feature vectors in machine learning, you often need to bring them onto a common scale. Three popular normalization techniques are L1 normalization, L2 normalization, and Max normalization. Each method rescales a vector using a different mathematical norm, which affects both the magnitude and the geometric interpretation of the resulting vector.

L1 normalization, also called Manhattan or taxicab normalization, rescales a vector so that the sum of the absolute values of its elements equals 1. The formula for L1 normalization of a vector x is:

x_{norm} = \frac{x}{||x||_1}

where ||x||_1 is the L1 norm, calculated as the sum of the absolute values of the components of x. Geometrically, L1 normalization projects vectors onto the surface of a diamond-shaped region (the L1 unit ball) in the feature space.
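As a quick worked example, take the vector x = (3, 4, -2) used in the code sample below. Its L1 norm is |3| + |4| + |-2| = 9, so

x_{norm} = \left(\tfrac{3}{9}, \tfrac{4}{9}, -\tfrac{2}{9}\right) = \left(\tfrac{1}{3}, \tfrac{4}{9}, -\tfrac{2}{9}\right)

and the absolute values of the normalized components sum to 1.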

L2 normalization, or Euclidean normalization, scales a vector so that the sum of the squares of its elements equals 1. The formula is:

x_{norm} = \frac{x}{||x||_2}

where ||x||_2 is the L2 norm, computed as the square root of the sum of the squares of the vector's elements. Geometrically, L2 normalization projects vectors onto the surface of a hypersphere (the L2 unit ball) in the feature space.
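For the same vector x = (3, 4, -2), the L2 norm is

||x||_2 = \sqrt{3^2 + 4^2 + (-2)^2} = \sqrt{29} \approx 5.385

so x_{norm} \approx (0.557, 0.743, -0.371), and the squares of the normalized components sum to 1.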

Max normalization scales a vector so that the largest absolute value among its elements becomes 1. The formula is:

x_{norm} = \frac{x}{||x||_∞}

where ||x||_∞ (the infinity norm) is the maximum absolute value of the vector's elements. This projects vectors onto the surface of a hypercube (the L∞ unit ball) in the feature space.
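Again with x = (3, 4, -2), the infinity norm is ||x||_∞ = \max(|3|, |4|, |-2|) = 4, so x_{norm} = (0.75, 1, -0.5); the largest absolute component is now exactly 1.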

Each normalization method has a distinct geometric interpretation and impact on your data. L1 normalization is closely associated with sparsity: zero components stay exactly zero, and the normalized values can be read as proportions of the vector's total absolute magnitude. L2 normalization preserves direction while giving every vector unit length, which is especially useful for algorithms that rely on dot products or distances. Max normalization is robust to outliers in all but the largest feature and is sometimes used when you want to cap all features at the same maximum scale.

Note

Comparison of normalization types:

  • L1 normalization is preferred when you want to emphasize sparsity or when features are expected to have many zero values;
  • L2 normalization is typically used when you care about preserving the direction of vectors and when algorithms are sensitive to the magnitude of features (such as k-nearest neighbors or SVMs);
  • Max normalization is useful when you want to ensure that no feature dominates due to its scale, or when you want all values to be within a fixed range regardless of distribution.
import numpy as np

x = np.array([3, 4, -2])

# L1 normalization
l1_norm = np.sum(np.abs(x))
x_l1 = x / l1_norm

# L2 normalization
l2_norm = np.sqrt(np.sum(x ** 2))
x_l2 = x / l2_norm

# Max normalization
max_norm = np.max(np.abs(x))
x_max = x / max_norm

print("Original vector:", x)
print("L1 normalized:", x_l1)
print("L2 normalized:", x_l2)
print("Max normalized:", x_max)
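If you already use scikit-learn, the same three results can be obtained from its normalize helper; the snippet below is a minimal sketch assuming scikit-learn is installed.

import numpy as np
from sklearn.preprocessing import normalize

# normalize expects a 2D array, where each row is one sample (vector)
x = np.array([3, 4, -2]).reshape(1, -1)

# norm can be "l1", "l2", or "max"; each row is rescaled independently
print("L1 normalized: ", normalize(x, norm="l1"))
print("L2 normalized: ", normalize(x, norm="l2"))
print("Max normalized:", normalize(x, norm="max"))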

Which normalization technique would you use if you want all features to contribute on the same scale and the result to be robust to outliers in all but the largest feature?

Select the correct answer
