Feature Scaling and Normalization Deep Dive

L1, L2, and Max Normalization: Formulas and Intuition

When working with feature vectors in machine learning, you often need to bring them onto a common scale. Three popular normalization techniques are L1 normalization, L2 normalization, and Max normalization. Each method rescales a vector using a different mathematical norm, which affects both the magnitude and the geometric interpretation of the resulting vector.

L1 normalization, also called Manhattan or taxicab normalization, rescales a vector so that the sum of the absolute values of its elements equals 1. The formula for L1 normalization of a vector x is:

x_{norm} = \frac{x}{||x||_1}

where ||x||_1 is the L1 norm, calculated as the sum of the absolute values of the components of x. Geometrically, L1 normalization projects vectors onto the surface of a diamond-shaped region (the L1 unit ball) in the feature space.
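As a quick worked example, take the vector x = (3, 4, -2) used in the code sample below. Its L1 norm is |3| + |4| + |-2| = 9, so

x_{norm} = \left(\tfrac{3}{9}, \tfrac{4}{9}, -\tfrac{2}{9}\right) = \left(\tfrac{1}{3}, \tfrac{4}{9}, -\tfrac{2}{9}\right)

and the absolute values of the normalized components sum to 1.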

L2 normalization, or Euclidean normalization, scales a vector so that the sum of the squares of its elements equals 1. The formula is:

x_{norm} = \frac{x}{||x||_2}

where ||x||_2 is the L2 norm, computed as the square root of the sum of the squares of the vector's elements. Geometrically, L2 normalization projects vectors onto the surface of a hypersphere (the L2 unit ball) in the feature space.
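For the same vector x = (3, 4, -2), the L2 norm is

||x||_2 = \sqrt{3^2 + 4^2 + (-2)^2} = \sqrt{29} \approx 5.385

so x_{norm} \approx (0.557, 0.743, -0.371), and the squares of the normalized components sum to 1.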

Max normalization scales a vector so that the largest absolute value among its elements becomes 1. The formula is:

x_{norm} = \frac{x}{||x||_∞}

where ||x||_∞ (the infinity norm) is the maximum absolute value of the vector's elements. This projects vectors onto the surface of a hypercube (the L∞ unit ball) in the feature space.
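Again with x = (3, 4, -2), the infinity norm is ||x||_∞ = \max(|3|, |4|, |-2|) = 4, so x_{norm} = (0.75, 1, -0.5); the largest absolute component is now exactly 1.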

Each normalization method has a distinct geometric interpretation and impact on your data. L1 normalization is closely associated with sparsity: zero components stay exactly zero, and the normalized values can be read as proportions of the vector's total absolute magnitude. L2 normalization preserves direction while giving every vector unit length, which is especially useful for algorithms that rely on dot products or distances. Max normalization is robust to outliers in all but the largest feature and is sometimes used when you want to cap all features at the same maximum scale.

Note

Comparison of normalization types:

  • L1 normalization is preferred when you want to emphasize sparsity or when features are expected to have many zero values;
  • L2 normalization is typically used when you care about preserving the direction of vectors and when algorithms are sensitive to the magnitude of features (such as k-nearest neighbors or SVMs);
  • Max normalization is useful when you want to ensure that no feature dominates due to its scale, or when you want all values to be within a fixed range regardless of distribution.
import numpy as np

x = np.array([3, 4, -2])

# L1 normalization
l1_norm = np.sum(np.abs(x))
x_l1 = x / l1_norm

# L2 normalization
l2_norm = np.sqrt(np.sum(x ** 2))
x_l2 = x / l2_norm

# Max normalization
max_norm = np.max(np.abs(x))
x_max = x / max_norm

print("Original vector:", x)
print("L1 normalized:", x_l1)
print("L2 normalized:", x_l2)
print("Max normalized:", x_max)
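If you already use scikit-learn, the same three results can be obtained from its normalize helper; the snippet below is a minimal sketch assuming scikit-learn is installed.

import numpy as np
from sklearn.preprocessing import normalize

# normalize expects a 2D array, where each row is one sample (vector)
x = np.array([3, 4, -2]).reshape(1, -1)

# norm can be "l1", "l2", or "max"; each row is rescaled independently
print("L1 normalized: ", normalize(x, norm="l1"))
print("L2 normalized: ", normalize(x, norm="l2"))
print("Max normalized:", normalize(x, norm="max"))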

Which normalization technique would you use if you want all features to contribute on the same scale and the result to be robust to outliers in all but the largest feature?

Select the correct answer
