Why Scale Features?
Machine learning algorithms often rely on optimization techniques such as gradient descent to minimize a cost function and find the best parameters for a model. The scale of your input features can have a significant impact on how efficiently these algorithms converge. When features have vastly different ranges, the optimization landscape becomes skewed, causing the algorithm to take inefficient paths towards the minimum. This can slow down convergence or even prevent the optimizer from finding the best solution. For example, if one feature ranges from 1 to 1000 while another ranges from 0 to 1, the larger-scale feature will dominate the updates, making it difficult for the optimizer to properly adjust parameters associated with the smaller-scale feature. As a result, unscaled features can hinder learning and lead to suboptimal model performance.
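To make this concrete, the short sketch below (not part of the original lesson) runs plain batch gradient descent on synthetic data, once on raw features with very different ranges and once on standardized features. The data, learning rates, and iteration counts are illustrative choices; the point is that the unscaled run forces a tiny learning rate and leaves the small-scale feature's weight almost untouched.

import numpy as np

# Made-up data: one feature spans hundreds of units, the other less than one
rng = np.random.default_rng(0)
x1 = rng.uniform(-500, 500, size=200)   # large-scale feature
x2 = rng.uniform(-0.5, 0.5, size=200)   # small-scale feature
y = 0.05 * x1 + 10 * x2 + rng.normal(0, 0.5, size=200)

def mse_after_gradient_descent(X, y, lr, n_iters=1000):
    """Run plain batch gradient descent on mean squared error and return the final MSE."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)

X_raw = np.column_stack([x1, x2])
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Unscaled: a much larger learning rate makes the updates diverge,
# and at this rate the weight on x2 barely moves in 1,000 iterations
mse_raw = mse_after_gradient_descent(X_raw, y, lr=5e-6)

# Scaled: a far larger learning rate is stable, and both weights converge quickly
mse_scaled = mse_after_gradient_descent(X_scaled, y, lr=0.1)

print(f"Final MSE, unscaled features: {mse_raw:.2f}")
print(f"Final MSE, scaled features:   {mse_scaled:.2f}")

After the same number of steps, the scaled run typically ends with a much lower error, because both weights can be updated at a comparable pace.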
Feature scaling is the process of transforming input variables so that they share a common scale, without distorting differences in the ranges of values. This is a crucial step in many machine learning workflows, as it ensures that all features contribute equally to the learning process and helps optimization algorithms perform efficiently.
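Two of the most common scaling schemes are min-max normalization, which maps values into the [0, 1] range, and standardization (z-scores), which centers values at 0 with unit standard deviation. The snippet below is a minimal sketch of both on a made-up array; the contour demonstration that follows uses the standardization form.

import numpy as np

# A made-up feature with a wide range of raw values
x = np.array([1.0, 250.0, 500.0, 750.0, 1000.0])

# Min-max normalization: maps the smallest value to 0 and the largest to 1
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): gives the feature mean 0 and standard deviation 1
x_standard = (x - x.mean()) / x.std()

print(x_minmax)    # values lie in [0, 1]
print(x_standard)  # values are centered on 0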
import numpy as np
import matplotlib.pyplot as plt

# Simulate a simple cost function: J(w1, w2) = (w1 * x1 + w2 * x2 - y)^2
x1 = np.array([100, 200, 300, 400])
x2 = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Unscaled features
def cost_unscaled(w1, w2):
    preds = w1 * x1 + w2 * x2
    return np.mean((preds - y) ** 2)

# Scaled features (mean 0, std 1)
x1_scaled = (x1 - x1.mean()) / x1.std()
x2_scaled = (x2 - x2.mean()) / x2.std()

def cost_scaled(w1, w2):
    preds = w1 * x1_scaled + w2 * x2_scaled
    return np.mean((preds - y) ** 2)

# Visualize cost contours
W1, W2 = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
costs_unscaled = np.array([[cost_unscaled(w1, w2) for w1, w2 in zip(row1, row2)]
                           for row1, row2 in zip(W1, W2)])
costs_scaled = np.array([[cost_scaled(w1, w2) for w1, w2 in zip(row1, row2)]
                         for row1, row2 in zip(W1, W2)])

fig, axs = plt.subplots(1, 2, figsize=(12, 5))
axs[0].contourf(W1, W2, costs_unscaled, levels=50, cmap="viridis")
axs[0].set_title("Unscaled Features: Cost Contours")
axs[0].set_xlabel("w1")
axs[0].set_ylabel("w2")
axs[1].contourf(W1, W2, costs_scaled, levels=50, cmap="viridis")
axs[1].set_title("Scaled Features: Cost Contours")
axs[1].set_xlabel("w1")
axs[1].set_ylabel("w2")
plt.tight_layout()
plt.show()
The contour plots above illustrate how feature scaling affects the optimization landscape. The left plot, using unscaled features, shows elongated contours. This indicates a skewed landscape where gradient descent may take inefficient, zigzagging paths. The right plot, with scaled features, displays more circular contours. This better-conditioned shape allows gradient descent to move directly toward the minimum, improving convergence speed and efficiency.
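In practice, you would usually apply scaling through a library transformer rather than by hand. The sketch below uses scikit-learn's StandardScaler on made-up data (scikit-learn is not used elsewhere in this example); note that the scaler is fit on the training split only, and the same statistics are reused on the test split to avoid leaking information from the test data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix whose columns live on very different scales
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(1, 1000, size=100),  # large-scale feature
    rng.uniform(0, 1, size=100),     # small-scale feature
])
y = rng.uniform(0, 50, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from the training split only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics on the test split

print(X_train_scaled.mean(axis=0))  # close to [0, 0]
print(X_train_scaled.std(axis=0))   # close to [1, 1]

MinMaxScaler works the same way when you prefer values in the [0, 1] range instead.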