Feature Scaling and Normalization Deep Dive

Why Scale Features?

Machine learning algorithms often rely on optimization techniques such as gradient descent to minimize a cost function and find the best parameters for a model. The scale of your input features can have a significant impact on how efficiently these algorithms converge. When features have vastly different ranges, the optimization landscape becomes skewed, causing the algorithm to take inefficient paths towards the minimum. This can slow down convergence or even prevent the optimizer from finding the best solution. For example, if one feature ranges from 1 to 1000 while another ranges from 0 to 1, the larger-scale feature will dominate the updates, making it difficult for the optimizer to properly adjust parameters associated with the smaller-scale feature. As a result, unscaled features can hinder learning and lead to suboptimal model performance.
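
To make this concrete, here is a minimal sketch using toy data invented for this illustration; the feature ranges, the true weights 0.05 and 3.0, and the helper steps_until_close are assumptions of the sketch, not something from the course. It counts how many batch gradient descent steps a two-feature linear model needs to get close to the closed-form least-squares solution, with and without standardization.

import numpy as np

# Toy data: one feature spanning 1..1000, one spanning 0..1 (illustrative).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(1, 1000, 100),   # large-scale feature
    rng.uniform(0, 1, 100),      # small-scale feature
])
y = X @ np.array([0.05, 3.0]) + rng.normal(0, 0.1, 100)

def steps_until_close(X, y, lr, tol=1e-3, max_steps=200_000):
    """Batch gradient descent on MSE; return the number of steps needed to
    get within tol of the least-squares solution (max_steps if capped)."""
    w_opt = np.linalg.lstsq(X, y, rcond=None)[0]   # closed-form target
    w = np.zeros(X.shape[1])
    for step in range(1, max_steps + 1):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient step
        if np.linalg.norm(w - w_opt) < tol:
            return step
    return max_steps

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
y_centered = y - y.mean()   # scaled features are centered, so center y too

# Unscaled: the stable learning rate is dictated by the 1..1000 feature, so
# the weight on the 0..1 feature crawls (this run typically hits the cap).
print("unscaled:", steps_until_close(X, y, lr=1e-6))
# Standardized: a far larger rate is stable; convergence takes tens of steps.
print("scaled:  ", steps_until_close(X_scaled, y_centered, lr=0.1))

The exact counts depend on the random draw, but the gap spans orders of magnitude: the unscaled run is throttled by the tiny learning rate the large feature forces, which is exactly the effect described above.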

Definition

Feature scaling is the process of transforming input variables so that they share a common scale, without distorting differences in the ranges of values. This is a crucial step in many machine learning workflows, as it ensures that all features contribute equally to the learning process and helps optimization algorithms perform efficiently.
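
As a quick illustration of the definition, the two most common transforms take one line each in plain NumPy (a sketch; scikit-learn's MinMaxScaler and StandardScaler implement the same ideas with extra safeguards, such as handling constant features):

import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])

# Min-max scaling: maps the smallest value to 0 and the largest to 1.
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: shifts to mean 0 and rescales to standard deviation 1.
x_standard = (x - x.mean()) / x.std()

print(x_minmax)     # all values now lie in [0, 1]
print(x_standard)   # mean 0, unit variance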

import numpy as np
import matplotlib.pyplot as plt

# Simulate a simple cost function: J(w1, w2) = mean((w1 * x1 + w2 * x2 - y)^2)
x1 = np.array([100, 200, 300, 400])
x2 = np.array([2, 1, 4, 3])   # not an exact multiple of x1, otherwise both
                              # plots would collapse to parallel-line contours
y = np.array([10, 20, 30, 40])

# Unscaled features
def cost_unscaled(w1, w2):
    preds = w1 * x1 + w2 * x2
    return np.mean((preds - y) ** 2)

# Scaled features (mean 0, std 1)
x1_scaled = (x1 - x1.mean()) / x1.std()
x2_scaled = (x2 - x2.mean()) / x2.std()

def cost_scaled(w1, w2):
    preds = w1 * x1_scaled + w2 * x2_scaled
    return np.mean((preds - y) ** 2)

# Visualize cost contours
W1, W2 = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
costs_unscaled = np.array([[cost_unscaled(w1, w2) for w1, w2 in zip(row1, row2)]
                           for row1, row2 in zip(W1, W2)])
costs_scaled = np.array([[cost_scaled(w1, w2) for w1, w2 in zip(row1, row2)]
                         for row1, row2 in zip(W1, W2)])

fig, axs = plt.subplots(1, 2, figsize=(12, 5))
axs[0].contourf(W1, W2, costs_unscaled, levels=50, cmap="viridis")
axs[0].set_title("Unscaled Features: Cost Contours")
axs[0].set_xlabel("w1")
axs[0].set_ylabel("w2")
axs[1].contourf(W1, W2, costs_scaled, levels=50, cmap="viridis")
axs[1].set_title("Scaled Features: Cost Contours")
axs[1].set_xlabel("w1")
axs[1].set_ylabel("w2")
plt.tight_layout()
plt.show()

The contour plots above illustrate how feature scaling affects the optimization landscape. The left plot, using unscaled features, shows elongated contours. This indicates a skewed landscape where gradient descent may take inefficient, zigzagging paths. The right plot, with scaled features, displays more circular contours. This better-conditioned shape allows gradient descent to move directly toward the minimum, improving convergence speed and efficiency.
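
To watch the zigzag itself rather than infer it from the contours, the following stylized sketch runs gradient descent on two quadratic bowls; the curvatures (100 vs. 1, and 1 vs. 1) are arbitrary illustrative values chosen to mimic the elongated and round landscapes, not quantities from the lesson.

import numpy as np

def descend(curvatures, lr, start=(1.0, 1.0), steps=30):
    """Gradient descent on J(w) = c1*w1^2 + c2*w2^2; returns the iterate path."""
    w = np.array(start)
    path = [w.copy()]
    for _ in range(steps):
        w = w - lr * 2 * curvatures * w   # gradient of the quadratic bowl
        path.append(w.copy())
    return np.array(path)

# Elongated bowl (curvatures 100 vs 1): the steep direction caps the learning
# rate, so w1 oscillates across the valley while w2 barely moves.
elongated = descend(np.array([100.0, 1.0]), lr=0.009)
# Round bowl (equal curvatures): the same descent heads straight to the minimum.
round_bowl = descend(np.array([1.0, 1.0]), lr=0.4)

print("elongated, after 30 steps:", elongated[-1])   # w2 still ~0.58 from optimum
print("round,     after 30 steps:", round_bowl[-1])  # both weights near 0

Inspecting the full paths shows w1 flipping sign on every step of the elongated run, which is the zigzag the contour plots suggest, while the round bowl's path heads straight for the origin.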


Which of the following best describes the impact of unscaled features on optimization algorithms like gradient descent?


