Why Scale Features?
Machine learning algorithms often rely on optimization techniques such as gradient descent to minimize a cost function and find the best parameters for a model. The scale of your input features can have a significant impact on how efficiently these algorithms converge. When features have vastly different ranges, the optimization landscape becomes skewed, causing the algorithm to take inefficient paths towards the minimum. This can slow down convergence or even prevent the optimizer from finding the best solution. For example, if one feature ranges from 1 to 1000 while another ranges from 0 to 1, the larger-scale feature will dominate the updates, making it difficult for the optimizer to properly adjust parameters associated with the smaller-scale feature. As a result, unscaled features can hinder learning and lead to suboptimal model performance.
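To make this concrete, the short sketch below (not part of the original lesson) runs plain batch gradient descent on synthetic data, once on raw features with very different ranges and once on standardized features. The data, learning rates, and iteration counts are illustrative choices; the point is that the unscaled run forces a tiny learning rate and leaves the small-scale feature's weight almost untouched.

import numpy as np

# Made-up data: one feature spans hundreds of units, the other less than one
rng = np.random.default_rng(0)
x1 = rng.uniform(-500, 500, size=200)   # large-scale feature
x2 = rng.uniform(-0.5, 0.5, size=200)   # small-scale feature
y = 0.05 * x1 + 10 * x2 + rng.normal(0, 0.5, size=200)

def mse_after_gradient_descent(X, y, lr, n_iters=1000):
    """Run plain batch gradient descent on mean squared error and return the final MSE."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)

X_raw = np.column_stack([x1, x2])
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Unscaled: a much larger learning rate makes the updates diverge,
# and at this rate the weight on x2 barely moves in 1,000 iterations
mse_raw = mse_after_gradient_descent(X_raw, y, lr=5e-6)

# Scaled: a far larger learning rate is stable, and both weights converge quickly
mse_scaled = mse_after_gradient_descent(X_scaled, y, lr=0.1)

print(f"Final MSE, unscaled features: {mse_raw:.2f}")
print(f"Final MSE, scaled features:   {mse_scaled:.2f}")

After the same number of steps, the scaled run typically ends with a much lower error, because both weights can be updated at a comparable pace.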
Feature scaling is the process of transforming input variables so that they share a common scale, without distorting differences in the ranges of values. This is a crucial step in many machine learning workflows, as it ensures that all features contribute equally to the learning process and helps optimization algorithms perform efficiently.
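Two of the most common scaling schemes are min-max normalization, which maps values into the [0, 1] range, and standardization (z-scores), which centers values at 0 with unit standard deviation. The snippet below is a minimal sketch of both on a made-up array; the contour demonstration that follows uses the standardization form.

import numpy as np

# A made-up feature with a wide range of raw values
x = np.array([1.0, 250.0, 500.0, 750.0, 1000.0])

# Min-max normalization: maps the smallest value to 0 and the largest to 1
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): gives the feature mean 0 and standard deviation 1
x_standard = (x - x.mean()) / x.std()

print(x_minmax)    # values lie in [0, 1]
print(x_standard)  # values are centered on 0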
import numpy as np
import matplotlib.pyplot as plt

# Simulate a simple cost function: J(w1, w2) = (w1 * x1 + w2 * x2 - y)^2
x1 = np.array([100, 200, 300, 400])
x2 = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Unscaled features
def cost_unscaled(w1, w2):
    preds = w1 * x1 + w2 * x2
    return np.mean((preds - y) ** 2)

# Scaled features (mean 0, std 1)
x1_scaled = (x1 - x1.mean()) / x1.std()
x2_scaled = (x2 - x2.mean()) / x2.std()

def cost_scaled(w1, w2):
    preds = w1 * x1_scaled + w2 * x2_scaled
    return np.mean((preds - y) ** 2)

# Visualize cost contours
W1, W2 = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
costs_unscaled = np.array([[cost_unscaled(w1, w2) for w1, w2 in zip(row1, row2)]
                           for row1, row2 in zip(W1, W2)])
costs_scaled = np.array([[cost_scaled(w1, w2) for w1, w2 in zip(row1, row2)]
                         for row1, row2 in zip(W1, W2)])

fig, axs = plt.subplots(1, 2, figsize=(12, 5))
axs[0].contourf(W1, W2, costs_unscaled, levels=50, cmap="viridis")
axs[0].set_title("Unscaled Features: Cost Contours")
axs[0].set_xlabel("w1")
axs[0].set_ylabel("w2")
axs[1].contourf(W1, W2, costs_scaled, levels=50, cmap="viridis")
axs[1].set_title("Scaled Features: Cost Contours")
axs[1].set_xlabel("w1")
axs[1].set_ylabel("w2")
plt.tight_layout()
plt.show()
The contour plots above illustrate how feature scaling affects the optimization landscape. The left plot, using unscaled features, shows elongated contours. This indicates a skewed landscape where gradient descent may take inefficient, zigzagging paths. The right plot, with scaled features, displays more circular contours. This better-conditioned shape allows gradient descent to move directly toward the minimum, improving convergence speed and efficiency.
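In practice, you would usually apply scaling through a library transformer rather than by hand. The sketch below uses scikit-learn's StandardScaler on made-up data (scikit-learn is not used elsewhere in this example); note that the scaler is fit on the training split only, and the same statistics are reused on the test split to avoid leaking information from the test data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix whose columns live on very different scales
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(1, 1000, size=100),  # large-scale feature
    rng.uniform(0, 1, size=100),     # small-scale feature
])
y = rng.uniform(0, 50, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from the training split only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics on the test split

print(X_train_scaled.mean(axis=0))  # close to [0, 0]
print(X_train_scaled.std(axis=0))   # close to [1, 1]

MinMaxScaler works the same way when you prefer values in the [0, 1] range instead.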