Feature Scaling and Normalization Deep Dive

Scaling and Distance-Based Models

Distance-based machine learning models, such as k-nearest neighbors (k-NN) and clustering algorithms like k-means, rely on mathematical measures of distance to compare data points. How features are scaled has a direct and significant impact on how these distances are computed. If one feature has a much larger range or variance than the others, it dominates the distance calculation, biasing the model toward that feature and potentially reducing predictive accuracy. For example, consider two features: height in millimeters (ranging from 1,500 to 2,000) and weight in kilograms (ranging from 50 to 100). Height differences are numerically about ten times larger than weight differences, so height dominates the distance and the model treats it as far more important, even when both features are equally relevant.
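
This effect is easy to demonstrate with k-NN. The sketch below is illustrative, not part of the lesson: it builds synthetic data, applies an arbitrary ×1000 stretch to one feature, and trains the same classifier with and without standardization. The unscaled model typically scores worse because the stretched feature monopolizes the distance, though the exact numbers depend on the random seed.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature classification problem (illustrative data)
X, y = make_classification(
    n_samples=500, n_features=2, n_informative=2, n_redundant=0,
    random_state=42,
)

# Stretch the first feature so raw distances are dominated by it
X = X * np.array([1000.0, 1.0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# k-NN on raw features: the large-scale feature rules the distance
knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_raw.fit(X_train, y_train)

# k-NN with standardization: both features contribute comparably
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)

print("Accuracy without scaling:", knn_raw.score(X_test, y_test))
print("Accuracy with scaling:   ", knn_scaled.score(X_test, y_test))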

Definition

Euclidean distance is the straight-line distance between two points in Euclidean space. It is calculated as the square root of the sum of squared differences between corresponding feature values. Euclidean distance is highly sensitive to the scale of each feature: features with larger ranges contribute more to the total distance, which can distort model behavior if features are not properly scaled.
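
To make the definition concrete, here is a small sketch that computes the distance by hand and checks it against NumPy's built-in norm, using the same illustrative height/weight points as the example below.

import numpy as np

a = np.array([170.0, 65.0])  # [height in cm, weight in kg]
b = np.array([180.0, 85.0])

# Square root of the sum of squared per-feature differences
manual = np.sqrt(np.sum((a - b) ** 2))

# Equivalent built-in: Euclidean norm of the difference vector
builtin = np.linalg.norm(a - b)

print(manual, builtin)  # both sqrt(10**2 + 20**2), about 22.36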

import numpy as np
from sklearn.preprocessing import StandardScaler

# Two data points with features on different scales
point1 = np.array([170, 65])  # [height in cm, weight in kg]
point2 = np.array([180, 85])

# Euclidean distance before scaling
distance_before = np.linalg.norm(point1 - point2)
print("Distance before scaling:", distance_before)

# Apply standard scaling (z-score normalization)
scaler = StandardScaler()
points_scaled = scaler.fit_transform(np.vstack([point1, point2]))

# Euclidean distance after scaling
distance_after = np.linalg.norm(points_scaled[0] - points_scaled[1])
print("Distance after scaling:", distance_after)
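
Note that this two-point example is a degenerate case: when StandardScaler is fit on only two samples, each feature is mapped to -1 and +1, so the scaled distance is always 2√2 ≈ 2.83 whatever the raw values were. In practice the scaler is fit on the full training set, and the effect of scaling depends on each feature's variance across that set.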

Why is feature scaling important for distance-based models like k-NN and k-means clustering?


