Feature Scaling | Clustering Demystified
Feature Scaling

Feature scaling is a data pre-processing technique that brings the independent variables (features) of a dataset onto a comparable range. This matters because many machine learning algorithms, including distance-based clustering methods such as k-means, compare observations using a distance measure such as Euclidean distance. If the features are on very different scales, those with the largest numeric ranges dominate the distance while the others barely affect it, which can degrade the performance of these algorithms.
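To see this effect on a distance measure, here is a small NumPy sketch with made-up data (the age and income columns are hypothetical). It computes how much of the squared Euclidean distance between two observations comes from the income column, before and after Min-Max scaling each column to [0, 1].

```python
import numpy as np

# Hypothetical toy data: each row is an observation,
# columns are [age in years, annual income in dollars]
X = np.array([
    [25.0, 50_000.0],
    [60.0, 51_000.0],
    [26.0, 60_000.0],
])

def income_share(p, q):
    """Fraction of the squared Euclidean distance between p and q that comes from income (column 1)."""
    sq = (p - q) ** 2
    return sq[1] / sq.sum()

# Raw data: the income gap swamps the age gap
raw_share = income_share(X[0], X[1])
print(f"raw distance: {np.linalg.norm(X[0] - X[1]):.1f}, income share: {raw_share:.3f}")

# Min-Max scale every column to [0, 1] so both features are on a comparable scale
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled_share = income_share(X_scaled[0], X_scaled[1])
print(f"scaled distance: {np.linalg.norm(X_scaled[0] - X_scaled[1]):.3f}, income share: {scaled_share:.3f}")
```

On the raw data, the 1,000-dollar income gap accounts for more than 99% of the squared distance even though the two people differ by 35 years in age; after scaling, income contributes only about 1%.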

There are several ways to perform feature scaling, including Min-Max scaling, standardization, and normalization:

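  • Min-Max scaling rescales each feature to a given range, usually [0, 1], using x' = (x - min) / (max - min);
  • Standardization (z-score scaling) rescales each feature so that it has a mean of 0 and a standard deviation of 1, using x' = (x - mean) / std;
  • Normalization is often used as a synonym for Min-Max scaling; in scikit-learn, however, the Normalizer class works on rows rather than columns, rescaling each observation (sample) to unit norm.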
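The three methods can be sketched directly in NumPy on a small hypothetical matrix. Min-Max scaling and standardization operate column by column (per feature), while the last variant mirrors scikit-learn's row-wise Normalizer with the L2 norm. The standard deviation here is the population one (NumPy's default), which is also what scikit-learn's StandardScaler uses.

```python
import numpy as np

# Hypothetical feature matrix: rows are observations, columns are features
X = np.array([
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 900.0],
])

# Min-Max scaling, column by column: (x - min) / (max - min) -> values in [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization, column by column: (x - mean) / std -> mean 0, std 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Row-wise L2 normalization (like scikit-learn's Normalizer with norm="l2"):
# divide each observation by its Euclidean length -> every row has norm 1
X_norm = X / np.linalg.norm(X, axis=1, keepdims=True)

print(X_minmax)
print(X_std)
print(X_norm)
```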

Note that feature scaling should be applied only to the independent variables (features), not to the dependent (target) variable.
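A minimal sketch of this separation with pandas and scikit-learn, assuming a hypothetical dataset where "age" and "income" are features and "spend" is the target: only the feature columns pass through the scaler, and the target keeps its original units.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical dataset: "age" and "income" are features, "spend" is the target
df = pd.DataFrame({
    "age": [25, 60, 26, 41],
    "income": [50_000, 51_000, 60_000, 72_000],
    "spend": [1_200, 800, 1_500, 2_100],
})

# Separate the independent variables (X) from the dependent variable (y)
X = df.drop(columns="spend")
y = df["spend"]

# Scale only the features; the target is left untouched
X_scaled = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns, index=X.index)

print(X_scaled)
print(y)
```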

Methods description

  • MinMaxScaler: a class in the sklearn.preprocessing module. It scales each feature to a specified range, by default [0, 1] (configurable with the feature_range parameter). For the default range, it subtracts the feature's minimum and divides by the difference between its maximum and minimum;
  • X.columns: assuming X is a pandas DataFrame, X.columns returns its column labels;
  • MinMaxScaler.fit_transform(X): fits the scaler to the data and transforms it in one step. It computes the minimum and maximum of each column, scales the data accordingly, and returns the result as a NumPy array.
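The sketch below, on a hypothetical DataFrame, checks that fit_transform matches the (x - min) / (max - min) formula column by column and shows the per-column statistics the fitted scaler stores in its data_min_ and data_max_ attributes. The last lines illustrate a non-default feature_range.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature DataFrame
X = pd.DataFrame({"height_cm": [150.0, 165.0, 180.0], "weight_kg": [50.0, 80.0, 65.0]})

scaler = MinMaxScaler()            # default feature_range=(0, 1)
scaled = scaler.fit_transform(X)   # learns per-column min/max, then scales; returns a NumPy array

# Per-column statistics learned during fitting
print(scaler.data_min_)  # minimum of each column
print(scaler.data_max_)  # maximum of each column

# Same result by hand: (x - min) / (max - min), column by column
manual = (X - X.min()) / (X.max() - X.min())
print(np.allclose(scaled, manual.to_numpy()))

# A different target range, e.g. [-1, 1]
scaled_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
print(scaled_pm1)
```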
Task

  1. Import the MinMaxScaler class from sklearn.preprocessing.
  2. Create an instance of MinMaxScaler.
  3. Create a new DataFrame containing the scaled columns.
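One possible way to carry out these three steps is sketched below. The exercise's actual dataset is not shown in this chapter, so the DataFrame X and its column names are hypothetical stand-ins.

```python
import pandas as pd
# Step 1: import the MinMaxScaler class
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the exercise data
X = pd.DataFrame({
    "annual_income": [15, 40, 70, 120],
    "spending_score": [39, 81, 6, 77],
})

# Step 2: create an instance of MinMaxScaler
scaler = MinMaxScaler()

# Step 3: build a new DataFrame from the scaled values, reusing the original column labels
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

print(X_scaled)
```

Passing columns=X.columns restores the labels that fit_transform drops when it returns a plain NumPy array.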

Section 1. Chapter 7