Course Content
Clustering Demystified
Feature Scaling
Feature scaling is a technique used to standardize the range of independent variables, or features, of data. In machine learning, it is a data pre-processing step that brings the data dimensions onto a similar scale. This matters because many machine learning algorithms rely on a distance measure, such as Euclidean distance, to compare observations. If the features are on inconsistent scales, those with larger ranges dominate the distance measure, which can degrade the performance of such algorithms.
There are several ways to perform feature scaling, including Min-Max scaling, standardization, and normalization:
- Min-Max scaling scales the data to a given range, usually between 0 and 1;
- Standardization scales the data so that it has a mean of 0 and a standard deviation of 1;
- Normalization typically rescales each observation (row) so that it has unit norm; this is what sklearn's `Normalizer` does, and it is distinct from Min-Max scaling.
It's important to note that feature scaling should be applied only to the independent variable(s), not to the dependent variable.
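The Min-Max and standardization formulas above can be sketched directly with NumPy (the data here is hypothetical, chosen just to illustrate the arithmetic):

```python
import numpy as np

# A toy feature with values on a large scale (hypothetical data)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-Max scaling: (x - min) / (max - min), maps values into [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: (x - mean) / std, yields mean 0 and standard deviation 1
x_standardized = (x - x.mean()) / x.std()

print(x_minmax)        # [0.   0.25 0.5  0.75 1.  ]
print(x_standardized)  # mean 0, std 1
```

Note that Min-Max scaling preserves the relative spacing of the values while confining them to [0, 1], whereas standardization recenters them around 0.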
Methods description

- `MinMaxScaler`: a class within the `sklearn.preprocessing` module. It scales features to a specified range, typically between 0 and 1, by subtracting the minimum and dividing by the difference between the maximum and minimum values;
- `X.columns`: assuming `X` is a DataFrame, `X.columns` returns the column labels of the DataFrame `X`;
- `MinMaxScaler.fit_transform(X)`: fits the scaler to the data and transforms the data in one step. It computes the per-column minimum and maximum values and then scales the data accordingly.
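As a quick sketch (with hypothetical single-column data), `fit_transform` is equivalent to calling `fit` followed by `transform`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1.0], [5.0], [9.0]])  # hypothetical single-column data

# One-step form: learn min=1 and max=9, then scale into [0, 1]
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)

# Equivalent two-step form
scaler2 = MinMaxScaler()
scaler2.fit(data)                 # computes data_min_ and data_max_
scaled2 = scaler2.transform(data)

print(scaled.ravel())  # [0.  0.5 1. ]
```

The two-step form is what you would use in practice when scaling a test set: fit on the training data, then transform the test data with the same learned minimum and maximum.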
- Import the `MinMaxScaler` class.
- Create an instance of `MinMaxScaler()`.
- Create a new DataFrame with the scaled columns.
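The steps above can be sketched as follows; the column names and values here are hypothetical, chosen only to show features on different scales:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature DataFrame with columns on very different scales
X = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'income': [40000, 52000, 85000, 91000],
})

# Create the MinMaxScaler instance
scaler = MinMaxScaler()

# fit_transform returns a NumPy array; wrap it back into a DataFrame,
# reusing X.columns so the scaled columns keep their original labels
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

print(X_scaled)  # every column now lies in [0, 1]
```

After scaling, `age` and `income` contribute comparably to any distance-based computation, which is the point of this pre-processing step.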