Realization | Feature Engineering
Data Preprocessing

Realization

Now that you have an idea of what feature engineering includes, let's move on to practical implementation and look at the full pipeline in action.

In this example, we will demonstrate the full data preprocessing pipeline in one program using the famous iris dataset. We will prepare the data, extract features, select the most relevant ones, create new features, standardize them, merge them, evaluate their quality, and integrate them into a machine learning model.

  1. Data preparation: we will use the iris dataset from the scikit-learn library, which is already preprocessed and cleaned.
  2. Feature extraction: we will use the following features from the dataset: Sepal length, Sepal width, Petal length, Petal width.
  3. Feature selection: we will use the SelectKBest method from scikit-learn to select the top 2 most relevant features based on their mutual information score.
  4. Feature creation: we will create a new feature called 'Sepal to Petal Ratio' by dividing the sepal length by the petal length.
  5. Standardization: we will use the StandardScaler method from scikit-learn to scale the selected features.
  6. Feature merging: we will merge the selected and newly created features into one array.
  7. Feature evaluation: we will evaluate the quality of the features by calculating their correlation coefficients.
    Features with high correlation are strongly linearly dependent and hence carry almost the same information about the dependent variable. Therefore, when two features are highly correlated, we can drop one of them.
  8. Integration and usage: finally, we will integrate the realized features into a machine-learning model for classification.
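The steps above can be sketched in a single script. This is a minimal illustration, and the final classifier (`LogisticRegression`) is an assumption, since the lesson does not name a specific model:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1-2. Data preparation and feature extraction: load the cleaned iris data
# (columns: sepal length, sepal width, petal length, petal width)
X, y = load_iris(return_X_y=True)

# 3. Feature selection: keep the 2 features with the highest mutual information
selector = SelectKBest(mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

# 4. Feature creation: 'Sepal to Petal Ratio' = sepal length / petal length
ratio = (X[:, 0] / X[:, 2]).reshape(-1, 1)

# 5. Standardization: scale the selected features to zero mean, unit variance
X_scaled = StandardScaler().fit_transform(X_selected)

# 6. Feature merging: combine selected and created features into one array
features = np.hstack([X_scaled, ratio])

# 7. Feature evaluation: pairwise correlation matrix of the merged features
print(np.corrcoef(features, rowvar=False).round(2))

# 8. Integration and usage: train a classifier on the engineered features
X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=42)
model = LogisticRegression(max_iter=200).fit(X_tr, y_tr)
print(f"Test accuracy: {model.score(X_te, y_te):.2f}")
```

Note that only the selected features are standardized here, mirroring step 5; in practice you would often scale the created ratio feature as well before modeling.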

Note that there is a difference between feature selection and feature creation: feature selection is the process of choosing the subset of available features in a dataset that is most relevant or informative for a given machine learning task. Feature creation, on the other hand, involves generating new features from existing ones to capture more complex or abstract relationships between them.
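A toy contrast may make the distinction concrete (the array values below are hypothetical iris-like rows, not taken from the dataset):

```python
import numpy as np

# Two sample rows: sepal length, sepal width, petal length, petal width
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [6.2, 2.8, 4.8, 1.8]])

# Feature selection: pick a subset of the existing columns (petal measurements)
selected = X[:, [2, 3]]

# Feature creation: derive a brand-new column from existing ones
created = X[:, 0] / X[:, 2]  # sepal length / petal length
```

Selection never changes values, only which columns survive; creation produces values that were not in the original table.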

Section 5. Chapter 2