Methods for Encoding the Categorical Data | Processing Categorical Data
Data Preprocessing

Methods for Encoding the Categorical Data

Categorical data represents qualitative or descriptive characteristics and is often non-numeric. Examples include car brands, professions, and education levels. How, then, does it differ from plain text data? Categorical data is structured: each value belongs to one of a limited set of discrete categories. Text data is unstructured and requires additional preprocessing steps to extract relevant information. That is why, for example, the names of people in a dataset of user resumes are text data rather than categorical data.

First, let's find out why we need to encode categorical data. Most machine learning algorithms require numeric input to perform their computations, so categorical data must be transformed into a numerical representation before it can be used.
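
To make this concrete, here is a minimal sketch in plain Python. The distance function, the two example rows, and the `education_code` mapping are all assumptions made up for illustration. It shows a typical numeric computation failing on raw categorical values and succeeding once they are encoded.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical rows: (education level, years of experience).
raw_a = ("bachelor", 3)
raw_b = ("master", 5)

try:
    euclidean_distance(raw_a, raw_b)
except TypeError as error:
    # Subtracting two strings is undefined, so the computation fails.
    print("raw data failed:", error)

# Assumed mapping for illustration; any consistent numeric encoding would do.
education_code = {"bachelor": 0, "master": 1}
enc_a = (education_code[raw_a[0]], raw_a[1])
enc_b = (education_code[raw_b[0]], raw_b[1])

distance = euclidean_distance(enc_a, enc_b)
print(distance)
```

The same kind of failure occurs inside many learning algorithms, which is why encoding is usually done as a preprocessing step.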

There are many encoding methods, including label encoding, one-hot encoding, binary encoding, and target encoding. We will discuss the differences between them in the following chapters.
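
As a rough preview of the two less familiar methods, the sketch below implements binary encoding and target encoding in plain Python. The `professions` and `salaries` data are hypothetical, and library implementations may differ in details such as the starting integer code or smoothing of the target means.

```python
# Hypothetical data: a categorical feature and a numeric target.
professions = ["nurse", "engineer", "teacher", "engineer", "nurse", "chef"]
salaries = [50, 80, 45, 90, 54, 40]

# Sorted unique categories give a deterministic category -> integer code mapping.
categories = sorted(set(professions))
code_of = {category: i for i, category in enumerate(categories)}

# Binary encoding: write each integer code in binary, one column per bit.
n_bits = max(1, (len(categories) - 1).bit_length())
binary_encoded = [
    [int(bit) for bit in format(code_of[p], f"0{n_bits}b")]
    for p in professions
]

# Target encoding: replace each category with the mean target of its rows.
totals, counts = {}, {}
for p, s in zip(professions, salaries):
    totals[p] = totals.get(p, 0) + s
    counts[p] = counts.get(p, 0) + 1
target_mean = {c: totals[c] / counts[c] for c in categories}
target_encoded = [target_mean[p] for p in professions]

print(binary_encoded)
print(target_encoded)
```

With four categories, binary encoding needs only two columns, whereas one column per category would need four.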

You can see the difference between one-hot encoding and label encoding in the images below:
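
The same contrast can also be sketched in code. The example below uses plain Python and a hypothetical `brands` column. Libraries such as scikit-learn provide ready-made encoders, so treat this only as an illustration of the idea.

```python
# Hypothetical example column; the values are illustrative only.
brands = ["Toyota", "BMW", "Ford", "BMW", "Toyota"]

# Sorted unique categories give a deterministic category -> index mapping.
categories = sorted(set(brands))
index_of = {category: i for i, category in enumerate(categories)}

# Label encoding: each value becomes a single integer code.
label_encoded = [index_of[b] for b in brands]

# One-hot encoding: each value becomes a 0/1 vector with one position per category.
one_hot_encoded = [
    [1 if index_of[b] == i else 0 for i in range(len(categories))]
    for b in brands
]

print(categories)
print(label_encoded)
print(one_hot_encoded)
```

Label encoding produces one column of integers, which implies an ordering among the categories. One-hot encoding avoids that implied order at the cost of one column per category.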

Section 3. Chapter 1