ML Introduction with scikit-learn

OrdinalEncoder

The next problem to address is categorical data. Recall that categorical data comes in two types: ordinal and nominal.

Ordinal data follows a natural order, while nominal data does not. Because of that order, ordinal categories can be encoded as numbers that preserve it.

For example, we would encode the 'rate' column containing 'Terrible', 'Bad', 'OK', 'Good', and 'Great' values as follows:

'Terrible' – 0;
'Bad' – 1;
'OK' – 2;
'Good' – 3;
'Great' – 4.

To encode ordinal data, use OrdinalEncoder. It maps each category to a numeric code: 0, 1, 2, and so on (stored as floats by default).

Like any other transformer, OrdinalEncoder is easy to use. The only difficulty is specifying the categories argument correctly, because the order of that list determines which code each category receives.

Let's look at an example. We have a dataset (not the penguins dataset) with an 'education' column. First, let's examine its unique values.
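The 'rate' mapping listed above can be sketched as follows. This is a minimal illustration, not the course's own code: the `ratings` DataFrame and the `rate_encoded` column name are hypothetical, made up for the example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data for the 'rate' column described above
ratings = pd.DataFrame({'rate': ['Good', 'Terrible', 'OK', 'Great', 'Bad']})

# List the categories from lowest to highest so the codes follow that order
rate_categories = ['Terrible', 'Bad', 'OK', 'Good', 'Great']
rate_enc = OrdinalEncoder(categories=[rate_categories])

# fit_transform expects 2D input, hence the double brackets
encoded = rate_enc.fit_transform(ratings[['rate']])

# Flatten the (n, 1) result before storing it as a single column
ratings['rate_encoded'] = encoded.ravel()
print(ratings)
```

Each value receives the position of its category in `rate_categories`, so 'Terrible' becomes 0 and 'Great' becomes 4.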
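To see why the categories argument matters, here is a small sketch of what happens when it is omitted. In that case the encoder infers the categories from the data and sorts them, which for strings means alphabetical order rather than the intended ranking. The `ratings` DataFrame is hypothetical toy data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, listed here from worst to best rating
ratings = pd.DataFrame({'rate': ['Terrible', 'Bad', 'OK', 'Good', 'Great']})

# No categories argument: the encoder sorts the unique values it finds
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(ratings[['rate']]).ravel()

# The learned order is alphabetical, not worst-to-best
print(default_enc.categories_[0])
print(default_codes)
```

Here 'Terrible' ends up with the highest code and 'Bad' with the lowest, so the codes no longer reflect the natural order of the ratings.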


import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')

print(df['education'].unique())

Next, we need to create a list of the categories in order, in this case from 'HS-grad' to 'Doctorate'.


import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Load the data and assign X, y variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')
y = df['income'] # 'income' is a target in this dataset
X = df.drop('income', axis=1)
# Create a list of categories so HS-grad is encoded as 0 and Doctorate as 6
edu_categories = ['HS-grad', 'Some-college', 'Assoc', 'Bachelors', 'Masters', 'Prof-school', 'Doctorate']
# Initialize an OrdinalEncoder instance with the correct categories
ord_enc = OrdinalEncoder(categories=[edu_categories])
# Transform the 'education' column and print it
X['education'] = ord_enc.fit_transform(X[['education']])
print(X['education'])
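After fitting, it can be useful to check which code was assigned to which category. The sketch below uses a small hypothetical `sample` DataFrame instead of the downloaded dataset, with the same `edu_categories` list as above. It inspects the fitted `categories_` attribute and maps the codes back to labels with `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the 'education' column of the dataset
sample = pd.DataFrame({'education': ['Bachelors', 'HS-grad', 'Doctorate', 'Masters']})

edu_categories = ['HS-grad', 'Some-college', 'Assoc', 'Bachelors',
                  'Masters', 'Prof-school', 'Doctorate']
ord_enc = OrdinalEncoder(categories=[edu_categories])
codes = ord_enc.fit_transform(sample[['education']])

# categories_ holds one array per column; a category's index is its code
print(ord_enc.categories_[0])

# inverse_transform converts the numeric codes back to the original labels
restored = ord_enc.inverse_transform(codes)
print(restored.ravel())
```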

If you need to transform multiple features with OrdinalEncoder, specify the categories for each column. Pass one list per column to the categories argument, in the same order as the columns, as shown below:

encoder = OrdinalEncoder(categories=[col1_categories, col2_categories, ...])
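As a concrete sketch of the multi-column case, the example below encodes two ordinal columns at once. The `data` DataFrame and its 'education' and 'rate' columns are hypothetical toy data. The order of the lists in categories has to match the order of the columns passed to the encoder.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data with two ordinal columns
data = pd.DataFrame({
    'education': ['Masters', 'HS-grad', 'Bachelors'],
    'rate': ['OK', 'Great', 'Terrible'],
})

edu_categories = ['HS-grad', 'Some-college', 'Assoc', 'Bachelors',
                  'Masters', 'Prof-school', 'Doctorate']
rate_categories = ['Terrible', 'Bad', 'OK', 'Good', 'Great']

# One list per column, in the same order as the columns in `cols`
cols = ['education', 'rate']
encoder = OrdinalEncoder(categories=[edu_categories, rate_categories])
encoded = encoder.fit_transform(data[cols])

# Wrap the result in a DataFrame to keep the column names and index
encoded_df = pd.DataFrame(encoded, columns=cols, index=data.index)
print(encoded_df)
```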

1. Which statement best describes the use of the `OrdinalEncoder` for handling categorical data in a dataset?

2. Suppose you have a categorical column named `'Color'`. Would it be appropriate to use the `OrdinalEncoder` to encode its values?


Section 2. Chapter 5
