Challenge: Putting It All Together
In this challenge, apply the full workflow learned in the course, from data preprocessing through training to model evaluation.
You are given a dataset of penguins. Your goal is to build a machine learning pipeline that classifies penguin species using a K-Nearest Neighbors (KNN) model, while properly handling encoding, missing values, and parameter optimization.
- Encode the target variable using LabelEncoder.
- Split the dataset into training and test sets with test_size=0.33.
- Create a ColumnTransformer (ct) that encodes only the 'island' and 'sex' columns using a suitable encoder for nominal data (OneHotEncoder) and keeps the other columns untouched.
- Define a parameter grid (param_grid) that includes the following values for n_neighbors: [1, 3, 5, 7, 9, 12, 15, 20, 25].
- Create a GridSearchCV object with KNeighborsClassifier as the base estimator and param_grid as its parameters.
- Build a pipeline consisting of:
  - the ColumnTransformer (ct);
  - a SimpleImputer (strategy='most_frequent');
  - a StandardScaler;
  - and the GridSearchCV as the final step.
- Train the pipeline using the training data (X_train, y_train).
- Evaluate the model on the test data by printing its .score(X_test, y_test).
- Predict on the test set and print the first 5 decoded predictions using label_enc.inverse_transform().
- Finally, print the best estimator found by GridSearchCV.
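
The steps above can be sketched roughly as follows. This is one possible arrangement under stated assumptions, not the course's reference solution. The course dataset is not included here, so the sketch builds a small synthetic stand-in: only the 'island' and 'sex' column names come from the task, while the other column names, the synthetic values, random_state=0, and sparse_threshold=0 (used to keep the ColumnTransformer output dense for StandardScaler) are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical stand-in for the course dataset (synthetic, penguin-like columns).
rng = np.random.default_rng(0)
n = 150
offset = np.repeat([0.0, 5.0, 10.0], n // 3)
df = pd.DataFrame({
    "species": np.repeat(["Adelie", "Chinstrap", "Gentoo"], n // 3),
    "island": rng.choice(["Biscoe", "Dream", "Torgersen"], size=n),
    "culmen_length_mm": 38 + offset + rng.normal(0, 1.5, n),
    "body_mass_g": 3700 + 130 * offset + rng.normal(0, 200, n),
    "sex": rng.choice(["MALE", "FEMALE"], size=n),
})
# Blank out a few numeric values so the imputer has something to fill.
df.loc[rng.choice(n, 8, replace=False), "body_mass_g"] = np.nan

# Encode the target and split the data.
label_enc = LabelEncoder()
y = label_enc.fit_transform(df["species"])
X = df.drop("species", axis=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0  # random_state is an assumption
)

# One-hot encode only the nominal columns; pass the rest through unchanged.
ct = make_column_transformer(
    (OneHotEncoder(), ["island", "sex"]),
    remainder="passthrough",
    sparse_threshold=0,  # assumption: force dense output for StandardScaler
)

# Grid search over the number of neighbors.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 12, 15, 20, 25]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)

# Preprocessing steps followed by the grid search as the final step.
pipe = make_pipeline(
    ct,
    SimpleImputer(strategy="most_frequent"),
    StandardScaler(),
    grid_search,
)
pipe.fit(X_train, y_train)

score = pipe.score(X_test, y_test)
print(score)

# Decode the first five predictions back to species names.
decoded = label_enc.inverse_transform(pipe.predict(X_test)[:5])
print(decoded)

# The pipeline fits its steps in place, so grid_search is now fitted.
print(grid_search.best_estimator_)
```

Because GridSearchCV is the last pipeline step, calling fit, score, and predict on the pipeline passes the preprocessed data straight to the search, which refits the best KNeighborsClassifier on the full training set by default.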