Challenge: Putting It All Together
In this challenge, apply the full workflow learned in the course, from data preprocessing through training to model evaluation.
You are working with a dataset of penguins. Your goal is to build a complete machine learning pipeline that classifies penguin species using a K-Nearest Neighbors (KNN) model. The pipeline should handle categorical encoding, missing values, feature scaling, and parameter tuning.
- Encode the target variable y using the LabelEncoder class.
- Split the dataset into training and test sets using train_test_split() with test_size=0.33.
- Create a ColumnTransformer named ct that applies a OneHotEncoder to the 'island' and 'sex' columns, leaving all other columns unchanged (remainder='passthrough').
- Define a parameter grid param_grid that contains the following values for n_neighbors: [1, 3, 5, 7, 9, 12, 15, 20, 25], and include 'weights' ('uniform', 'distance') and 'p' (1, 2).
- Create a GridSearchCV object using KNeighborsClassifier() as the estimator and param_grid as the parameter grid.
- Build a pipeline that includes the following steps in order:
  - The ColumnTransformer (ct);
  - A SimpleImputer with the strategy set to 'most_frequent';
  - A StandardScaler for feature scaling;
  - The GridSearchCV object as the final step.
- Train the pipeline on the training data (X_train, y_train) using the .fit() method.
- Evaluate the model performance by printing the test score using .score(X_test, y_test).
- Generate predictions on the test data and print the first 5 decoded class names using label_enc.inverse_transform().
- Print the best estimator found by GridSearchCV using the .best_estimator_ attribute.
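The steps above can be sketched end to end as follows. This is one possible arrangement, not the course's reference solution. The course dataset is not shown here, so the sketch builds a small synthetic stand-in: the column names 'species', 'bill_length_mm', and 'body_mass_g', the generated values, the step names inside the pipeline, and random_state=42 are all assumptions made for illustration. Only 'island', 'sex', the parameter grid, and test_size=0.33 come from the task description.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Synthetic stand-in for the penguins data (assumption: the real file is not shown).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'species': np.repeat(['Adelie', 'Chinstrap', 'Gentoo'], 50),
    'island': rng.choice(['Biscoe', 'Dream', 'Torgersen'], size=150),
    'sex': rng.choice(['MALE', 'FEMALE'], size=150),
    # Hypothetical numeric features with class-dependent means.
    'bill_length_mm': np.repeat([39.0, 49.0, 47.5], 50) + rng.normal(0, 2.0, 150),
    'body_mass_g': np.repeat([3700.0, 3730.0, 5080.0], 50) + rng.normal(0, 300.0, 150),
})
# Inject a few missing numeric values so the imputer has work to do.
df.loc[[3, 60, 120], 'bill_length_mm'] = np.nan

X = df.drop(columns='species')
y = df['species']

# Encode the target as integers; keep the encoder to decode predictions later.
label_enc = LabelEncoder()
y = label_enc.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42  # random_state is an assumption
)

# One-hot encode the categorical columns and pass the rest through unchanged.
ct = ColumnTransformer(
    [('onehot', OneHotEncoder(), ['island', 'sex'])],
    remainder='passthrough',
)

param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 12, 15, 20, 25],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)

# Preprocessing steps first, the grid search as the final estimator.
pipe = Pipeline([
    ('ct', ct),
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('scaler', StandardScaler()),
    ('grid', grid_search),
])

pipe.fit(X_train, y_train)

test_score = pipe.score(X_test, y_test)
print(test_score)

# Decode the first five integer predictions back to species names.
y_pred = pipe.predict(X_test)
decoded = label_enc.inverse_transform(y_pred[:5])
print(decoded)

print(grid_search.best_estimator_)
```

Because GridSearchCV sits at the end of the pipeline, the preprocessing steps are fitted once on the full training set and the cross-validation runs on the already transformed data. Placing the whole pipeline inside GridSearchCV instead would refit the preprocessing per fold; the sketch follows the step order given in the task.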