Learn Challenge: Putting It All Together | Modeling
ML Introduction with scikit-learn

Challenge: Putting It All Together

In this challenge, apply the full workflow learned in the course, from data preprocessing through training to model evaluation.

Task


You are working with a dataset of penguins. Your goal is to build a complete machine learning pipeline that classifies penguin species using a K-Nearest Neighbors (KNN) model. The pipeline should handle categorical encoding, missing values, feature scaling, and parameter tuning.

  1. Encode the target variable y using the LabelEncoder class.
  2. Split the dataset into training and test sets using train_test_split() with test_size=0.33.
  3. Create a ColumnTransformer named ct that applies a OneHotEncoder to the 'island' and 'sex' columns, leaving all other columns unchanged (remainder='passthrough').
  4. Define a parameter grid param_grid with three keys: 'n_neighbors' with the values [1, 3, 5, 7, 9, 12, 15, 20, 25], 'weights' with the values ['uniform', 'distance'], and 'p' with the values [1, 2].
  5. Create a GridSearchCV object using KNeighborsClassifier() as the estimator and param_grid as the parameter grid.
  6. Build a pipeline that includes the following steps in order:
    • The ColumnTransformer (ct);
    • A SimpleImputer with the strategy set to 'most_frequent';
    • A StandardScaler for feature scaling;
    • The GridSearchCV object as the final step.
  7. Train the pipeline on the training data (X_train, y_train) using the .fit() method.
  8. Evaluate the model performance by printing the test score using .score(X_test, y_test).
  9. Generate predictions on the test data and print the first 5 decoded class names using label_enc.inverse_transform().
  10. Print the best estimator found by GridSearchCV using the .best_estimator_ attribute.

Solution
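
The steps above could be sketched roughly as follows. This is one possible solution, not the course's reference answer. Because the penguins file itself is not shown here, the sketch builds a small synthetic stand-in: the column names bill_length_mm, bill_depth_mm, and flipper_length_mm, the made-up values, and random_state=0 are all assumptions for illustration. Replace the synthetic block with the real X and y when working on the task.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# --- Synthetic stand-in for the penguins data (assumption, made-up values) ---
rng = np.random.default_rng(0)
means = {"Adelie": (39.0, 18.0, 190.0),
         "Chinstrap": (49.0, 18.5, 196.0),
         "Gentoo": (47.5, 15.0, 217.0)}
species = np.repeat(list(means), 60)  # 180 rows, 60 per class
numeric = np.array([means[s] for s in species])
numeric = numeric + rng.normal(0, [2.0, 1.0, 5.0], size=(180, 3))
X = pd.DataFrame(numeric, columns=["bill_length_mm", "bill_depth_mm",
                                   "flipper_length_mm"])
X["island"] = rng.choice(["Biscoe", "Dream", "Torgersen"], size=180)
X["sex"] = rng.choice(["MALE", "FEMALE"], size=180)
# Knock out a few numeric values so the imputer has something to fill
X.loc[rng.choice(180, size=8, replace=False), "bill_length_mm"] = np.nan
y = species

# 1. Encode the target
label_enc = LabelEncoder()
y = label_enc.fit_transform(y)

# 2. Train/test split (random_state is an assumption, for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# 3. One-hot encode the categorical columns, pass the rest through
ct = ColumnTransformer([("enc", OneHotEncoder(), ["island", "sex"])],
                       remainder="passthrough")

# 4-5. Parameter grid and grid search over KNN
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 12, 15, 20, 25],
              "weights": ["uniform", "distance"],
              "p": [1, 2]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)

# 6. Pipeline: encode -> impute -> scale -> tune and fit KNN
pipe = make_pipeline(ct,
                     SimpleImputer(strategy="most_frequent"),
                     StandardScaler(),
                     grid_search)

# 7-8. Train and evaluate
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)
print(score)

# 9. Predict and decode the first five labels
y_pred = pipe.predict(X_test)
first_five = label_enc.inverse_transform(y_pred[:5])
print(first_five)

# 10. The final pipeline step is the fitted GridSearchCV object
best = pipe[-1].best_estimator_
print(best)
```

Note that because GridSearchCV is the last pipeline step, the encoder, imputer, and scaler are fitted once on the whole training set, and only the KNN model is refitted inside each cross-validation fold.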
