ML Introduction with scikit-learn

Challenge: Putting It All Together

In this challenge, apply the full workflow learned in the course, from data preprocessing through training to model evaluation.

Task


You are given a dataset of penguins. Your goal is to build a machine learning pipeline that classifies penguin species using a K-Nearest Neighbors (KNN) model, while properly handling encoding, missing values, and parameter optimization.

  1. Encode the target variable using LabelEncoder.
  2. Split the dataset into training and test sets with test_size=0.33.
  3. Create a ColumnTransformer (ct) that encodes only the 'island' and 'sex' columns using a suitable encoder for nominal data (OneHotEncoder) and keeps the other columns untouched.
  4. Define a parameter grid (param_grid) that includes the following values for n_neighbors: [1, 3, 5, 7, 9, 12, 15, 20, 25].
  5. Create a GridSearchCV object with KNeighborsClassifier as the base estimator and param_grid as its parameters.
  6. Build a pipeline consisting of:
    • the ColumnTransformer (ct);
    • a SimpleImputer (strategy = 'most_frequent');
    • a StandardScaler;
    • and the GridSearchCV as the final step.
  7. Train the pipeline using the training data (X_train, y_train).
  8. Evaluate the model on the test data by printing its .score(X_test, y_test).
  9. Predict on the test set and print the first 5 decoded predictions using label_enc.inverse_transform().
  10. Finally, print the best estimator found by GridSearchCV.

Solution
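
The steps above can be sketched roughly as follows. This is one possible approach, not the course's reference solution. The course dataset is not included on this page, so the sketch builds a small synthetic stand-in: the column names (bill_length_mm, flipper_length_mm), the generated values, and random_state=0 are all assumptions made for illustration. Only 'island', 'sex', and the species target come from the task description.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Synthetic stand-in for the penguins data (assumption: illustrative columns and values).
rng = np.random.default_rng(0)
n = 150
species = np.repeat(["Adelie", "Chinstrap", "Gentoo"], n // 3)
centers = {"Adelie": (39, 190), "Chinstrap": (49, 196), "Gentoo": (47, 217)}
df = pd.DataFrame({
    "species": species,
    "island": rng.choice(["Biscoe", "Dream", "Torgersen"], size=n),
    "sex": rng.choice(["MALE", "FEMALE"], size=n),
    "bill_length_mm": rng.normal([centers[s][0] for s in species], 2.0),
    "flipper_length_mm": rng.normal([centers[s][1] for s in species], 4.0),
})
# Add a few missing numeric values so the imputer has something to fill.
df.loc[[10, 80], "bill_length_mm"] = np.nan

X = df.drop(columns="species")

# 1. Encode the target variable.
label_enc = LabelEncoder()
y = label_enc.fit_transform(df["species"])

# 2. Split into training and test sets (random_state is an assumption, for reproducibility).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# 3. One-hot encode 'island' and 'sex'; pass the remaining columns through unchanged.
ct = make_column_transformer(
    (OneHotEncoder(), ["island", "sex"]),
    remainder="passthrough",
)

# 4.-5. Parameter grid and grid search over KNN.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 12, 15, 20, 25]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)

# 6. Pipeline: encode, impute, scale, then search for the best k.
pipe = make_pipeline(
    ct,
    SimpleImputer(strategy="most_frequent"),
    StandardScaler(),
    grid_search,
)

# 7. Train on the training data.
pipe.fit(X_train, y_train)

# 8. Evaluate on the test data.
test_score = pipe.score(X_test, y_test)
print(test_score)

# 9. Predict and decode the first 5 predictions back to species names.
y_pred = pipe.predict(X_test)
decoded = label_enc.inverse_transform(y_pred[:5])
print(decoded)

# 10. Best estimator found by the grid search.
print(grid_search.best_estimator_)
```

Because GridSearchCV is the final pipeline step, the encoder, imputer, and scaler are fitted once on the whole training set, and cross-validation happens only over the already-transformed features. Wrapping the full pipeline inside GridSearchCV instead would refit the preprocessing per fold, which is a different design than the one the task asks for.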


Section 4. Chapter 10