ML Introduction with scikit-learn

Challenge: Evaluating the Model with Cross-Validation

In this challenge, you will build and evaluate a model using both train-test evaluation and cross-validation. The data is an already preprocessed penguins dataset.

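Here are some of the functions you will use: KNeighborsClassifier, cross_val_score(), train_test_split(), and the model's .fit() and .score() methods.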

Task

Your task is to create a 4-nearest neighbors classifier and first evaluate its performance using the cross-validation score. Then split the data into train-test sets, train the model on the training set, and evaluate its performance on the test set.

  1. Initialize a KNeighborsClassifier with 4 neighbors.
  2. Calculate the cross-validation scores of this model with the number of folds set to 3. You can pass an untrained model to the cross_val_score() function.
  3. Use a suitable function to split X, y.
  4. Train the model using the training set.
  5. Evaluate the model using the test set.
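
Step 2 works with an untrained model because cross_val_score() fits a fresh copy of the model on each fold. The sketch below illustrates this on synthetic stand-in data (make_classification and the X_demo, y_demo names are assumptions for illustration, not part of the challenge). It compares the built-in helper with a hand-written loop over stratified folds, which is what scikit-learn uses for a classifier when cv is an integer.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the challenge uses the penguins dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=0
)

model = KNeighborsClassifier(n_neighbors=4)

# Built-in helper: the untrained model is cloned and fitted once per fold
scores = cross_val_score(model, X_demo, y_demo, cv=3)

# Manual equivalent: stratified folds without shuffling, one fresh clone per fold
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X_demo, y_demo):
    fold_model = clone(model)
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

print('Built-in scores:', scores)
print('Manual scores:', manual_scores)
# Only the clones were trained, so the original model has no fitted attributes
print('Model fitted after cross_val_score:', hasattr(model, 'classes_'))
```

Because only clones are trained, the model still has to be fitted explicitly (step 4) before it can be scored on the test set.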

Solution

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Initialize a model
model = KNeighborsClassifier(n_neighbors=4)
# Calculate and print the mean of cross validation scores
scores = cross_val_score(model, X, y, cv=3)
print('Cross-val score:', scores.mean())
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# Train a model
model.fit(X_train, y_train)
# Print the score using the test set
print('Train-test score:', model.score(X_test, y_test))
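
Note that train_test_split() is called above without a random_state, so the train-test score can change from run to run. The sketch below, again on synthetic stand-in data (an assumption, as is the holdout_score helper), shows that fixing random_state makes a split reproducible, while different seeds may give different scores. This is one reason the mean of the cross-validation scores is often treated as the steadier estimate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the challenge uses the penguins dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=0
)

def holdout_score(seed):
    """Hypothetical helper: train 4-NN on one seeded split, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X_demo, y_demo, test_size=0.33, random_state=seed
    )
    model = KNeighborsClassifier(n_neighbors=4)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

# Different seeds produce different splits, so the scores may differ
seed_scores = [holdout_score(seed) for seed in range(5)]
print('Scores for seeds 0-4:', seed_scores)
# The same seed always reproduces the same split and therefore the same score
print('Same seed, same score:', holdout_score(0) == holdout_score(0))
```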

Section 4. Chapter 5
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Initialize a model
model = ___
# Calculate and print the mean of cross validation scores
scores = ___(___, X, y, cv=___)
print('Cross-val score:', scores.mean())
# Train-test split
X_train, X_test, y_train, y_test = ___(X, y, test_size=0.33)
# Train a model
___
# Print the score using the test set
print('Train-test score:', ___)