Challenge: Evaluating the Model with Cross-Validation
In this challenge, you will build and evaluate a model using both train-test evaluation and cross-validation. The data is an already preprocessed penguins dataset.
You will use KNeighborsClassifier, cross_val_score(), and train_test_split(), along with the model's .fit() and .score() methods.
Task
Your task is to create a 4-nearest neighbors classifier and first evaluate its performance using the cross-validation score. Then split the data into train-test sets, train the model on the training set, and evaluate its performance on the test set.
- Initialize a KNeighborsClassifier with 4 neighbors.
- Calculate the cross-validation scores of this model with the number of folds set to 3. You can pass an untrained model to the cross_val_score() function.
- Use a suitable function to split X, y.
- Train the model using the training set.
- Evaluate the model using the test set.
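To make the cross-validation step more concrete, here is a rough sketch of what cross_val_score() does for a classifier when cv=3: it splits the data into three stratified folds, fits a fresh copy of the untrained model on two of them, and scores it on the third. The sketch uses scikit-learn's built-in iris data as a stand-in for the penguins dataset (an assumption made only so the example runs without a download), and the manual loop is an illustration, not the library's actual implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): iris replaces the penguins dataset here
X, y = load_iris(return_X_y=True)

# An untrained 4-nearest neighbors classifier
model = KNeighborsClassifier(n_neighbors=4)

# One accuracy score per fold, computed by scikit-learn
auto_scores = cross_val_score(model, X, y, cv=3)

# The same procedure written out by hand
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    fold_model = clone(model)                    # fresh, unfitted copy
    fold_model.fit(X[train_idx], y[train_idx])   # train on two folds
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

print('cross_val_score:', auto_scores)
print('manual loop:    ', manual_scores)
print('mean:', auto_scores.mean())
```

Because every fold gets its own copy of the model, the original model object stays untrained, which is why the task still asks you to call .fit() afterwards.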
Solution
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Initialize a model
model = KNeighborsClassifier(n_neighbors=4)
# Calculate and print the mean of cross validation scores
scores = cross_val_score(model, X, y, cv=3)
print('Cross-val score:', scores.mean())
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# Train a model
model.fit(X_train, y_train)
# Print the score using the test set
print('Train-test score:', model.score(X_test, y_test))
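Note that the solution calls train_test_split() without a random_state, so the train-test score can change from run to run. The sketch below shows one way to see this: it repeats the split with a few fixed seeds and prints the resulting scores. The iris data is again a stand-in for the penguins dataset, and the helper name holdout_score, the seed values, and the use of stratify=y are assumptions for illustration, not part of the challenge.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): iris replaces the penguins dataset here
X, y = load_iris(return_X_y=True)

def holdout_score(seed):
    """Hypothetical helper: split with a fixed seed, train, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed, stratify=y
    )
    model = KNeighborsClassifier(n_neighbors=4)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

# The score depends on which rows land in the test set
scores = [holdout_score(seed) for seed in range(5)]
print('Train-test scores for seeds 0-4:', scores)
```

A single train-test score reflects one particular split, while the mean cross-validation score averages over several, which is usually why the two numbers printed by the solution differ slightly.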
Starter code (fill in the blanks)
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Initialize a model
model = ___
# Calculate and print the mean of cross validation scores
scores = ___(___, X, y, cv=___)
print('Cross-val score:', scores.mean())
# Train-test split
X_train, X_test, y_train, y_test = ___(X, y, test_size=0.33)
# Train a model
___
# Print the score using the test set
print('Train-test score:', ___)