Challenge: Choosing the Best K Value
As shown in the previous chapters, the model's predictions can vary depending on the value of k (the number of neighbors). When building a k-NN model, it's important to choose the k value that gives the best performance.
A common approach is to evaluate model performance with cross-validation: loop over a range of k values, compute the cross-validation score for each one, and select the k with the highest score.
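The manual loop described above can be sketched as follows. This is a minimal illustration, not the course's own solution: the Iris dataset, the range of k from 1 to 30, and the 5-fold setting are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: Iris is used as a stand-in dataset for illustration
X, y = load_iris(return_X_y=True)

k_values = range(1, 31)
mean_scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    # cross_val_score returns one score per fold; average them for this k
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores.append(scores.mean())

# Pick the k whose mean cross-validation score is highest
best_index = max(range(len(mean_scores)), key=mean_scores.__getitem__)
best_k = k_values[best_index]
best_mean_score = mean_scores[best_index]
print(best_k, round(best_mean_score, 3))
```

Writing the loop by hand makes the selection logic explicit, but it has to be extended manually if more than one parameter is tuned.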
To perform this, sklearn offers a convenient tool: the GridSearchCV class.
Constructor:
GridSearchCV(estimator, param_grid, scoring, cv = 5)
- estimator: the model object;
- param_grid: a dictionary with the parameter values to search through;
- scoring: the metric used for the cross-validation score;
- cv: the number of folds (5 by default).
Methods:
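- fit(X, y): trains the models using X, y;
- predict(X): predicts the class for X using the best model found;
- score(X, y): returns the score of the best model on the X, y set (accuracy for a classifier when no other scoring is specified).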
Attributes:
- best_estimator_: the model object with the highest cross-validation score;
- best_score_: the mean cross-validation score of best_estimator_.
The param_grid parameter takes a dictionary where the keys are parameter names and the values are lists of options to try. For example, to test values from 1 to 99 for n_neighbors, you can write:
param_grid = {'n_neighbors': range(1, 100)}
Calling the .fit(X, y) method on the GridSearchCV object will search through the parameter grid to find the best parameters and then re-train the model on the entire dataset using those best parameters.
You can access the best score using the .best_score_ attribute and make predictions with the optimized model using the .predict() method. Similarly, you can retrieve the best model itself using the .best_estimator_ attribute.
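Putting these pieces together, a minimal sketch of the workflow might look like this. The Iris dataset and the grid of k values from 1 to 30 are assumptions made for illustration only; best_params_ is an additional GridSearchCV attribute, not covered above, that holds the winning parameter combination.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: Iris is used as a stand-in dataset for illustration
X, y = load_iris(return_X_y=True)

# Keys are parameter names, values are the options to try
param_grid = {'n_neighbors': range(1, 31)}

grid_search = GridSearchCV(KNeighborsClassifier(), param_grid,
                           scoring='accuracy', cv=5)
# Runs cross-validation for every k, then refits the best model on all of X, y
grid_search.fit(X, y)

best_model = grid_search.best_estimator_   # the refitted k-NN model
best_score = grid_search.best_score_       # its mean cross-validation score
best_k = grid_search.best_params_['n_neighbors']

# predict() on the search object delegates to the best model
predictions = grid_search.predict(X[:5])
print(best_k, round(best_score, 3))
```

Because the search object refits the best model by default, calling grid_search.predict gives the same result as calling best_model.predict.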
Task:
You are given the Star Wars ratings dataset stored as a DataFrame in the df variable.
- Initialize param_grid as a dictionary containing the n_neighbors parameter with the values [3, 9, 18, 27].
- Create a GridSearchCV object using param_grid with 4-fold cross-validation, train it, and store it in the grid_search variable.
- Retrieve the best model from grid_search and store it in the best_model variable.
- Retrieve the score of the best model and store it in the best_score variable.