Challenge: Choosing the Best K Value.
As shown in the previous chapters, the model makes different predictions for different values of k (the number of neighbors).
When we build a model, we want to choose the k that leads to the best performance. In the previous chapter, we learned how to measure performance using cross-validation.
Running a loop that calculates cross-validation scores for a range of k values and picking the highest sounds like a no-brainer, and it is a common approach. sklearn has a neat class for that task: GridSearchCV.
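The loop approach described above can be sketched by hand before reaching for a dedicated class. This is a minimal sketch, assuming the Iris dataset from sklearn as a stand-in for the course's data, 5-fold cross-validation, and candidate k values 1 to 30; all three choices are illustrative, not the course's.

```python
# Manual "loop over k" sketch. Assumptions: Iris stands in for the course data,
# cv=5 and k in 1..30 are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean cross-validation accuracy for each candidate k
scores = {}
for k in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Pick the k with the highest mean score
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

This works, but the bookkeeping (storing scores, finding the maximum, re-training the winner) is exactly what a grid-search class automates.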
The param_grid parameter takes a dictionary with parameter names as keys and lists of values to try as values. For example, to try values 1 to 99 for n_neighbors, you would use:
```python
param_grid = {'n_neighbors': range(1, 100)}
```
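To see which candidates a param_grid describes, sklearn's ParameterGrid class expands it into individual parameter settings. The sketch below shows that a range works as a sequence of values, and that a dictionary with several keys (the 'weights' key here is only an illustration) produces every combination.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'n_neighbors': range(1, 100)}
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 99
print(candidates[0])    # {'n_neighbors': 1}

# With several keys, every combination of values becomes a candidate
two_params = {'n_neighbors': [3, 5], 'weights': ['uniform', 'distance']}
print(len(ParameterGrid(two_params)))  # 4
```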
Calling the .fit(X, y) method makes the GridSearchCV object find the best parameters in param_grid and then re-train the model with those parameters on the whole dataset.
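A minimal sketch of that fit step, again assuming Iris as stand-in data and an illustrative grid of k values 1 to 29 with 5-fold cross-validation. The re-training on the whole dataset happens because GridSearchCV's refit parameter defaults to True; the refitted model is stored in best_estimator_.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: Iris stands in for the course's dataset
X, y = load_iris(return_X_y=True)

param_grid = {'n_neighbors': range(1, 30)}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

# Cross-validates every candidate, then (refit=True by default)
# re-trains the best configuration on all of X, y
grid.fit(X, y)

print(grid.best_params_)
print(grid.best_estimator_)
```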
You can then get the highest mean cross-validation score using the .best_score_ attribute and predict new values using the .predict() method.
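The sketch below illustrates both attributes under the same assumptions as before (Iris as stand-in data, k from 1 to 29, cv=5). The new_samples array is hypothetical: it simply reuses the first rows of X in place of genuinely unseen data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': range(1, 30)}, cv=5)
grid.fit(X, y)

# best_score_ is the mean cross-validated score of the best candidate
print(grid.best_score_)

# predict() delegates to the refitted best_estimator_
new_samples = X[:3]  # hypothetical "new" data, reused from X for illustration
print(grid.predict(new_samples))
```

Note that best_score_ comes from cross-validation, not from the final model refitted on all the data, so it is an estimate of how the chosen k generalizes.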
Task
- Import the GridSearchCV class.
- Scale X using StandardScaler.
- Look for the best value of n_neighbors among [3, 9, 18, 27].
- Initialize and train a GridSearchCV object with 4 folds of cross-validation.
- Print the score of the best model.
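The steps above can be sketched as follows. This is not the course's reference solution: it assumes the Iris dataset from sklearn in place of the challenge's own X and y, which are not shown here.

```python
# Challenge sketch. Assumption: Iris stands in for the course's X and y.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale the features
X = StandardScaler().fit_transform(X)

# Candidate k values from the task, 4-fold cross-validation
param_grid = {'n_neighbors': [3, 9, 18, 27]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=4)
grid.fit(X, y)

# Mean cross-validation score of the best model
print(grid.best_score_)
```

Scaling X before the search follows the task as written. Wrapping StandardScaler and KNeighborsClassifier in a Pipeline and passing that to GridSearchCV (with keys like 'kneighborsclassifier__n_neighbors' when using make_pipeline) would fit the scaler inside each fold instead, so validation folds do not influence the scaling.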