Choosing the best k value

As shown in the previous chapters, the model makes different predictions for different values of k (the number of neighbors).
When we build a model, we want to choose the k that leads to the best performance. In the previous chapter, we learned how to measure performance using cross-validation.
Running a loop that computes cross-validation scores for a range of k values and then picking the k with the highest score sounds like a no-brainer, and it is the most frequently used approach. scikit-learn provides the GridSearchCV class for exactly this task.
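A minimal sketch of that loop, assuming scikit-learn's built-in Iris dataset (load_iris) as stand-in data, 5-fold cross-validation, and a search over k from 1 to 30. None of these choices come from this chapter; they only illustrate the idea:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: Iris stands in for the course dataset
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

scores = {}
for k in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds for this k
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# The k with the highest mean score (the first one wins on ties)
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

GridSearchCV automates exactly this pattern and also refits the winning model for you.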

GridSearchCV's param_grid parameter takes a dictionary whose keys are parameter names and whose values are sequences (such as lists or ranges) of values to try. For example, to try the values 1 through 99 for n_neighbors, you would use param_grid = {'n_neighbors': range(1, 100)}.
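To see what such a grid expands to, the sketch below uses sklearn.model_selection.ParameterGrid, the helper GridSearchCV relies on internally to enumerate its candidates. Calling it directly is only for illustration; you do not need it when using GridSearchCV.

```python
from sklearn.model_selection import ParameterGrid

# Keys are estimator parameter names; values are the candidates to try
param_grid = {'n_neighbors': range(1, 100)}

# Expand the dictionary into one settings dict per candidate
candidates = list(ParameterGrid(param_grid))
print(len(candidates))                # 99
print(candidates[0], candidates[-1])  # {'n_neighbors': 1} {'n_neighbors': 99}
```

With cv=5, GridSearchCV would therefore train 99 × 5 = 495 models before refitting the best one.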

Calling .fit(X, y) makes the GridSearchCV object evaluate every candidate in param_grid with cross-validation, select the best one, and then re-train the model with those parameters on the whole dataset.
You can then read the best mean cross-validation score from the .best_score_ attribute and predict new values with the .predict() method.
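A sketch of the whole fit-then-predict workflow, again assuming the Iris dataset and cv=5 as illustrative choices. It also reads .best_params_, which stores the winning parameter values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: Iris stands in for the course dataset
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

param_grid = {'n_neighbors': range(1, 100)}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

# Cross-validates every candidate, then refits the best one on all of X, y
grid_search.fit(X, y)

print(grid_search.best_params_)    # winning n_neighbors
print(grid_search.best_score_)     # its mean cross-validation accuracy
print(grid_search.predict(X[:3]))  # predictions from the refitted best model
```

Because refit=True is the default, grid_search.predict() delegates to best_estimator_, the model re-trained on all of X and y.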


Task:

  1. Import the GridSearchCV class.
  2. Scale X using StandardScaler.
  3. Look for the best value of n_neighbors among [3, 9, 18, 27].
  4. Initialize and train a GridSearchCV object with 4 folds of cross-validation.
  5. Print the score of the best model.
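One possible solution sketch for the task. The chapter's own dataset is not shown here, so the Iris dataset stands in for it (an assumption); the remaining steps follow the list above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV  # step 1
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: Iris stands in for the exercise's dataset
X, y = load_iris(return_X_y=True)

# Step 2: scale the features
X = StandardScaler().fit_transform(X)

# Step 3: candidate values for n_neighbors
param_grid = {'n_neighbors': [3, 9, 18, 27]}

# Step 4: 4-fold cross-validation over the grid, then train
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=4)
grid_search.fit(X, y)

# Step 5: mean cross-validation score of the best model
print(grid_search.best_score_)
```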
