Challenge: Choosing the Best K Value.
As shown in the previous chapters, the model makes different predictions for different values of k (the number of neighbors).
When we build a model, we want to choose the k that leads to the best performance. In the previous chapter, we learned how to measure performance using cross-validation.
Running a loop, calculating cross-validation scores for a range of k values, and picking the one with the highest score sounds like a no-brainer, and it is indeed the most frequently used approach. sklearn provides a neat class for this task: GridSearchCV.
Its param_grid parameter takes a dictionary with parameter names as keys and lists of values to try as values. For example, to try the values 1-99 for n_neighbors, you would use:
param_grid = {'n_neighbors': range(1, 100)}
Calling the .fit(X, y) method makes the GridSearchCV object search param_grid for the best parameters using cross-validation and then re-train the model with those parameters on the whole dataset.
You can then get the best cross-validation score using the .best_score_ attribute and predict new values with the refit model using the .predict() method.
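For illustration, here is a minimal sketch of that workflow. It assumes KNeighborsClassifier and a toy dataset (load_iris), neither of which is part of this exercise:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Toy data used only for illustration
X, y = load_iris(return_X_y=True)

# Try every value of n_neighbors from 1 to 99
param_grid = {'n_neighbors': range(1, 100)}

# Cross-validate each candidate and refit the best one on the whole of X, y
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid)
grid_search.fit(X, y)

print(grid_search.best_params_)    # the n_neighbors value that scored best
print(grid_search.best_score_)     # its mean cross-validation score
print(grid_search.predict(X[:5]))  # predictions from the refit best model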
Swipe to start coding
- Import the GridSearchCV class.
- Scale X using StandardScaler.
- Look for the best value of n_neighbors among [3, 9, 18, 27].
- Initialize and train a GridSearchCV object with 4 folds of cross-validation.
- Print the score of the best model.