Introduction to Machine Learning with Python

The Flaw of GridSearchCV

Before using GridSearchCV, note that KNeighborsClassifier has more hyperparameters than n_neighbors. Two important ones are weights and p.

Weights

By default, the classifier uses weights='uniform', meaning all k neighbors vote equally. Setting weights='distance' gives closer neighbors more influence, often improving predictions when nearby points are more relevant.
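The difference between the two settings can be illustrated with a small hand-rolled sketch. It is not scikit-learn's implementation: the `vote` helper and the neighbor list are hypothetical, and it assumes that `'distance'` weights each neighbor by the inverse of its (non-zero) distance.

```python
from collections import defaultdict

def vote(neighbors, weights='uniform'):
    """Pick a label from (distance, label) pairs of the k nearest points.

    Hypothetical helper for illustration; assumes all distances are non-zero.
    """
    totals = defaultdict(float)
    for distance, label in neighbors:
        # 'uniform': every neighbor counts as 1 vote.
        # 'distance': closer neighbors get a larger vote (1 / distance).
        totals[label] += 1.0 if weights == 'uniform' else 1.0 / distance
    return max(totals, key=totals.get)

# Three nearest neighbors: one very close 'A', two farther 'B's.
neighbors = [(0.5, 'A'), (2.0, 'B'), (2.5, 'B')]

uniform_prediction = vote(neighbors, 'uniform')
distance_prediction = vote(neighbors, 'distance')

print(uniform_prediction)   # B (2 votes against 1)
print(distance_prediction)  # A (weight 2.0 against 0.5 + 0.4)
```

With equal votes the two farther neighbors win, while inverse-distance weighting lets the single close neighbor decide.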

P

The p parameter controls the distance metric:

  • p=1: Manhattan distance;
  • p=2: Euclidean distance.

In scikit-learn, p is the power parameter of the Minkowski distance and is expected to be positive, so it is not limited to 1 or 2. Other values define different distances, but they are harder to visualize than p=1 or p=2.
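To see how p changes the result, here is a minimal sketch of the Minkowski distance written by hand. The `minkowski` function and the sample points are illustrative assumptions, not part of the lesson.

```python
def minkowski(a, b, p):
    """Minkowski distance between two points given as equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

point_a = (0, 0)
point_b = (3, 4)

manhattan = minkowski(point_a, point_b, p=1)  # |3| + |4|
euclidean = minkowski(point_a, point_b, p=2)  # sqrt(3^2 + 4^2)
higher_p = minkowski(point_a, point_b, p=3)   # no simple geometric picture

print(manhattan)  # 7.0
print(euclidean)  # 5.0
print(higher_p)
```

For the same pair of points, the distance shrinks as p grows, which is why the choice of p can change which neighbors count as nearest.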

Note

Do not worry if the details of weights or p are unclear. They are introduced simply to show that there is more than one hyperparameter that can influence the model’s predictions. Treat them as examples of hyperparameters that can be tuned.

Previously, only n_neighbors was tuned. To search over all three hyperparameters, use:

param_grid = {
    'n_neighbors': [1, 3, 5, 7],
    'weights': ['distance', 'uniform'],
    'p': [1, 2]
}

GridSearchCV tries every possible combination to find the best one, so this grid produces 4 × 2 × 2 = 16 combinations.
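To make the exhaustive search concrete, here is a minimal sketch that runs this grid and lists every candidate it evaluated. The Iris dataset and `cv=5` are assumptions made only so the example runs on its own; the lesson's own data may differ.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: a small built-in dataset stands in for the lesson's data.
X, y = load_iris(return_X_y=True)

param_grid = {
    'n_neighbors': [1, 3, 5, 7],
    'weights': ['distance', 'uniform'],
    'p': [1, 2]
}

grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X, y)

# Every combination from param_grid appears once in cv_results_['params'].
candidates = grid_search.cv_results_['params']
for params in candidates:
    print(params)

print(len(candidates))           # 4 * 2 * 2 = 16
print(grid_search.best_params_)  # the combination with the best mean CV score
```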

A larger grid like:

param_grid = {
    'n_neighbors': [...],
    'weights': ['distance', 'uniform'],
    'p': [1, 2, 3, 4, 5]
}

creates 100 combinations if n_neighbors lists 10 values (10 × 2 × 5 = 100). With 5-fold cross-validation, the model is trained 500 times. This is fine for small datasets, but on larger ones it becomes too slow.
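The arithmetic can be checked with a short sketch. The n_neighbors values are elided in the lesson, so the ten odd values used here are purely hypothetical, chosen to match the stated total of 100.

```python
from itertools import product

param_grid = {
    'n_neighbors': list(range(1, 20, 2)),  # hypothetical: 1, 3, ..., 19 (10 values)
    'weights': ['distance', 'uniform'],
    'p': [1, 2, 3, 4, 5],
}

# A grid search evaluates the Cartesian product of all value lists.
combinations = list(product(*param_grid.values()))

n_folds = 5
n_fits = len(combinations) * n_folds  # one fit per combination per fold

print(len(combinations))  # 10 * 2 * 5 = 100
print(n_fits)             # 100 * 5 = 500
```

Adding one more hyperparameter with only three values would triple both numbers, which is why exhaustive grids grow expensive so quickly.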

To reduce computation time, RandomizedSearchCV tests only a random subset of combinations, usually finding strong results much faster than a full grid search.
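A minimal sketch of the randomized alternative follows. The Iris dataset, the ten n_neighbors values, `n_iter=20`, and `random_state=42` are all illustrative assumptions, not settings from the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: a small built-in dataset stands in for the lesson's data.
X, y = load_iris(return_X_y=True)

param_distributions = {
    'n_neighbors': list(range(1, 20, 2)),  # hypothetical: 10 values
    'weights': ['distance', 'uniform'],
    'p': [1, 2, 3, 4, 5],
}

random_search = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_distributions,
    n_iter=20,        # sample 20 of the 100 possible combinations
    cv=5,
    random_state=42,  # fixed seed so the sampled subset is reproducible
)
random_search.fit(X, y)

n_tried = len(random_search.cv_results_['params'])
print(n_tried)                     # 20 candidates, so 20 * 5 = 100 fits
print(random_search.best_params_)
print(random_search.best_score_)
```

Here the search costs a fifth of the full grid. The trade-off is that the best combination may not be among the sampled ones, so `n_iter` sets the balance between speed and coverage.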

The main problem of GridSearchCV is that it tries all possible combinations (of what's specified in param_grid) which may take a lot of time. Is this statement correct?

Section 4. Chapter 7
