KNeighborsClassifier
Introduction to Machine Learning with Python

When creating the final estimator in a pipeline, the chosen model was KNeighborsClassifier. This chapter provides a brief explanation of how the algorithm operates.

Note

How models work is not a main topic of this course, so it is OK if something seems unclear. The details are covered in other courses, such as Linear Regression with Python or Classification with Python.

k-Nearest Neighbors

k-NN predicts the class of a new instance by looking at its k most similar training samples. KNeighborsClassifier implements this in Scikit-learn.

  1. For a new point, find the k nearest neighbors using feature similarity.
  2. The most common class among them becomes the prediction.

k is a hyperparameter (default = 5). Different values change the model's behavior, so tuning k is important.
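The two steps above can be sketched from scratch. This is a hypothetical, minimal implementation on made-up 2-D points, using Euclidean distance (scikit-learn's default metric), not the library's actual code:

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 1: find the indices of the k training points nearest to x_new
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x_new))[:k]
    # Step 2: the most common class among those neighbors is the prediction
    labels = [y_train[i] for i in nearest]
    return Counter(labels).most_common(1)[0][0]

# Made-up data: two small clusters
X_train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y_train = ['red', 'red', 'red', 'green', 'green', 'green']
print(knn_predict(X_train, y_train, (2, 2), k=3))  # red
```

The query point (2, 2) sits next to the first cluster, so its three nearest neighbors are all 'red' and the majority vote returns 'red'.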

KNeighborsClassifier during .fit()

Unlike many algorithms, KNeighborsClassifier simply stores the training data. Still, calling .fit(X, y) is required so the model knows which dataset to reference during prediction.
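As a quick sketch of why the call is still required (the tiny 2-D dataset here is made up): calling .predict() before .fit() raises NotFittedError, while after .fit() the stored data is used directly.

```python
from sklearn.exceptions import NotFittedError
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
try:
    knn.predict([[5.5, 5.5]])
except NotFittedError:
    print('Cannot predict: no training data stored yet')

# .fit() essentially just stores X and y, so "training" is near-instantaneous
knn.fit([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]],
        ['red', 'red', 'red', 'green', 'green', 'green'])
print(knn.predict([[5.5, 5.5]]))  # ['green']
```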

KNeighborsClassifier during .predict()

During prediction, the classifier searches for each instance's k closest neighbors. In the visual example, only two features are shown; adding more features usually improves class separation and prediction accuracy.
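The neighbor search that .predict() performs can be inspected directly with the .kneighbors() method. A small sketch on made-up 2-D points:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ['red', 'red', 'red', 'green', 'green', 'green']
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Distances to, and indices of, the 3 training points closest to the query
distances, indices = knn.kneighbors([[5.5, 5.5]])
print(indices)  # the three 'green' points (indices 3, 4, 5)
print(knn.predict([[5.5, 5.5]]))  # majority class among those neighbors
```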

Note

In the GIFs above, only two features, 'body_mass_g' and 'culmen_depth_mm', are used because higher-dimensional plots are hard to visualize. Including additional features would likely help the model separate the green and red data points better, enabling the KNeighborsClassifier to make more accurate predictions.

KNeighborsClassifier Coding Example

You can create a classifier, train it, and check its accuracy using .score(). The n_neighbors argument controls k; try both 5 and 1.

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')

# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']

# Initialize and train the models
knn5 = KNeighborsClassifier().fit(X, y)               # 5-neighbor model (default)
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # 1-neighbor model

# Print the scores of both models
print('5 Neighbors score:', knn5.score(X, y))
print('1 Neighbor score:', knn1.score(X, y))

Using k=1 may yield perfect accuracy, but this is misleading because evaluation was performed on the training set. To measure true performance, always test the model on unseen data.
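A sketch of that advice using train_test_split, with a synthetic dataset standing in for the penguins CSV (the data here is generated, not the course dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in dataset
X, y = make_classification(n_samples=300, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
# k=1 memorizes the training set, so the train score is a perfect 1.0...
print('Train score:', knn1.score(X_train, y_train))
# ...while the held-out test set shows the real performance
print('Test score:', knn1.score(X_test, y_test))
```

With k=1, each training point is its own nearest neighbor, which is why the training score is perfect; only the test score is a fair estimate.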

Question: How does the KNeighborsClassifier make predictions for a new instance?

Section 4. Chapter 2

