Machine Learning Foundations with Scikit-Learn

KNeighborsClassifier


When the final estimator of the pipeline was created, KNeighborsClassifier was the chosen model. This chapter briefly explains how the algorithm works.

Note

How models work internally is not the main topic of this course, so it is OK if something seems unclear. The topic is covered in more detail in other courses, such as Linear Regression with Python or Classification with Python.

k-Nearest Neighbors

k-NN predicts the class of a new instance by looking at its k most similar training samples. KNeighborsClassifier implements this in Scikit-learn.

  1. For a new point, find the k nearest neighbors using feature similarity.
  2. The most common class among them becomes the prediction.

k is a hyperparameter (default = 5). Different values change the model’s behavior, so tuning k is important.
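To make these two steps concrete, here is a minimal NumPy sketch of the idea. The function knn_predict and the toy data are made up for illustration and are not part of scikit-learn:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 1: distance from the new point to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # take the labels of the k closest samples
    nearest_labels = y_train[np.argsort(distances)[:k]]
    # Step 2: the most common class among the neighbors becomes the prediction
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two 'red' points near (1, 1) and three 'green' points near (5, 5)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(['red', 'red', 'green', 'green', 'green'])

print(knn_predict(X_train, y_train, np.array([5.0, 4.9]), k=3))  # 'green'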

KNeighborsClassifier during .fit()

Unlike many algorithms, KNeighborsClassifier simply stores the training data. Still, calling .fit(X, y) is required so the model knows which dataset to reference during prediction.
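A tiny sketch of this behavior, using made-up toy data: .fit() finishes instantly because there is nothing to optimize, but without it .predict() would raise a NotFittedError.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: three small values labeled 0 and one large value labeled 1
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0, 0, 0, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)    # no iterative "training": the data is stored for later lookup
print(knn.predict([[2.5]]))  # nearest neighbors are 1.0, 2.0, 3.0 -> [0]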

KNeighborsClassifier during .predict()

During prediction, the classifier searches for each instance’s k closest neighbors. In the visual example, only two features are shown; adding more features usually improves class separation and prediction accuracy.

Note

In the gifs above, only two features, 'body_mass_g' and 'culmen_depth_mm', are used because visualizing higher-dimensional plots is challenging. Including additional features will likely help the model better separate the green and red data points, enabling the KNeighborsClassifier to make more accurate predictions.
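If you want to inspect which training samples a prediction is based on, the fitted classifier's .kneighbors() method returns the distances to and the indices of the k nearest neighbors. The toy data below is made up for illustration:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(['red', 'red', 'green', 'green'])

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

x_new = np.array([[4.9, 5.1]])
distances, indices = knn.kneighbors(x_new)  # which training points are closest?
print(indices)             # [[2 3 0]] -> two 'green' neighbors, one 'red'
print(knn.predict(x_new))  # majority class among the neighbors: ['green']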

KNeighborsClassifier Coding Example

You can create a classifier, train it, and check its accuracy using .score(). The n_neighbors argument controls k; the example below trains one model with the default k=5 and another with k=1.

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')

# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']

# Initialize and train a model
knn5 = KNeighborsClassifier().fit(X, y)  # Trained 5 neighbors model
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # Trained 1 neighbor model

# Print the scores of both models
print('5 Neighbors score:', knn5.score(X, y))
print('1 Neighbor score:', knn1.score(X, y))

Using k=1 may yield perfect accuracy, but this is misleading because evaluation was performed on the training set. To measure true performance, always test the model on unseen data.
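One common way to do that, sketched below, is to hold out part of the dataset with train_test_split; the split proportion and random_state are arbitrary choices for this illustration:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
X, y = df.drop('species', axis=1), df['species']

# Keep 25% of the data aside so the model is scored on samples it never saw
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print('Train score:', knn1.score(X_train, y_train))  # usually 1.0 for k=1: each point is its own nearest neighbor
print('Test score:', knn1.score(X_test, y_test))     # a more honest estimate of performance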

Question

How does the KNeighborsClassifier make predictions for a new instance?
