KNeighborsClassifier
Machine Learning Foundations with Scikit-Learn


When creating the final estimator in a pipeline, the chosen model was KNeighborsClassifier. This chapter provides a brief explanation of how the algorithm operates.

Note

How models work is not a main topic of this course, so don't worry if something seems unclear. The topic is covered in more detail in other courses, such as Linear Regression with Python or Classification with Python.

k-Nearest Neighbors

k-NN predicts the class of a new instance by looking at its k most similar training samples. KNeighborsClassifier implements this in Scikit-learn.

  1. For a new point, find the k nearest neighbors using feature similarity.
  2. The most common class among them becomes the prediction.

k is a hyperparameter (default = 5). Different values change the model’s behavior, so tuning k is important.
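The two steps above can be sketched from scratch in plain Python. This is a hypothetical, minimal implementation for illustration (the toy points and labels below are made up), not how Scikit-learn implements it internally:

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, x_new, k=5):
    """Predict the class of x_new by majority vote among its k nearest neighbors."""
    # Step 1: measure similarity as Euclidean distance to every training point
    distances = [math.dist(x, x_new) for x in X_train]
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Step 2: the most common class among the k nearest points wins
    labels = [y_train[i] for i in nearest]
    return Counter(labels).most_common(1)[0][0]

# Tiny two-feature toy dataset (hypothetical values)
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
y = ['A', 'A', 'A', 'B', 'B', 'B']

print(knn_predict(X, y, (1.1, 1.0), k=3))  # → A
print(knn_predict(X, y, (5.1, 5.0), k=3))  # → B
```

Changing k here changes which neighbors get a vote, which is exactly why tuning it matters.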

KNeighborsClassifier during .fit()

Unlike many algorithms, KNeighborsClassifier simply stores the training data. Still, calling .fit(X, y) is required so the model knows which dataset to reference during prediction.

KNeighborsClassifier during .predict()

During prediction, the classifier searches for each instance’s k closest neighbors. In the visual example, only two features are shown; adding more features usually improves class separation and prediction accuracy.
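As a quick illustration of the fit/predict cycle, here is a minimal sketch using a made-up two-feature dataset (the values and class names below are hypothetical, standing in for the green and red points of the visual example):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy two-feature dataset (hypothetical values for illustration)
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
y = ['green', 'green', 'green', 'red', 'red', 'red']

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)                      # .fit() essentially just stores X and y
print(knn.predict([[1.1, 1.0]]))   # the real work (neighbor search) happens here
```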

Note

In the gifs above, only two features, 'body_mass_g' and 'culmen_depth_mm', are used because visualizing higher-dimensional plots is challenging. Including additional features will likely help the model better separate the green and red data points, enabling the KNeighborsClassifier to make more accurate predictions.

KNeighborsClassifier Coding Example

You can create a classifier, train it, and check its accuracy using .score(). The n_neighbors argument controls k—try both 5 and 1.

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')

# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']

# Initialize and train a model
knn5 = KNeighborsClassifier().fit(X, y)  # Trained 5 neighbors model
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # Trained 1 neighbor model

# Print the scores of both models
print('5 Neighbors score:', knn5.score(X, y))
print('1 Neighbor score:', knn1.score(X, y))

Using k=1 may yield perfect accuracy, but this is misleading because evaluation was performed on the training set. To measure true performance, always test the model on unseen data.
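A sketch of the honest approach: hold out a test set and score on it. To keep the example self-contained (no download), this uses Scikit-learn's built-in iris dataset rather than the course's penguins file; the split sizes and random_state are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print('Train score:', knn1.score(X_train, y_train))  # 1.0: each point is its own nearest neighbor
print('Test score: ', knn1.score(X_test, y_test))    # honest estimate on unseen data
```

With k=1 the training score is perfect by construction, while the test score reflects how the model actually generalizes.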


How does the KNeighborsClassifier make predictions for a new instance?



Section 1. Chapter 24
