What is Classification | k-NN Classifier
Classification with Python
What is Classification

Classification is a supervised learning task.
Its goal is to predict the class an instance belongs to based on a set of parameters (features). Before the model can predict the class of a new instance, you need to give it many labeled examples (called the training set) to learn from.
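The idea of learning from labeled examples can be sketched in plain Python. Below is a minimal 1-nearest-neighbor classifier (the simplest case of the k-Nearest Neighbors model discussed in this course), assuming each instance is a tuple of numeric features. The features (sugar and weight in grams) and all values are made up purely for illustration.

```python
import math

# Hypothetical training set: (sugar_g, weight_g) features plus a class label.
# The numbers are invented for illustration only.
training_set = [
    ((30.0, 10.0), "cookie"),
    ((28.0, 12.0), "cookie"),
    ((60.0, 5.0), "marshmallow"),
    ((65.0, 6.0), "marshmallow"),
]


def predict(features, training_set):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, label = min(
        training_set,
        key=lambda example: math.dist(features, example[0]),  # Euclidean distance
    )
    return label


print(predict((31.0, 11.0), training_set))  # cookie
```

The model never sees the label of the new instance; it only compares its features with the labeled training set.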

The difference between classification and regression is that regression predicts a continuous numerical value, for example, a price, which can be any real number (only a positive one, in the case of a price).
In contrast, classification predicts a categorical value, for example, the type of a sweet. There is a finite set of possible values, and the model assigns each instance to one of these categories.
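To make the contrast concrete, here is a sketch that uses the same nearest neighbors for both tasks: averaging the neighbors' prices gives a continuous number (regression), while a majority vote over their labels gives one of a finite set of categories (classification). The data and the choice of k = 3 are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sweets: (sugar_g, weight_g), price, type. Invented values.
data = [
    ((30.0, 10.0), 1.20, "cookie"),
    ((28.0, 12.0), 1.50, "cookie"),
    ((32.0, 11.0), 1.30, "cookie"),
    ((60.0, 5.0), 0.80, "marshmallow"),
    ((65.0, 6.0), 0.90, "marshmallow"),
]


def nearest(features, k):
    """Return the k rows whose features are closest (Euclidean distance)."""
    return sorted(data, key=lambda row: math.dist(features, row[0]))[:k]


def predict_price(features, k=3):
    """Regression: the output is a continuous number (mean neighbor price)."""
    neighbors = nearest(features, k)
    return sum(price for _, price, _ in neighbors) / k


def predict_type(features, k=3):
    """Classification: the output is one of a finite set of labels (majority vote)."""
    neighbors = nearest(features, k)
    return Counter(label for _, _, label in neighbors).most_common(1)[0][0]


print(predict_price((30.0, 11.0)))  # ≈ 1.333 (a continuous value)
print(predict_type((30.0, 11.0)))   # cookie
```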

Depending on how the problem is formulated, there are two types of classification:

  • Binary classification: the target takes one of two possible outcomes. For example, email: spam/not spam; sweet: cookie/not cookie;
  • Multi-class classification: the target takes one of three or more possible outcomes. For example, email: spam/important/ad/other; sweet: cookie/marshmallow/candy.
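Whether a task is binary or multi-class depends only on how many distinct outcomes the target can take. A small hypothetical helper, `classification_type`, sketches this check, assuming the targets are given as a list of observed labels (so it can only count classes that actually appear in the data):

```python
def classification_type(targets):
    """Return 'binary' for 2 distinct outcomes, 'multi-class' for 3 or more."""
    n_classes = len(set(targets))
    if n_classes == 2:
        return "binary"
    if n_classes >= 3:
        return "multi-class"
    raise ValueError("a classification target needs at least two classes")


print(classification_type(["spam", "not spam", "spam"]))        # binary
print(classification_type(["cookie", "marshmallow", "candy"]))  # multi-class
```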

Most ML models require the target to be encoded as a number.
For binary classification, outcomes are usually encoded as 0/1 (e.g., 1 – cookie, 0 – not a cookie).
For multi-class classification, outcomes are usually encoded as 0, 1, 2, ... (e.g., 0 – candy, 1 – cookie, 2 – marshmallow).
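One simple way to do this encoding, sketched below in plain Python, is to sort the distinct class names and number them from 0. Sorting alphabetically happens to reproduce the candy/cookie/marshmallow codes above; scikit-learn's `LabelEncoder` also assigns codes in sorted order. The `encode_labels` helper itself is hypothetical.

```python
def encode_labels(labels):
    """Map each distinct label to an integer 0, 1, 2, ... in sorted order."""
    classes = sorted(set(labels))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


# Multi-class target
codes, mapping = encode_labels(["cookie", "candy", "marshmallow", "cookie"])
print(mapping)  # {'candy': 0, 'cookie': 1, 'marshmallow': 2}
print(codes)    # [1, 0, 2, 1]

# Binary target: encode "is it a cookie?" directly as 1/0
is_cookie = [1 if label == "cookie" else 0 for label in ["cookie", "candy", "cookie"]]
print(is_cookie)  # [1, 0, 1]
```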

Many different models perform classification. In this course, we will discuss the following models:

  • k-Nearest Neighbors;
  • Logistic Regression;
  • Decision Tree;
  • Random Forest.

Luckily, they are all implemented in the scikit-learn library and are easy to use.
Why are there so many models? According to the No Free Lunch theorem, no machine learning model is better than all others when performance is averaged over all possible problems. Which model performs best depends on the specific task.


Suppose you want to predict the outcome of a sports game. Match each item below with the concept it corresponds to.

Performance of a team in previous games, each player's rating, last 5 head-to-head results:
Result (win/lose/tie):

Records of all games from the past 3 years:


Which of these cases is binary classification, and which is multi-class classification?

League game (outcomes: win/tie/lose) –
Tournament game (outcomes: win/lose) –

Section 1. Chapter 1