
Classification with Python

## What Is Classification?

Classification is a supervised learning task.
Its goal is to predict the class an instance belongs to, based on a set of parameters (features). Before the computer can predict the class of a new instance, you need to give it many labeled examples (called the training set) to learn from.
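A training set and a prediction can be sketched with scikit-learn's `KNeighborsClassifier`, one of the models listed in this course. This is only a minimal illustration: the two features (weight in grams and a sugar-content fraction) and all the numbers are made up, and no feature scaling is applied.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set: each row is [weight_in_grams, sugar_fraction]
X_train = [[30, 0.40], [35, 0.45], [10, 0.70], [12, 0.65], [5, 0.90], [6, 0.95]]
# Labels (the known class of each training instance)
y_train = ["cookie", "cookie", "marshmallow", "marshmallow", "candy", "candy"]

# Learn from the labeled examples
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict the class of a new, unlabeled instance
prediction = model.predict([[32, 0.42]])
print(prediction)  # ['cookie']
```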

The difference between classification and regression is that regression predicts a continuous numerical value, for example, a price. It can be any real number (only a positive one, in the case of a price).
In contrast, classification predicts a categorical value, for example, the type of a sweet. There is a finite set of values, and the model tries to assign each instance to one of these categories.
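The contrast can be seen by fitting a regressor and a classifier on the same input. The sketch below uses scikit-learn decision trees; the single feature and the price and type values are invented for illustration. The regressor returns a number, while the classifier can only return one of the classes it saw during training (stored in `classes_`).

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# One hypothetical feature per instance
X = [[1], [2], [3], [4]]
prices = [1.5, 2.0, 3.5, 4.0]                     # continuous target (regression)
types = ["candy", "candy", "cookie", "cookie"]    # categorical target (classification)

reg = DecisionTreeRegressor(random_state=0).fit(X, prices)
clf = DecisionTreeClassifier(random_state=0).fit(X, types)

predicted_price = reg.predict([[3.8]])[0]  # a real number
predicted_type = clf.predict([[3.8]])[0]   # one label from a finite set

print(predicted_price)     # 4.0
print(predicted_type)      # cookie
print(list(clf.classes_))  # ['candy', 'cookie']
```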

Depending on how the problem is formulated, there are two types of classification:

• Binary classification: the target is one of two possible outcomes. For example, email: spam/not spam; sweet: cookie/not cookie.
• Multi-class classification: the target has three or more possible outcomes. For example, email: spam/important/ad/other; sweet: cookie/marshmallow/candy.
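scikit-learn can tell the two cases apart from the labels alone with `sklearn.utils.multiclass.type_of_target`. The sketch below applies it to label lists modeled on the examples above; the specific lists are illustrative.

```python
from sklearn.utils.multiclass import type_of_target

spam_labels = ["spam", "not spam", "spam", "not spam"]      # two distinct outcomes
sweet_labels = ["cookie", "marshmallow", "candy", "cookie"]  # three distinct outcomes

spam_kind = type_of_target(spam_labels)
sweet_kind = type_of_target(sweet_labels)

print(spam_kind)   # binary
print(sweet_kind)  # multiclass
```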

For most ML models, you need to encode the target as a number.
For binary classification, outcomes are usually encoded as 0/1 (e.g., 1 – cookie, 0 – not a cookie).
For multi-class classification, outcomes are usually encoded as 0, 1, 2, ... (e.g., 0 – candy, 1 – cookie, 2 – marshmallow).
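One way to do this encoding is scikit-learn's `LabelEncoder`, which assigns 0, 1, 2, ... in sorted (alphabetical) order of the labels. For the sweets example that happens to reproduce the mapping above. For the binary case, the sketch builds the 0/1 column directly so that 1 means "cookie", because `LabelEncoder` would put "cookie" before "not cookie" alphabetically and encode it as 0. The sample labels are illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

sweets = ["cookie", "candy", "marshmallow", "cookie"]

# Multi-class: labels are numbered in sorted order
encoder = LabelEncoder()
sweets_encoded = encoder.fit_transform(sweets)
print(encoder.classes_)   # ['candy' 'cookie' 'marshmallow']
print(sweets_encoded)     # [1 0 2 1]

# Binary (cookie / not cookie): choose the positive class explicitly
is_cookie = (np.array(sweets) == "cookie").astype(int)
print(is_cookie)          # [1 0 0 1]

# Numbers can be decoded back to the original labels
decoded = encoder.inverse_transform([2, 0])
print(decoded)            # ['marshmallow' 'candy']
```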

Many different models perform classification. In this course, we will discuss the following models:

• k-Nearest Neighbors
• Logistic Regression
• Decision Tree
• Random Forest

Luckily, they are all implemented in the scikit-learn library and are easy to use.
Why are there so many models? The No Free Lunch Theorem states that, averaged over all possible problems, no machine learning model is better than any other. Which model performs best depends on the specific task.
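Because the four models share the same scikit-learn interface, they can be swapped in and compared on a given task. The sketch below scores each one with 5-fold cross-validation on the Iris dataset bundled with scikit-learn; the dataset and the hyperparameters (`n_neighbors=5`, `max_iter=1000`, `random_state=0`) are arbitrary choices for illustration, and the ranking on another task may differ.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# A small built-in multi-class dataset (3 classes)
X, y = load_iris(return_X_y=True)

models = {
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

mean_accuracy = {}
for name, model in models.items():
    # Same call for every model: fit on 4 folds, score accuracy on the 5th, repeat
    scores = cross_val_score(model, X, y, cv=5)
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {mean_accuracy[name]:.3f}")
```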


#### Suppose you want to predict the outcome of a sports game. Match each item with the corresponding term.

Performance of a team in previous games, each player's rating, last 5 head-to-head results:
_ _ _

Result (win/lose/tie):
_ _ _

Records of all games from the past 3 years:
_ _ _

Choose from:

• Training Set
• Target
• Features
• Models
• Algorithm

#### Which of the following cases are binary classification and which are multiclass classification?

League game (outcomes: win/tie/lose) –
_ _ _

Tournament game (outcomes: win/lose) –
_ _ _

Choose from:

• Binary Classification
• Multiclass Classification

