# Classification with Python

## What is Classification

**Classification** is a supervised learning task.

Its goal is to predict the class an instance belongs to based on a set of parameters (**features**). Before the model can predict the class of a new instance, it needs to learn from many labeled examples (called the **training set**).

The difference between classification and regression is that regression predicts a continuous numerical value, for example, a price. The prediction can be any real number (for a price, only a positive one).

In contrast, classification predicts a categorical value, for example, the type of a sweet. There is a finite set of values, and the model tries to classify each instance into one of these categories.
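To make the contrast concrete, here is a minimal sketch that trains a regressor and a classifier on the same feature. The data is hypothetical (sweet weights in grams, made-up prices and labels), and the choice of `LinearRegression` and `KNeighborsClassifier` is only an illustration, not a recommendation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: a single feature, the weight of a sweet in grams
X = np.array([[5.0], [6.0], [7.0], [20.0], [22.0], [25.0]])

# Regression target: a continuous number (made-up prices)
price = np.array([0.10, 0.12, 0.14, 0.40, 0.44, 0.50])

# Classification target: one of a finite set of categories
kind = np.array(["candy", "candy", "candy", "cookie", "cookie", "cookie"])

regressor = LinearRegression().fit(X, price)
classifier = KNeighborsClassifier(n_neighbors=3).fit(X, kind)

new_sweet = np.array([[21.0]])
predicted_price = regressor.predict(new_sweet)[0]   # a real number
predicted_kind = classifier.predict(new_sweet)[0]   # a category label

print(f"Predicted price: {predicted_price:.2f}")
print(f"Predicted kind: {predicted_kind}")
```

The regressor returns a number that may never have appeared in the training data, while the classifier can only return one of the labels it has seen.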

Based on the formulation of a problem, there are two types of classification:

- Binary classification: the target is one of two possible outcomes. For example, email: spam/not spam, sweet: cookie/not cookie.
- Multi-class classification: the target has three or more possible outcomes. For example, email: spam/important/ad/other, sweet: cookie/marshmallow/candy.

For most ML models, you need to encode the target as a number.

For binary classification, outcomes are usually encoded as 0/1 (e.g., 1 – cookie, 0 – not a cookie).

For multi-class classification, outcomes are usually encoded as 0, 1, 2, ... (e.g., 0 – candy, 1 – cookie, 2 – marshmallow).
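The encodings described above can be sketched as follows. The list of sweets is made up for illustration. scikit-learn's `LabelEncoder` is one possible way to do the multi-class encoding; it assigns codes in sorted label order, which happens to reproduce the candy/cookie/marshmallow mapping from the example.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target column with three classes
sweets = ["cookie", "candy", "marshmallow", "cookie", "candy"]

# Multi-class encoding: each class gets an integer code
encoder = LabelEncoder()
encoded = encoder.fit_transform(sweets)

# classes_ holds the labels in sorted order; the index is the code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)
print(list(encoded))

# Binary encoding: 1 for cookie, 0 for not a cookie
is_cookie = [1 if sweet == "cookie" else 0 for sweet in sweets]
print(is_cookie)

# The codes can be turned back into the original labels
decoded = encoder.inverse_transform(encoded)
```

Keeping the fitted encoder around is useful because `inverse_transform` lets you convert a model's numeric predictions back into readable class names.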

Many different models perform classification. In this course, we will discuss the following models:

- k-Nearest Neighbors
- Logistic Regression
- Decision Tree
- Random Forest

Luckily, they are all implemented in the scikit-learn library and are easy to use.
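As a rough sketch of what "easy to use" means, the four models share the same `fit`/`score` interface, so they can be trained in one loop. The Iris dataset, the train/test split, and the hyperparameters below are assumptions chosen for illustration and are not part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed example dataset: Iris (3 classes, 4 numeric features)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative hyperparameters, not tuned values
models = {
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # learn from the training set
    scores[name] = model.score(X_test, y_test)    # accuracy on unseen data
    print(f"{name}: {scores[name]:.3f}")
```

The accuracies printed here apply only to this dataset and split; on a different task the ranking of the models may change.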

Why are there so many models? As the **No Free Lunch Theorem** states, no single Machine Learning model performs best across all possible problems. Which model will perform best depends on the specific task.
