Classification with Python
What is Classification
Classification is a supervised learning task.
Its goal is to predict the class to which an instance belongs based on a set of parameters (features). Before the computer can predict the class of a new instance, it must learn from many labeled examples (called the training set).
The difference between classification and regression lies in the kind of value being predicted. Regression predicts a continuous numerical value, for example, a price. The prediction can be any real number (for a price, only positive values make sense).
In contrast, classification predicts a categorical value, for example, the type of a sweet. There is a finite set of values, and the model tries to classify each instance into one of these categories.
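The contrast can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is installed. The toy dataset (weights, sugar amounts, prices, and labels) is invented for the example, and `KNeighborsRegressor` is used only as a convenient regression counterpart to a classifier.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical training set: each sweet is described by [weight in grams, sugar in grams].
X_train = [[10, 6], [12, 7], [11, 6], [30, 10], [32, 11], [28, 9]]
types = ["candy", "candy", "candy", "cookie", "cookie", "cookie"]  # categorical target
prices = [0.20, 0.25, 0.22, 1.00, 1.10, 0.95]                       # continuous target

# Both models learn from the same labeled examples, but predict different kinds of values.
classifier = KNeighborsClassifier(n_neighbors=3).fit(X_train, types)
regressor = KNeighborsRegressor(n_neighbors=3).fit(X_train, prices)

new_sweet = [[29, 10]]
predicted_type = classifier.predict(new_sweet)[0]   # one of the known categories
predicted_price = regressor.predict(new_sweet)[0]   # a real number
print(predicted_type, predicted_price)
```

The classifier can only ever answer with a label it saw during training, while the regressor may return a value that appears nowhere in the training set.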
Based on the formulation of a problem, there are two types of classification:
- Binary classification: the target is one of two possible outcomes. For example, email: spam/not spam, sweet: cookie/not cookie;
- Multi-class classification: the target has three or more possible outcomes. For example, email: spam/important/ad/other, sweet: cookie/marshmallow/candy.
For most ML models, you need to encode the target as a number.
For binary classification, outcomes are usually encoded as 0/1 (e.g., 1 – cookie, 0 – not a cookie).
For multi-class classification, outcomes are usually encoded as 0, 1, 2, ... (e.g., 0 – candy, 1 – cookie, 2 – marshmallow).
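One possible way to produce these encodings is sketched below, assuming NumPy and scikit-learn are available. The list `sweets` is made-up sample data. `LabelEncoder` assigns integers to class names in sorted order, which for these three labels happens to match the mapping above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical target column with three classes.
sweets = ["cookie", "candy", "marshmallow", "cookie", "candy"]

# Multi-class encoding: classes are sorted, then numbered 0, 1, 2, ...
encoder = LabelEncoder()
encoded = encoder.fit_transform(sweets)
print(list(encoder.classes_))  # class names, listed in the order of their integer codes
print(encoded)

# Binary encoding: 1 for cookie, 0 for anything else.
is_cookie = (np.array(sweets) == "cookie").astype(int)
print(is_cookie)

# Integer codes can be mapped back to the original names when needed.
decoded = encoder.inverse_transform(encoded)
```

Note that scikit-learn classifiers also accept string labels directly, so the explicit step matters mostly when a model or library expects numeric targets.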
Many different models perform classification. In this course, we will discuss the following models:
- k-Nearest Neighbors;
- Logistic Regression;
- Decision Tree;
- Random Forest.
Luckily, they are all implemented in the scikit-learn library and are easy to use.
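As a rough sketch of what this looks like in practice, the four models share the same `fit`/`score` interface in scikit-learn. The synthetic dataset from `make_classification`, the split sizes, and the hyperparameters below are assumptions chosen only for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "k-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)                     # learn from the labeled training set
    accuracies[name] = model.score(X_test, y_test)  # accuracy on held-out instances
    print(f"{name}: {accuracies[name]:.2f}")
```

Because every model exposes the same methods, swapping one for another usually means changing a single line.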
Why are there so many models? As the No Free Lunch Theorem states, no machine learning model is better than all others across every possible task. Which model performs best depends on the specific task.