Summary  
This chapter explains how supervised learning tasks are divided into classification—predicting discrete labels—and regression—predicting continuous values—based on the nature of the target variable.  

General domain of usage  
Spam detection

Supervised machine learning problems are typically divided into two main categories: **classification** and **regression**. Understanding the distinction between these two types is essential because it determines how you prepare your data, select your algorithms, and interpret your results. In both cases, your dataset consists of **features** (the input variables) and **labels** (the target variable you want to predict), as introduced earlier. However, the nature of the label differs between classification and regression tasks.



**Classification** is about predicting which category or class an observation belongs to. The label in a classification dataset is discrete, meaning it takes on a limited set of possible values. For example, predicting whether an email is spam or not spam involves two classes: "spam" and "not spam." The output is categorical.

**Regression**, on the other hand, is about predicting a continuous value. Here, the label is a real number, and the goal is to estimate this number as accurately as possible. For instance, predicting the price of a house based on its features (such as size and location) is a regression problem because the output can be any value within a range.

The main difference between classification and regression lies in the type of label you are trying to predict: discrete categories for classification, and continuous values for regression.

You encounter classification and regression problems in many real-world scenarios. **Classification** is well suited for tasks such as:
- Spam detection: classify emails as "spam" or "not spam";
- Medical diagnosis: determine if a tumor is "benign" or "malignant";
- Image recognition: identify objects in photos as "cat," "dog," or "car".

**Regression** is used when the prediction is a continuous value, such as:
- Predicting house prices based on location and size;
- Forecasting stock prices using historical data;
- Estimating the temperature for the next day given weather conditions.

Choosing the right approach depends on the type of label in your dataset and the prediction you want to make.

Which statement correctly describes a classification problem?

Introduces the fundamental concepts of supervised machine learning using Python. Covers key ideas such as classification and regression, and explains essential terminology including features, labels, and training and testing data.

This section introduces the foundational concepts of machine learning, focusing on supervised learning, essential terminology, and the basic workflow of building machine learning models.

Classification vs Regression