The **train-test split** is a method used in machine learning to divide a dataset into two parts: a **training set** and a **test set**. 

The **training set** is used to fit the model, while the **test set** is used to evaluate its performance. This split is crucial because it measures the model on data it did not see during training, which helps to detect **overfitting**.
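As a rough illustration of the idea, the split can be sketched in plain Python. The helper name `split_train_test`, the 80/20 ratio, and the toy `emails` list are assumptions made for this example, not part of the project. In practice, scikit-learn's `sklearn.model_selection.train_test_split` is commonly used for the same job.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy dataset: (email text, label) pairs.
emails = [(f"message {i}", "spam" if i % 3 == 0 else "ham") for i in range(10)]
train_set, test_set = split_train_test(emails, test_fraction=0.2)

print(len(train_set), len(test_set))
```

Shuffling before slicing matters when the rows are stored in some order (for example, all spam first); without it, the test set may not resemble the training set.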

Overfitting occurs when a model fits the training data too closely, including its noise and quirks, and as a result performs poorly on **unseen data**. Evaluating the model on a **test set** gives a more realistic estimate of how it will perform in real-world scenarios.
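A deliberately extreme sketch can make the gap visible. The "model" here simply memorizes its training examples, and the data are hypothetical: unique inputs with purely random labels, so there is nothing real to learn. Training accuracy is perfect, while test accuracy should land near chance.

```python
import random

rng = random.Random(0)

# Hypothetical data: unique feature values with purely random labels,
# so there is no genuine pattern for a model to pick up.
data = [(x, rng.choice([0, 1])) for x in range(1000)]
rng.shuffle(data)
train, test = data[:800], data[800:]

# A "model" that memorizes every training example.
memory = {x: y for x, y in train}
# Fallback for unseen inputs: the most common training label.
ones = sum(y for _, y in train)
majority = 1 if ones * 2 >= len(train) else 0


def predict(x):
    return memory.get(x, majority)


def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)


train_acc = accuracy(train)
test_acc = accuracy(test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Judged on the training set alone, this model would look flawless; only the held-out rows reveal that it has learned nothing that transfers.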

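Additionally, this approach helps to assess the model's generalization ability. For the **tuning of hyperparameters**, candidate settings should be compared on a separate validation set (or with cross-validation) carved out of the training data, so that the test set remains an unbiased final check.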
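One common way to tune hyperparameters without touching the test set is to hold out a third, validation subset. The sketch below is only an illustration under stated assumptions: the synthetic one-feature data, the 60/20/20 proportions, the tiny k-nearest-neighbours classifier, and the candidate values of `k` are all made up for the example.

```python
import random

rng = random.Random(1)


def make_row():
    """Hypothetical sample: one numeric feature, label 1 if x > 5, 10% label noise."""
    x = rng.uniform(0, 10)
    label = 1 if x > 5 else 0
    if rng.random() < 0.1:
        label = 1 - label
    return (x, label)


data = [make_row() for _ in range(200)]
rng.shuffle(data)
# 20% test, 20% validation, 60% training.
test, val, train = data[:40], data[40:80], data[80:]


def knn_predict(train_rows, x, k):
    """Majority vote among the k training rows closest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train_rows, eval_rows, k):
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)


# Compare hyperparameter candidates on the validation set only.
candidates = [1, 5, 15]
val_scores = {k: accuracy(train, val, k) for k in candidates}
best_k = max(candidates, key=val_scores.get)

# The test set is used once, after the choice has been made.
test_acc = accuracy(train, test, best_k)
print("validation accuracy per k:", val_scores)
print("chosen k:", best_k)
print(f"test accuracy: {test_acc:.2f}")
```

Because the test rows play no part in choosing `k`, the final test accuracy is not inflated by the selection step.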

In this project, we are going to classify emails as spam or not spam according to their content.
