Model Evaluation
Splitting the Data
After training a neural network, it is essential to evaluate how well it performs on unseen data. This evaluation helps determine whether the model has learned meaningful patterns or has merely memorized the training examples. To do this, the dataset is divided into two parts:
- Training set: used to train the neural network by adjusting its weights and biases through backpropagation;
- Test set: used after training to evaluate how well the model generalizes to new, unseen data.
A common split is 80% for training and 20% for testing, although this ratio may vary depending on the dataset's size and complexity.
The data split is typically performed using the train_test_split() function from the sklearn.model_selection module:
from sklearn.model_selection import train_test_split

# Split features (X) and labels (y) into training and test subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=...)
The test_size parameter determines the proportion of data reserved for testing. For instance, setting test_size=0.1 means that 10% of the data will be used for testing, while 90% will be used for training.
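As an illustration, here is a minimal, self-contained sketch of such a split. The dataset is synthetic, generated with make_classification purely for this example; any feature matrix X and label vector y would be split the same way.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 1000 samples, 20 features, binary labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Reserve 10% of the samples for testing, 90% for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

print(X_train.shape)  # (900, 20)
print(X_test.shape)   # (100, 20)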
If the model performs well on the training set but poorly on the test set, it may be overfitting: learning patterns too specific to the training data instead of generalizing to new examples. The goal is to achieve strong performance on both datasets, ensuring that the model generalizes well.
Once the data is split and the model is trained, performance should be measured using appropriate evaluation metrics, which depend on the specific classification task.
Classification Metrics
For classification problems, several key metrics can be used to evaluate the model's predictions:
- Accuracy;
- Precision;
- Recall;
- F1-score.
Since a perceptron performs binary classification, a confusion matrix is a good starting point for understanding these metrics.
A confusion matrix is a table that summarizes the model's classification performance by comparing the predicted labels with the actual labels. It provides insights into the number of correct and incorrect predictions for each class (1 and 0).
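For example, scikit-learn's confusion_matrix function builds this table directly from the actual and predicted labels. The label arrays below are invented for illustration; in practice, y_pred would come from your trained model.

from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows correspond to actual classes, columns to predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[4 1]
#  [1 4]]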
Accuracy measures the proportion of correctly classified samples out of the total. If a model correctly classifies 90 out of 100 images, its accuracy is 90%.
$$\text{accuracy} = \frac{\text{correct}}{\text{all}} = \frac{TP + TN}{TP + TN + FP + FN}$$

While accuracy is useful, it may not always provide a full picture, especially for imbalanced datasets. For example, in a dataset where 95% of samples belong to one class, a model could achieve 95% accuracy just by always predicting the majority class, without actually learning anything useful. In such cases, precision, recall, or the F1-score might be more informative.
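To make this concrete, the sketch below uses made-up imbalanced labels and a "model" that always predicts the majority class; it still reaches 95% accuracy while being useless at finding positives.

from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))                   # 0.95
print(f1_score(y_true, y_pred, zero_division=0))        # 0.0 - it never identifies a positive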
Precision is the percentage of correctly predicted positive cases out of all predicted positives. This metric is particularly useful when false positives are costly, such as in spam detection or fraud detection.
$$\text{precision} = \frac{\text{correct positive}}{\text{predicted positive}} = \frac{TP}{TP + FP}$$

Recall (sensitivity) measures how many of the actual positive cases the model correctly identifies. A high recall is essential in scenarios where false negatives must be minimized, such as medical diagnoses.
$$\text{recall} = \frac{\text{correct positive}}{\text{all positive}} = \frac{TP}{TP + FN}$$

F1-score is the harmonic mean of precision and recall, providing a balanced measure when both false positives and false negatives are important. This is useful when the dataset is imbalanced, meaning one class appears significantly more than the other.
$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

1. What is the main purpose of splitting your dataset into training and test sets?
2. Why might F1-score be preferred over accuracy on an imbalanced dataset?
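To see how these four metrics are computed in practice, here is a short sketch using scikit-learn's metric functions. The labels are the same invented example as in the confusion matrix above; in a real workflow, y_pred would be produced by your trained model (e.g., model.predict(X_test)).

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical actual and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # 0.8
print("Precision:", precision_score(y_true, y_pred))  # 0.8
print("Recall:   ", recall_score(y_true, y_pred))     # 0.8
print("F1-score: ", f1_score(y_true, y_pred))         # 0.8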