Backward Propagation | Neural Network from Scratch
Introduction to Neural Networks
Backward Propagation

Warning

Backpropagation is often the most confusing part of neural network training. It computes the gradients that the gradient descent algorithm uses to update the model, so following it requires a good understanding of calculus.

Backpropagation Structure

We can split the backpropagation algorithm into several steps:

Forward Propagation:

In this step, we pass the inputs through the perceptron and store the output of every neuron (Neuron.output). We already implemented this part in the previous chapter.
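As a reminder of what this step leaves behind, here is a minimal sketch of a forward pass that stores each neuron's output. Only the `output` attribute is named in this chapter; the `Neuron` constructor, the `activate` method, the `forward` helper, and the example weights are assumptions made for illustration and may differ from the previous chapter's code.

```python
import math


def sigmoid(x):
    """Sigmoid activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class Neuron:
    """Hypothetical neuron; the chapter only names the `output` attribute."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias
        self.output = 0.0  # filled in during forward propagation

    def activate(self, inputs):
        # Weighted sum of the inputs plus the bias, passed through sigmoid.
        z = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        self.output = sigmoid(z)
        return self.output


def forward(layers, inputs):
    """Feed `inputs` through every layer; each neuron keeps its own output."""
    for layer in layers:
        inputs = [neuron.activate(inputs) for neuron in layer]
    return inputs


# Arbitrary example weights: two hidden neurons feeding one output neuron.
layers = [
    [Neuron([0.5, -0.5], 0.0), Neuron([0.3, 0.8], 0.1)],
    [Neuron([1.0, -1.0], 0.0)],
]
prediction = forward(layers, [1.0, 0.0])
print(prediction, [n.output for n in layers[0]])
```

After the call, every neuron holds the value it produced for this input, which is what the later steps read back.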

Error Computing:

In this phase, we determine the individual error for each neuron. This error indicates the difference between the neuron's output and the desired output.

For neurons in the output layer, this is straightforward: when given a specific input, the error represents the difference between the neural network's prediction and the actual target value.

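For neurons in the hidden layers, there is no target value to compare against. Instead, each hidden neuron's error is derived from the layer after it: it is the sum of the next layer's deltas, each weighted by the connection between the hidden neuron and that next-layer neuron.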
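The two cases can be sketched as two small functions. This is an illustration under assumptions: the function names and the weight layout are hypothetical, and the sign convention (prediction minus target) is one common choice; the course code may use the opposite sign together with an opposite-signed update.

```python
def output_errors(outputs, targets):
    """Output layer: error is prediction minus target (assumed sign convention)."""
    return [o - t for o, t in zip(outputs, targets)]


def hidden_errors(next_weights, next_deltas):
    """Hidden layer: sum of the next layer's deltas, weighted by the connections.

    next_weights[j][i] is the weight from hidden neuron i to next-layer neuron j.
    next_deltas[j] is the delta already computed for next-layer neuron j.
    """
    n_hidden = len(next_weights[0])
    return [
        sum(next_weights[j][i] * next_deltas[j] for j in range(len(next_deltas)))
        for i in range(n_hidden)
    ]


# One output neuron predicting 0.8 where the target is 1.0.
print(output_errors([0.8], [1.0]))

# Two hidden neurons connected to one next-layer neuron whose delta is 0.2.
print(hidden_errors([[0.5, -1.0]], [0.2]))
```

Note that `hidden_errors` needs the next layer's deltas as input, so the next layer has to be processed first.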

Calculating the Gradient (Delta):

At this stage, we calculate the magnitude and direction of each neuron's correction. We do this by multiplying the neuron's error by the derivative of its activation function (in this case, sigmoid), evaluated using the neuron's output.

This computation has to be interleaved with the error calculation, because the current layer's gradient (delta) is needed to determine the error in the preceding layer. This is also why the process runs from the output layer to the input layer (the backward direction).
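For sigmoid, the derivative can be written in terms of the neuron's stored output as output * (1 - output), so no extra forward computation is needed. A minimal sketch, with hypothetical function names:

```python
def sigmoid_derivative(output):
    """Derivative of sigmoid, expressed through the neuron's own output."""
    return output * (1.0 - output)


def neuron_delta(error, output):
    """Delta = error scaled by the slope of the activation at this output."""
    return error * sigmoid_derivative(output)


# A neuron that produced 0.8 with an error of -0.2 (prediction minus target).
print(neuron_delta(-0.2, 0.8))

# The slope is largest at output 0.5 and shrinks toward 0 near outputs of 0 or 1.
print(sigmoid_derivative(0.5), sigmoid_derivative(0.99))
```

A neuron whose output is already close to 0 or 1 therefore gets a small delta even when its error is large.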

Modifying Weights and Biases (Taking a Step in Gradient Descent):

The last step of the backpropagation process involves updating the neurons' weights and biases according to their respective deltas.
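A sketch of a single neuron's update, assuming the error is defined as prediction minus target so that the step subtracts the gradient. The function name and argument layout are hypothetical; if the error is defined as target minus prediction, the same update uses `+` instead of `-`.

```python
def update_neuron(weights, bias, inputs, delta, learning_rate):
    """Return the neuron's weights and bias after one gradient descent step.

    Each weight moves in proportion to the input that flowed through it;
    the bias behaves like a weight whose input is always 1.
    """
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias


weights, bias = update_neuron(
    weights=[0.5, -0.5], bias=0.1, inputs=[1.0, 0.0],
    delta=-0.032, learning_rate=0.2,
)
print(weights, bias)
```

The second weight does not change here because its input was 0, so it had no influence on the neuron's output for this sample.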

Note

Error computing and calculating the gradient should progress in reverse order, moving from the output layer towards the input layer.

Learning Rate

Another crucial aspect of model training is the learning rate. As an integral component of the gradient descent algorithm, the learning rate can be visualized as the pace of training.

A higher learning rate accelerates the training process; however, an excessively high rate can make each update overshoot the minimum, so the network oscillates or diverges instead of settling on the patterns in the data.

Note

The learning rate is a floating point value, typically between 0 and 1, and it is used in the last step of the backpropagation algorithm to scale down the adjustments applied to the weights and biases. Choosing a good learning rate is part of hyperparameter tuning.
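The effect of the learning rate is easiest to see on a toy problem rather than a full network. The sketch below runs gradient descent on the one-variable function f(w) = 2 * w**2, whose minimum is at w = 0; the function, the starting point, and the three rates are chosen purely for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = 2 * w**2 and return w after `steps` updates."""
    w = start
    for _ in range(steps):
        gradient = 4.0 * w  # derivative of 2 * w**2
        w -= learning_rate * gradient
    return w


for lr in (0.01, 0.2, 0.6):
    print(f"learning rate {lr}: w after 20 steps = {gradient_descent(lr)}")
```

With the smallest rate, w is still far from 0 after 20 steps; with the middle rate, it is very close to 0; with the largest rate, every step overshoots the minimum by more than it corrects, so w grows instead of shrinking.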

Epochs

Every time our perceptron processes the entire dataset, we refer to it as an epoch. To effectively recognize patterns in the data, it's essential to feed our entire dataset into the model multiple times.

We can use the XOR example as a sanity check that our model is set up correctly. XOR has only four input combinations, all of which come from the truth table discussed in the preceding chapter.

By training our neural network on these examples for 10,000 epochs with a learning rate of 0.2, we give the model enough passes over the data to learn the XOR mapping.
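Putting the steps together, a training loop for XOR could look like the sketch below. It is not the course's reference solution: the dictionary-based neurons, the 2-4-1 layer sizes, the weight initialization, the random seed, and the prediction-minus-target error convention are all assumptions. Only the 10,000 epochs and the learning rate of 0.2 come from the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_inputs, n_neurons, rng):
    # Hypothetical neuron layout: weights, bias, and slots for output and delta.
    return [
        {"weights": [rng.uniform(-1, 1) for _ in range(n_inputs)],
         "bias": rng.uniform(-1, 1), "output": 0.0, "delta": 0.0}
        for _ in range(n_neurons)
    ]


def forward(layers, inputs):
    # Step 1: forward propagation, storing every neuron's output.
    for layer in layers:
        outputs = []
        for n in layer:
            z = sum(w * x for w, x in zip(n["weights"], inputs)) + n["bias"]
            n["output"] = sigmoid(z)
            outputs.append(n["output"])
        inputs = outputs
    return inputs


def backward(layers, inputs, targets, learning_rate):
    # Steps 2-3: errors and deltas, from the output layer back to the first layer.
    last = len(layers) - 1
    for i in reversed(range(len(layers))):
        for j, n in enumerate(layers[i]):
            if i == last:
                error = n["output"] - targets[j]
            else:
                error = sum(nxt["weights"][j] * nxt["delta"] for nxt in layers[i + 1])
            n["delta"] = error * n["output"] * (1.0 - n["output"])
    # Step 4: update weights and biases, scaled by the learning rate.
    for i, layer in enumerate(layers):
        layer_inputs = inputs if i == 0 else [p["output"] for p in layers[i - 1]]
        for n in layer:
            n["weights"] = [w - learning_rate * n["delta"] * x
                            for w, x in zip(n["weights"], layer_inputs)]
            n["bias"] -= learning_rate * n["delta"]


def mean_squared_error(layers, data):
    return sum((forward(layers, x)[0] - t[0]) ** 2 for x, t in data) / len(data)


xor_data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable
layers = [init_layer(2, 4, rng), init_layer(4, 1, rng)]

loss_before = mean_squared_error(layers, xor_data)
for epoch in range(10_000):        # one epoch = one pass over all four samples
    for x, t in xor_data:
        forward(layers, x)
        backward(layers, x, t, learning_rate=0.2)
loss_after = mean_squared_error(layers, xor_data)

print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
for x, t in xor_data:
    print(x, t, round(forward(layers, x)[0], 3))
```

All deltas are computed before any weight is changed, so the hidden-layer errors are based on the same weights that produced the forward pass. How closely the final predictions match the targets depends on the initial weights, so it is worth comparing the loss before and after training rather than assuming convergence.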

Task

Implement the backpropagation algorithm:

  1. Run forward propagation.
  2. Calculate errors of the neurons.
  3. Calculate delta of the neurons.
  4. Apply the learning rate when updating the biases.

Note

Several places in the code are left blank for steps 2-4.


Section 2. Chapter 4